Most online logging tools feature convoluted UIs, arbitrary mandatory fields, questionable AI/insights, complex pricing, etc. I hope my application fixes most of these issues. It also has some nice features, such as automatic Geo IP checks and public dashboards.
Although I've created lots of software, this is my first open source application (MIT license); the tutorial for self-hosting is hopefully sufficient! Most of my development career has been with C#, NodeJS and PHP. For this project I've used PHP (8.3), which is an absolute joy to work with. The architecture is very scalable, but I've only tested up to a few billion logs. The current version has been running in production for a few months now. Hope you enjoy/fork it as you see fit!
My suggestion for self-hosting is to create Docker images and use docker-compose. Self-hosting currently takes a bit of effort to set up.
I also wonder if PHP is a good language for this. For the UI, yeah, that's fine and makes sense. But the log processor is going to need to handle high throughput, which PHP just isn't good at. With the same resources, you can have Go doing thousands of requests per second versus PHP doing hundreds of requests per second.
I'm not a big fan when folks call out languages as bottlenecks when they have no proof on the actual overhead and how much faster it would be in another language.
Most PHP deployments barely reach a hundred requests per second per server.
And since this is an open source project, it should be designed to handle basic production workloads, which it could, but it'll cost you a bunch more than if you used the right language.
> I'm not a big fan when folks call out languages as bottlenecks when they have no proof on the actual overhead and how much faster it would be in another language.
Honestly, I thought it was so obvious that an interpreted language is not good for high throughput endpoints that it didn't need to be proven. I also thought it was obvious that a logging system is going to handle lots and lots of data.
It could be easily proven by doing a bunch of work but obviously there is no point in me proving it.
> It could be easily proven by doing a bunch of work but obviously there is no point in me proving it.
Because you cannot prove it... :) I wrote this post a few years ago, which actually spurred some improvements in C#... so here you go: https://withinboredom.info/2022/03/16/yes-php-is-faster-than...
I notice your benchmarks are over 10 runs?! That's not a good sample size. And even more importantly, it's not in the same context.
Sure, once you compile PHP and have it running, it'll run fast. But PHP has a very specific use case, which is web applications. It's been well known for years that PHP's performance issues come from the fact that it's an interpreted language that has to be interpreted every time; if you compile it and run it repeatedly, it can perform extremely well. Which is why every performance-minded PHP nerd is working on experimental tools to do exactly that.
Like I said in the blog post, if I tell you the sky is blue and you don't believe me; run them yourself. FWIW, C# is faster now for that particular use case. Also, like I mentioned in a previous blog post ... which one would you rather maintain:
- https://github.com/TheAlgorithms/C-Sharp/blob/master/Algorit... -- merge sort in C# 130 lines
- https://www.w3resource.com/php-exercises/searching-and-sorti... -- merge sort in PHP 60 lines
PHP is often far more concise than C#, and many other languages. I code more in Go than C# or PHP these days, but even Go has its limitations where it would be easier to express in PHP than Go. There are even certain classes of algorithms that are butt-ugly in Go but quite pretty in PHP.
PHP is still my favorite language, even though I hardly get to use it these days.
> PHP has a very specific usage which is web applications.
Originally, yes. But it outgrew that about 10 years or so ago. It's much more general purpose now.[1][2]
[1]: https://nativephp.com/ -- desktop applications in php
[2]: https://static-php.dev/ -- build self-contained, statically compiled clis written in php
We'd run out of disk space long before PHP becomes the bottleneck here.
There's just no way that you're at all familiar with PHP of the last 10 years to think this is true.
> It could be easily proven by doing a bunch of work but obviously there is no point in me proving it.
Prove it. Please, show me the context and environment you think PHP would struggle to serve "hundreds of requests per second". I'd venture a bet that a plain Laravel installation on the cheapest digital ocean droplet would top this and Laravel is "slow" in relation to vanilla PHP.
The author writes that Clickhouse takes 0.1s for an example request: https://news.ycombinator.com/item?id=42666703
PHP would need to be adding 0.1s CPU time for processing the request for the PHP code to become the bottleneck. That seems unlikely.
> But for the log processor that's going to need to handle a high throughput which PHP just isn't good at.
I'm sorry, but wut? PHP is probably one of the fastest languages out there if you can ignore frameworks. It's backed by some of the most tuned C code out there and should be just about as fast as C for most tasks. The only reason it is not is due to the function call overhead -- which is by-far the slowest aspect of PHP.
> you can have Go doing thousands of requests per second vs PHP doing hundreds of requests per second.
This is mostly due to nginx and friends... There is FrankenPHP (a frontend for PHP running in Caddy, which is written in Go) that can easily handle 80k+ requests per second.
PHP is one of the fastest interpreted languages. But compiled languages are going to be faster than interpreted ones pretty much every time. It loses benchmarks against every language. That's not to mention it's slowed down by the fact that it has to rebuild everything per request.
As a PHP developer for 15+ years, I can tell you what PHP is good at and what PHP is not good at. High throughput API endpoints such as log ingestion are not a good fit for PHP.
Your argument is that if it breaks, it's fine. Yeah, who wants a log system that will only log some of your logs? No one. It's not mission critical, but it's pretty important to keep it working if you want to keep your system working. And in some places it's actually a legal requirement.
Every language loses benchmarks against every other language. That's not surprising. Since you didn't provide a specific benchmark, it's hard to say why it lost.
> High throughput API endpoints such as log ingestion are not a good fit for PHP.
I disagree; but ultimately, it depends on how you're doing it. You can beat or exceed compiled languages in some cases. PHP allows some low-level stuff directly implemented in C and also the high-level stuff you're used to in interpreted languages.
For reference, moving about 4k logs from memory to disk takes less than 0.1 second. This is a real log from one of the webservers:
    Start new cron loop: 2024-12-18 08:11:16.397...stored 3818 rows in /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc (0.0652 seconds).

Storing this data in ClickHouse takes a bit more than 0.1 second:

    Start new cron loop: 2024-12-18 08:11:17.124...parsing file /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc
    * Inserting 3818 row(s) on database server 1...0.137 seconds (approx. 3021.15 KB).
    * Removed /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc
As for Docker, I'm too much of a Docker noob but I appreciate the suggestion.
Go and friends may make for more efficient resource utilization, but it will be marginal in the grand scheme of things unless there are plans to do massively different things.
As it is, this code is very simple. I haven't used PHP in 15 years, and I was able to trace through it from front end to back end in less than 3 minutes.
To me it looks like a really sensible level of complexity for the problem it solves.
Keep it up, OP.
You may want to update your understanding of PHP's and Go's speed. Both of your estimates are off by a couple of orders of magnitude on commodity hardware. There are also numerous ways to make PHP extremely fast today (e.g. Swoole, ngx_php, or FrankenPHP) instead of the 1999 best practice of Apache with mod_php.
Go is absolutely an excellent choice, but your opinion on PHP is quite dated. Here are benchmarks for numerous Go (green) and PHP (blue) web frameworks: https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...
What we see here is a classic case of benchmarks saying one thing when the reality of production code says something else.
Also, I used Go as a generic example of a compiled language. But what we see is production-grade Go frameworks outperforming non-production-ready, experimental PHP tooling.
And if we go look at all of them: https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...
We'll see that even the experimental PHP solution ranks 43rd and is beaten by compiled languages.
> I know this because as an active PHP developer for over a decade I'm very much paying attention to that field of PHP.
<insert swaggyp meme here>
As an active PHP developer myself, it sounds like you have no idea what you're talking about.
> While you can use these tools you will almost certainly run into problems.
Which tools are "generally not considered production-ready"? From what I'm seeing on the linked list of benchmarks...
- vanilla php
- workerman
- ubiquity
- webman
- swoole
I'd venture to bet all of these are battle tested and production ready - years ago now.
As someone who has built a handful of services that ingest data in high volume through long-running PHP processes... it's stupidly easy and bulletproof. It might not be as fast as Go, but to say these libraries or this tech isn't production-ready is rather naive.
In a different PHP project, we have a bunch of background jobs which process large amounts of data, and they routinely go OOM because PHP stores data in a very inefficient way compared to Go. In Go, it's trivial to load hundreds of thousands of objects into memory to quickly process them, but PHP already starts falling apart before we hit 100k. So we have to use smaller batches (= make more API calls), and the processing itself is much slower as well. And you can't easily parallelize without lots of complex tricks or additional daemons (which you need to set up and maintain). It's just more effort, more wasted time and more RAM/CPU for no particular gain.
In contrast, Go can efficiently manage thousands of such blocked goroutines without issue. Sure, you can address this problem in PHP, but you need to:
- understand PHP-FPM (or whatever you use) configs and their footguns
- understand NGINX configs and their footguns
- fiddle with PHP configs/optimizing your code to fit within PHP's maximum limits
- rent larger servers to have the same throughput
This is a footgun, regardless of whether it's a block from file systems or remote requests or whatever.
My claim that it's a configuration problem is just a 'fix', and there is ultimately an unlimited list of ways this same thing can come back to bite you. Well, outside of aggressive timeouts; and even then, with enough request volume, even that's not going to save you :D
I'm not going to argue that PHP is _better_ than Go. Just starting off with that.
But if your background jobs are going OOM when processing large amounts of data, it's likely that there are better ways to do what you're trying to do. It is true that it's easy to be lazy with memory/resources in PHP due to the assumption that it'll be used in a throwaway fashion (serve request -> die -> serve request -> die), but it's also perfectly capable of long-running/daemonized processes without memory issues, rather trivially.
It’s all cognitive overhead I don’t want to learn.
edit: the connection to ClickHouse uses the MySQL driver. This is actually a very nice CH feature: you can connect to CH using the regular mysql or postgresql client tools, and the PHP MySQL PDO driver works seamlessly. One catch: using advanced features like CH query timeouts requires a CTE function; check the model/txtlogrowdb.php file if you're interested.
https://github.com/WillieBeek/txtlog/blob/master/txtlog/data...
At work, we use Datadog for logging, and I have previously used CloudWatch, Splunk, and Honeycomb. Among these, only Honeycomb makes implementing canonical log lines [1] easier. I want arbitrarily wide, structured logs [2] without paying exorbitant costs for cardinality.
Our Datadog costs are outrageous, and it seems like no one cares at this point. Pydantic Logfire is also doing some good work in Python-specific environments. I use both Python and Go, but Logfire wasn’t as ergonomic in Go.
[1]: https://stripe.com/blog/canonical-log-lines
[2]: https://www.honeycomb.io/blog/structured-events-basis-observ...
It also appears that your documentation is currently a very verbose version of an OpenAPI spec, so you could save your readers some trouble by actually publishing one, with the added advantage that OpenAPI renders come with a "Try it" button.
That would allow you to save the natural language parts for describing things that are not API-centric (such as the "but WWWWHHHHYYY mysql AND clickhouse" that you alluded to elsewhere but wasn't mentioned at all in /doc nor /selfhost)
I do love this, since it 100% squares with my mental model of PHP's approach to life: you're holding it wrong https://www.php.net/manual/en/function.date-parse-from-forma...
You are, actually, doing it wrong.
https://carbon.nesbot.com/docs/
I forgive you, being that you're clearly not familiar with modern PHP and its incredibly mature and diverse library ecosystem and first-class package manager.
> However, it seems it is just a search stupidity ...
You're searching a list of thirty (30) functions. I don't even know how you found that list of functions, but surely you don't think that's an exhaustive place to search for a specific date format? Surely you're not being purposely obtuse. (As you likely found, if you just plop your search term into the search at the top of the PHP website, you'll find the DateTime class and how to handle these various formats.)
Anyway - for anyone who may happen across this odd chain of comments, dealing with dates in PHP is an actual breeze using Carbon\Carbon.
url#:~:text=blah
I finally started using it when it landed in a Firefox release (although, in true Firefox fashion, they give no fucks about the UX, forcing me to install an extension that is "create link to selection").
You absolutely should vendor your dependencies and review them before accepting the new version. Even though they are dependencies, you are ultimately responsible for using them. "They are just dependencies" doesn't absolve you of responsibility.
Also, one of the login links takes you to a 404 page: https://triplechecker.com/s/jDTmQa/txtlog.net
> Most of my development career has been with C#, NodeJS and PHP
and then
> The architecture is very scalable, but I've only tested up to a few billion logs.
Out of curiosity, can you describe how your service is better than others?
>I hope my application fixes most of these issues
Do you care to elaborate on the "how"?