Last time I checked they're working on improving Python performance instead (yes I know they forked it into Cinder, but they're trying to upstream their optimizations [0]). Which is very similar to what we're doing at Shopify.
Of course 100% of Instagram isn't in Python; I'm certain there are lots of supporting services in C++ etc., but AFAIK the Instagram "frontend" is still largely a Python/Django app.
The joke is that if Meta thought that replacing all the Python code they have with something else was worth it, they'd have done it already.
"Worth it" depends on both how much performance improvement you get, and how hard it is to replace. Did you consider maybe the rewriting effort is so humongous that it is not worth doing despite large performance improvements? Thus making the joke not funny at all...
But that's some silly engineer tunnel vision, squeezing the very last bit of performance out of a system isn't a goal in itself. You just need it to be efficient enough that it cost you significantly less to run that the amount of revenue it brings you.
I can bet you that moving off Python must have been pitched dozens and dozens of time by Meta engineers, but deemed not worth it, because execution speed isn't the only important characteristic.
So yes, I find it hilarious when HN commenters suggests companies should rewrite all their software into whatever is seen as the most performant one.
Instagram is presumably in the same position. Switching language is basically impossible once you have a certain amount of code. I'm sure they were aware of the performance issues with Python but they probably said "we'll worry about it later" when they were a small startup and now it's too late.
Hack is actually surprisingly pleasant, basically about the best language they could have made starting from PHP. (I know, that's damning with faint praise. But I actually mean this unironically. It has TypeScript vibes.)
IMHO Hack's best feature was native support for XHP... which, also unfortunately, isn't something PHP decided to adopt.
Is the Instagram stack Python? I doubt it, but stranger things have happened
I suspect it is actually some derivative of Apache, or Nginx. Something sensible
1. A 10% performance improvement at Instagram could lead to many millions in revenue "instantly". That is not laughable at any company. 2. It won't be a 5000% performance improvement. Facebook uses its own heavily optimized fork of Python. Probably still far from C++, but you should be thinking of languages like Java when talking about its performance.
"Better" is a very subjective term when discussing languages, and I hope such discussions can be more productive and meaningful.
https://github.com/facebookincubator/cinder/blob/cinder/3.8/...
Well, it really depends on whether that alternative is open to you, and at what cost.
So eg lots of machine learning code is held together by duct tape and Python. Most of the heavy lifting is done by Python modules implemented in (faster) non-Python languages.
The parts that remain in Python could potentially be sped up by migrating them, too. But that migration would likely not do much for overall performance, while still being pretty expensive (in terms of engineering effort).
For organisations in these kinds of situations, it makes a lot of sense to hope for / contribute to a faster Python. Especially if it's a drop-in replacement (like Python 3.12 is for 3.9).
What really makes me hopeful is actually JavaScript: on the face of it, JavaScript is about the worst language to build a fast implementation for. But thanks to advances in clever compiler and interpreter techniques, JavaScript is one of the decently fast languages these days. Especially if you are willing to work in a restricted subset of the language for substantial parts of your code.
I'm hoping Python can benefit from similar efforts. Especially since they don't need to re-invent the wheel, but can learn from the earlier and ongoing JavaScript efforts.
(I myself made some tiny efforts for CPython performance and correctness. Some of them were even accepted into their repository.)
So likely the 5000% improvement is no longer possible because they already did multiple 10% improvements? I don't know how this counters the original point.
All clues point to FB going this route because they had too much code already in PHP, and not because the performance improvement would be small.
In any case, "facebook does it" is not a good argument that something is the right thing to do. Might be, might not be. FB isn't above wrong decisions. Else we should buy "real estate" in the metaverse.
This is one of those areas where out-of-process caching wins. In-process caching has a nasty habit of putting freshly created objects into collections that have survived for hours or days, creating writes in the old generation and back references from old to new.
Going out of process makes it someone else’s problem. And if it’s a compiled language with no or a better GC, all the better.
Agreed. We have some facility for out of process caching (node local memcached), and I frequently have to argue with colleagues that it's generally preferable to in-process caching.
That would allow us to have ephemeral per-request heaps which are torn down all at once after every request. In-request garbage collections are super-fast. Application-scoped objects are never collected (i.e. no major collections).
Wouldn't this simple model solve most problems? Basically, a very simple equivalent to Rust's lifetimes tailored to web services without all the complexity, and much less GC overhead than in traditional GC systems.
I ask because I have embedded Ruby in applications before, and I'm looking for an excuse to do it in Rust.
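To illustrate the shape of the idea (not how the Ruby VM works today), here's a minimal sketch in Rust using the bumpalo crate as the per-request heap; the `Request` type and handler are made up purely for the example:

```rust
use bumpalo::Bump;

// Hypothetical request type, just for the sketch.
struct Request<'a> {
    path: &'a str,
}

// Everything allocated in `arena` lives exactly as long as the request.
fn handle(req: &Request, arena: &Bump) -> String {
    let greeting = arena.alloc_str("hello, ");
    format!("{}{}", greeting, req.path)
}

fn main() {
    let mut arena = Bump::new();
    for path in ["alice", "bob"] {
        let req = Request { path };
        println!("{}", handle(&req, &arena));
        // Tear the per-request heap down in one go: no per-object frees,
        // no tracing of live objects, application-scoped data untouched.
        arena.reset();
    }
}
```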
I've done this in both C and C++.
The downside of automatic memory management is you have to accept the decisions the memory manager makes.
Still, generational GC like in Ruby and Python essentially attempts to discern the lifetime of allocations, and it gets it right most of the time.
Actix, Axum, sqlx, diesel, and a whole host of other utilities and frameworks make writing Rust for HTTP just as easy and developer efficient as Golang or Java, but the code will never have to deal with GC.
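As a rough illustration of the ergonomics, a minimal service looks something like this (a sketch assuming axum 0.7 on tokio, not code from any of these projects):

```rust
use axum::{routing::get, Router};

// Handlers are plain async functions; no threads, mutexes, or pools to manage.
async fn hello() -> &'static str {
    "hello from axum"
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(hello));
    // Bind a listener and serve; tokio drives the async I/O underneath.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```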
It's easy to pull request scoped objects into durable caches.
This is really starting to become a problem in the observability space with async locals. Node.js, for example, currently keeps async locals around for too long because they are propagated everywhere. For instance, if you call `console.log` in a promise you will leak an async local forever.
Next.js famously keeps around way too many async locals past the request boundary for caching related reasons.
A solution would be to have a trampoline to call things through that makes it explicit that everything happening past that point is supposed to "detach" from the current flow. An allocator or a context-local system can then use that information to change behavior.
In node you could use worker threads (which create a new V8 instance in a separate OS thread) but that's probably too heavy handed.
Although I'm not sure what the preferred language for quickly getting a startup up and running would be these days.
The language (or the rest of the stack even) is rarely a barrier to success. What matters are a good idea, good motivation, and decent availability of competence.
JMHO.
Per-request arenas sound super cool on paper, and work very well on systems with clear constraints. But if suddenly a request starts allocating more than the arena can accommodate, you're in a bit of a pickle. They're absolutely not a panacea.
Setting aside the challenge of refactoring the Ruby VM to allow this sort of arena, they'd be a terrible fit for Shopify's monolith.
Ultimately, while it's a bit counter-intuitive, GCs can perform extremely well in terms of throughput. Ruby's GC isn't quite there yet, but it still performs quite well and is improving with every version.
In Zig, at least, this isn't how arenas work. They're a wrapper around a backing allocator, so if the arena runs out of memory, then that means the process is out of memory, something no allocation strategy can fix (ignoring the fact that Zig returns a specific error when that happens, and maybe you can trigger some cache eviction or something like that).
It's easy to set them to retain a 'reasonable' allocated capacity when they get reset, for whatever value of reasonable, so big allocation spikes actually get freed, but normal use just moves a pointer back and reuses that memory.
I don't see Shopify harvesting a lot of value from a complete Zig rewrite, no. But arenas are basically ideal for the sort of memory use which web servers typically exhibit.
Being explicit about memory has many advantages, and is a strict requirement when scaling.
Well, yes, with a GC when your heap is full, you make space by getting rid of the garbage.
Also, with a good GC, allocating is most of the time just bumping a pointer, exactly like an arena, and the collection time is proportional to the number of live objects, which when triggered out of band is basically 0.
Hence why I think a well tuned GC really isn't that far off.
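To make the "allocation is just bumping a pointer" point concrete, here's a toy sketch (not any real VM's allocator) of what a nursery allocation amounts to:

```rust
// Toy nursery: a pre-reserved block plus an offset. Allocation is one
// bounds check and one addition, which is why GC'd allocation can be
// about as cheap as an arena.
struct Nursery {
    buf: Vec<u8>,
    offset: usize,
}

impl Nursery {
    fn new(capacity: usize) -> Self {
        Nursery { buf: vec![0; capacity], offset: 0 }
    }

    // Returns the start index of `size` fresh bytes, or None when full
    // (which is where a real VM would trigger a minor collection).
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.offset + size > self.buf.len() {
            return None;
        }
        let start = self.offset;
        self.offset += size; // the entire "allocation"
        Some(start)
    }
}

fn main() {
    let mut nursery = Nursery::new(1024);
    assert_eq!(nursery.alloc(64), Some(0));
    assert_eq!(nursery.alloc(64), Some(64));
}
```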
Now is a great time to pick up the language, but I would say that production is not the right place to do that for a programmer learning memory management for the first time. Right now we're late in the release cycle, so I'd download a nightly rather than use 0.13, if you wanted to try it out. Advent of Code is coming up, so that's an option.
Using a manually memory-managed language means you need to design a memory policy for the code. Zig's GeneralPurposeAllocator will catch use-after-free and double-free in debug mode, but that can only create confidence in memory-handling code if and when you can be sure that there aren't latent bugs waiting to trigger in production.
Arenas help with that a lot, because they reduce N allocations and frees to 1, for any given set of allocations. But one still has to make sure that the lifetime of allocations within the arena doesn't outlast the round, and in Zig you can only get that by design; lifetimes and ownership aren't part of the type system like they are in Rust. In practice, or I should say with practice, this is readily achievable.
At current levels of language maturity, small teams of experienced Zig developers can and do put servers into production with good results. But it's probably not time for larger teams to learn as they go and try the same thing.
It does matter for a company trying to scale its user base while keeping costs down.
I do want to pick on this specifically - people can and should be patching open source projects they depend on and deploying them to production (exactly as described in the article). Something being in the language vs in “user” code should be no barrier to improving it.
The latter is often orders of magnitude more work, and the existing solution is probably chosen to be well suited in general.
This is in essence another form of technical debt.
Also, I really want ZGC in .NET runtime, but I don't think I'll ever get support for it first party. There's some kind of principled ideologue holdout situation going on over at Microsoft. Every time I get into it with one of their engineers I'm sent to some impotent "please may I have a temporary GC exemption" API. All I want is it to do nothing. How hard is it to just not clean up the goddamn garbage? Give me a registry flag + env variable + cli arg all required at the same time if you're so worried someone might trip over it.
You could try using https://github.com/kkokosa/UpsilonGC and seeing if it still works.
At the end of the day, for anything performance-related you can just write code with manual memory management using RAII patterns via IDisposable on structs and get code that performs close to C++ or Rust. It's also necessary to understand if this is a good idea at all - most of the time you do want to just rely on GC.
Apologies - I was attempting to refer to the "absolutely no garbage collection" path. I was thinking of Epsilon [0].
> It's also necessary to understand if this is a good idea at all - most of the time you do want to just rely on GC.
Assume we are building a cruise missile flight computer. I have enough ram for ~100 hours of flight if we never clean up any allocations. I only have enough fuel for 8 hours of flight on a good day. Why do I still need a garbage collector? All I need is a garbage generator. The terminal ballistics and warhead are the "out of band" aspects in this arrangement.
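For what it's worth, that "garbage generator" policy is easy to express in a language with pluggable allocators. A sketch of the idea in Rust terms (purely illustrative, nothing to do with the .NET runtime):

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// An allocator that never frees anything: allocate normally, make every
// deallocation a no-op, and let process termination reclaim the memory.
struct NeverFree;

unsafe impl GlobalAlloc for NeverFree {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Intentionally empty: the garbage is never collected.
    }
}

#[global_allocator]
static ALLOCATOR: NeverFree = NeverFree;

fn main() {
    // Allocations work as usual; dropping values simply never returns memory.
    let telemetry = vec![0u8; 4096];
    drop(telemetry);
}
```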
> You could try using https://github.com/kkokosa/UpsilonGC and seeing if it still works.
I've spent weeks on this exact thing. I cannot get it to work. This gets me back to the first party support aspect.
And yes, we're aware of ZGC &co https://www.eightbitraptor.com/presentations/RubyKaigi2023-m...
Good luck!
So could each request clean up its own garbage when it finishes, so that no global garbage collection is ever needed?
I don't think doing it after each request would be sensible, but counter-intuitively, the time it takes to run GC isn't proportional to the amount of garbage to collect, but to the number of live objects left (ignoring some minor things like finalizers).
So on paper at least we could run a minor GC very cheaply after each request, but there are likely better heuristics, given that the median request currently already spends less than 1ms in GC, so after every request might be overdoing it.
Also, even if we were doing that, many requests would still have to run GC because they allocate more than there is memory available, so they need to clean their own garbage to continue; you can't delay GC indefinitely.
But at least now, endpoints that spend too much time in GC are responsible for their own demise, so the engineers responsible for a given endpoint's performance have a clear signal that they should allocate less, whereas before it could easily be discounted as being caused by lots of garbage left over by another collocated endpoint.
Since objects cannot be promoted to the old generation inside the request cycle, objects in the new gen are request-allocated objects.
So if we were to eagerly trigger a minor GC after a request, we'd have very few objects to scan, and would only need to sweep garbage, which is only a small fraction of time spent in GC.
So I've spent a lot of time doing Hack (and PHP) as well as Java, Python and other languages. For me, as far as serving HTTP requests goes, Hack/PHP are almost the perfect language. Why?
1. A stateless functional core. There's no loading of large libraries, which is an issue with Python and Java in certain paradigms. The core API is just functions, which means startup costs for a non-stateful service are near zero;
2. The model, as alluded to the above quote, basically creates temporary objects and then tears everything down at the end of the request. It's so much more difficult to leak resources this way as opposed to, say, a stateful Java or C++ server. PHP got a lot of hate unjustly for its "global" scope when in fact it's not global at all. "Global" in PHP/Hack is simply request-scoped and pretty much every language offers request-scoping;
3. There's no threading. Hack, in particular, uses a cooperative async/await model. Where you'd normally create threads (e.g. making a network request), the runtime handles it by turning non-blocking I/O into an async/await call. You never have to deal with mutexes, thread starvation, thread pools, lock-ups, etc. You never want to deal with that in "application" or "product" code. Never.
So this article is specific to Ruby on Rails, which obviously still has persistent objects, hence the continued need for GC.
How Facebook deals with this is kinda interesting. Most FB product code uses an in-memory write-through graph database (called TAO, backed by MySQL). There is an entity model in Hack on top of this that does a whole bunch of stuff like enforcing privacy (ie you basically never talk to TAO directly, and if you do, you're going to have to explain why that's necessary, and you absolutely never talk to MySQL directly).
But the point is that persistent entities are request-scoped as well (unlike RoR I guess?).