Static Basic Block Versioning

48 points by matt_d 6 days ago | 11 comments

082349872349872 6 days ago |
> how to ensure the convergence of the algorithm when the specializations of the basic blocks are not based on profiled variable values
sounds like a good use for PGO? (profile guided optimisation)
vkazanov 6 days ago |
Most modern jit compilers are using live profile data, especially the original basic block versioning technique which this article is based on.
This particular algorithm tries to get rid of some the problems of the relatively simple PGO-like approach.
Btw, authors report strikingly good results.
DannyBee 6 days ago |
That's because the authors are reporting results for AOT scheme to c compilers.
For languages more directly supported by llvm/gcc, they already do various kinds of function versioning and specialization.
You would be highly unlikely to get 10% there from a magic bullet like this - in the end, there are no magic bullets. Just lots of hard work.
voxl 6 days ago |
Mmm, my impression is you can get 90% there with significantly less effort then LLVM or GCC. A good register allocation strategy, some basic inlining, maybe peephole optimization? LLVM is fighting for 5-10% improvements on top of the big things
vkazanov 6 days ago |
The key here is that this is meant to be used for dynamic languages that don't have a lot of typing info when doing AOT compilation
quotemstr 6 days ago |
Or an ecosystem-wide shift away from AOT compilation.
I feel like I'm living in that "one guy against huge crowd" meme. Everyone else is making more AOT, and all I can say, usually under my breath, is "Wait! Stop! You're giving up something valuable! JIT and GC are actually good!"
(I know you can have AOT and GC, but people usually lump AOT and !GC together.)
AOT optimizes startup performance. I haven't seen it optimize other metrics. Psychologically, we often gauge overall performance by startup performance, but it ain't necessarily so. I keep expecting the pendulum to swing from "AOT and manual memory management" back to "JIT and garbage collection", because I believe the latter has a much higher performance ceiling once we apply sufficient optimization elbow grease. I've been patient and I'll remain patient.
But still. PGO is just a hack around the lack of a JIT, just like RCU is a hack around the lack of a GC, and I think it's due to well-known psychological biases that we haven't embraced more dynamism.
How do we solve the startup problems with JIT systems? Better image support. Nobody has really cracked this nut, but I think it's important. Every GraalVM language should support, e.g., resuming from a heap dump.
TristanBall 6 days ago |
Any time I'm doing lego-brick engineering, aka unix style "small tools pipe together", then startup time impacts me most while I'm actively developing the new system.
That process is iterative, usually on a subset of the data, and slow starting tools make my day qualitatively worse.
The final performance matters sometimes, but more often than not it doesn't. 30 minutes or 45 for a batch job? That runs from cron? I don't usually care.
I'm never going to favour slow starting tools unless some aspect of their functionality pays for that cost, or I'm choice constrained, but final peak performance is almost never going to be a factor.
So to me it's not psychological bias, it's economics, and the currency is time, when it's me personally paying it. Semantics maybe, but just writing it of as bias isn't going to help me get things done.
quotemstr 6 days ago |
Startup time is a problem we can solve without giving up the advantages of dynamic systems. We just haven't invested enough in solving the problem.
vkazanov 5 days ago |
This is just not correct.
Dynamic systems like the ones based on javascript or Java or scheme have received A LOT of attention in the last 10-20 years. In fact, javascript does start fairy quickly.
The usual solution to the startup problem is to have many compilers working at once (aka tiered jit compilation). A fast bytecode interpreter + mid-level jit + ssa-based heavy stuff, all working in parallel. Obviously, this means a lot of background compute and memory burned.
mshockwave 6 days ago |
I'm actually glad to see that restriction in their study, because there are many cases where you just can't / more difficult to use PGO. One of the most common reasons I ran into is that clients refuse to use it (no, I'm not even joking), either they don't know how to come up with a good training data (in which case they'll point at your nose shouting IT DOESN'T WORK!!) or they thought it's a stupid idea (again, not joking). You'll be amazed by some people's stubbornness
Lerc 6 days ago |
I stumbled upon Maxime Chevalier-Boisvert's work ages ago and the BBV just seemed to make so much sense.
Nice to see it being adopted more broadly.