Flattening ASTs and other compiler data structures (2023)

99 points by aw1621107 13 hours ago | 18 comments

ww520 11 hours ago |
This is a fantastic idea. AST works well in an array based allocation block since it has no need for freeing individual nodes. It’s an add-only allocation.
dmagyari 11 hours ago |
"Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array." Zig compiler pipeline (AST, Zir, Air, Sema) does exactly this on all layers. Not only contiguous, but instead of array-of-structs it is struct-of-arrays, so walking the tree is even more cache friendly. For AST see: https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...
agumonkey 8 hours ago |
Makes me wonder if people in APL/J/K community have not been influenced or influencing this kind of technique. IIRC Aaron Hsu does tree processing through arrays (but i'm not skilled enough to analyze his code)
gsf_emergency an hour ago |
Do you have a link to such an example of Aaron's code? Thank you in advance!
gritzko an hour ago |
I work on a C dialect where everything is flattened. JSON and other trees in particular. Binary heaps are flat, merge sort and iterator heaps are absolutely great, can build LSM databases with that. Stacks, circular buffers, hash maps, etc, all flat. Templated output (PHP like) is done by a flat data structure.
https://github.com/gritzko/librdx/blob/master/abc/B.md
Apart from locality and lifetimes, these flat data structures improve composability. When every data structure is a flat buffer, you can mmap them or zip them or send them by the network, all by the same routine. They are uniform like bricks, in a sense.
cardanome 11 hours ago |
Amazing article, very good advice to keep your data structures flat.
Adding to that, it also makes editing the AST vastly more efficient.
I have discovered that principle on my own when I worked on an editor that directly operated on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell lot easier. You can just directly index any part of AST in linear time.
emptysea 10 hours ago |
Rust-analyzer uses a similar technique for parsing https://github.com/rust-lang/rust-analyzer/blob/master/crate... which then gets fed into https://github.com/rust-analyzer/rowan (lossless syntax tree)
bradrn 8 hours ago |
There is a longer overview of it here: https://github.com/rust-lang/rust-analyzer/blob/master/docs/...
ndesaulniers 10 hours ago |
Cool! Carbon is doing exactly this. I had asked leads if there was a paper on this approach, but they didn't have anything for me. I'll send them this post!
benatkin 9 hours ago |
Zig uses a MultiArrayList which sounds similar https://mitchellh.com/zig/parser
kazinator 10 hours ago |
> Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array.
This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.
Don't make me bring out the L word for the billionth time.
> A flat array of Exprs can make it fun and easy to implement hash consing
OK, it's not a case of L-ignorance, just willful neglect.
samps 10 hours ago |
FWIW I did acknowledge this in the article:
> A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it
> Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.
hencq 8 hours ago |
As the article mentions, this makes it quite similar to a bytecode vm. I think the traditional wisdom is that an AST walker is easy to write, but for speed you'd want a bytecode interpreter. It'd be interesting to see how close the performance gets with this flattened AST.
In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.
kazinator 8 hours ago |
If you instead flatten the expression tree into RPN, then you can execute it like that, with a stack machine.
I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
finnh 7 hours ago |
The sample flat program in the post is exactly RPN, no?
samps 5 hours ago |
I think it would be more like RPN if it used a stack, and operands were specified as relative offsets (i.e., stack offsets). In the version I wrote, operands are still represented as absolute offsets in the expression table.
dang 3 hours ago |
Related:
Flattening ASTs and other compiler data structures (2023) - https://news.ycombinator.com/item?id=42181603 - Nov 2024 (2 comments)
Flattening ASTs and other compiler data structures - https://news.ycombinator.com/item?id=36559346 - July 2023 (81 comments)
jgrowl 2 hours ago |
I thought a reddit comment on this article had an interesting point:
https://www.reddit.com/r/rust/comments/1d3b356/my_new_favori...
[–]Timzhy0 3 points 7 months ago
Btw I think one can go a step further than the author, there is no need to keep two explicit ExprRef baked in a binary node (lhs, rhs). You can exploit locality, basically the AST, seen it the LISP way, is just an arbitrarily nestable list, where elements are atoms or other lists. Hence all you need to know is where each list ends (and if it's an atom you can assume it spans one node) and actually one bit to know if it is the last entry in the list is quite ergonomic as well (because then you can distinguish whether moving next slot in the AST means there is a sibling). Basically it's easier to keep it sync while constructing and takes up less memory per node. I pay 40 bits per node, stored interleaved for best cache locality (some unaligned accesses but I think it's still worthwhile), 8 bits for the tag, 32 for the data, if data is bigger, 32 is an index into some auxiliary segment (basically a ptr).