Consistently faster and smaller compressed bitmaps with Roaring (2016)

46 points by hambandit 7 days ago | 5 comments

o11c 9 hours ago |
Title is confusing: this is not about the original "Roaring", but an extension of it called "Roaring+Run".
Here, "bitmap" = "set of sometimes-compact integers". The "uncompressed" and several "rle" implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you're doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can't use variable-length integers now).
Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it only works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.
This paper adds a third representation for the inner node, the bsearch'able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).
Overall there's neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the "overall sparse but locally semi-dense" representation that uses fixed-size single-word bitsets, ...)
Drup 6 hours ago |
You seem well versed into that corner. Do you have a good (and reasonably complete) introduction/exploration for these memory-efficient data-structure for computation ?
I've been working on memory representation of algebraic data types quite a bit, and I've always wondered if we could combine them with succinct data-structures.
pncnmnp 7 hours ago |
Hey everyone, if you're looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.
skybrian 5 minutes ago |
That’s a good read, thanks! The history is interesting, but in practice, I’m wondering if there’s any reason not to skip it and just learn about roaring bitmaps?
softwaredoug 4 hours ago |
Roaring bitmaps are really useful for doing phrase search in search engines.
Basically you can find cases where one term is immediately before another by left shifting the right terms roaring encoded positions in all documents and bitwise-anding the similarly roaring-encoded payload of the preceding term. All with a highly compressed representation of term positions.
With something like numpy you can do this in a handful of logical operations in python.
https://softwaredoug.com/blog/2024/01/21/search-array-phrase...