• zX41ZdbW 6 days ago |
    How did it go through peer review without a comparison with ClickHouse?

    > Our analysis shows that the Compact layout performs better when Null ratio is high and the Placeholder layout is better when the Null ratio is low or the data is serial-correlated.

    ClickHouse uses a placeholder value plus a separate stream of NULL masks. It also has the Sparse column format, which the paper calls Compact (although the Sparse format is currently used to encode default values more efficiently, not NULL values).
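
    For readers unfamiliar with the two layouts, here is a minimal sketch of the idea as described in the quote above. It is not ClickHouse's or the paper's implementation; the function names, the list-based storage, and the placeholder value 0 are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Column = List[Optional[int]]


def encode_placeholder(column: Column, placeholder: int = 0) -> Tuple[List[int], List[bool]]:
    """Placeholder layout (hypothetical helper): one slot per row.

    Null rows store a placeholder value; a separate mask marks which rows are null.
    """
    mask = [v is None for v in column]
    values = [placeholder if v is None else v for v in column]
    return values, mask


def encode_compact(column: Column) -> Tuple[List[int], List[bool]]:
    """Compact layout (hypothetical helper): only non-null values are stored.

    The mask is still needed to map stored values back to row positions.
    """
    mask = [v is None for v in column]
    values = [v for v in column if v is not None]
    return values, mask


def decode_placeholder(values: List[int], mask: List[bool]) -> Column:
    # Row i lives at values[i], so random access needs no extra bookkeeping.
    return [None if is_null else v for v, is_null in zip(values, mask)]


def decode_compact(values: List[int], mask: List[bool]) -> Column:
    # Values are consumed in order; each non-null row takes the next stored value.
    it = iter(values)
    return [None if is_null else next(it) for is_null in mask]


column: Column = [7, None, None, 3, None]
placeholder_values, placeholder_mask = encode_placeholder(column)
compact_values, compact_mask = encode_compact(column)
```

    The trade-off in the quote shows up directly: the compact form stores fewer values as the Null ratio grows, while the placeholder form keeps row i at index i, which makes positional access simpler.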

    • jnordwick 6 days ago |
      kdb+ isn't in there either, and I think that is more important than CH. kdb+ is boring: it just uses a placeholder. I suspect both were left out because they handle nulls in a boring fashion.
      • mhuffman 6 days ago |
        > and that is more important than CH I think.

        If measured by $$$$$, ClickHouse certainly has more installations.

        • jnordwick a day ago |
          CH definitely has more installations. Not sure which pulls in more revenue, though. A kdb+ installation will run you $250,000/yr on the low end for just the software license. Not sure how that compares.
    • xyzzy_plugh 13 hours ago |
      Is there documentation for the ClickHouse native binary format somewhere? Parquet and ORC are standalone formats. This is a strange comparison to demand.

      The paper is addressing the abstract techniques and is not a benchmark of various implementations. It seems to me that ClickHouse's design is already represented.

      You're the CTO of ClickHouse. How's your relationship with Pavlo and McKinney?

  • mwexler 15 hours ago |
    While the other authors are from Tsinghua University, the two more recognizable names are Wes McKinney, of Pandas and Apache Arrow fame, and Andy Pavlo at CMU, who has done some fun work on columnar stores and database optimization.

    Always fun to see the mix of authors globally linking up.