Size Optimization Tricks (2022)
90 points by Narishma 5 days ago | 29 comments
  • nokeya 3 days ago |
    All these assembly tricks are interesting, but not really useful in real production code. The part about struct packing is quite common and widely used, but I don’t understand why these extra paddings are present and why one is gone. Even with pragma pack 8, should not it be only one padding?
    • stevekemp 3 days ago |
      I suspect it depends on your niche or field; there have certainly been times when I've written assembly for production. I admit, though, that these days most of my assembly coding is for retro projects.
      • kabdib 3 days ago |
        I work on embedded systems, and this stuff does matter. It costs real money when you are buying the chips and selling hardware.
    • Just_Harry 3 days ago |
      > I don’t understand why these extra paddings are present and why one is gone. Even with pragma pack 8, should not it be only one padding?

      If you're referring to `__pad2` in the example, that trailing padding is there to ensure that the size of the struct is a multiple of its alignment, which is 8, so that in a contiguous span of those structs every instance after the first one remains properly aligned. Without `__pad2`, the struct would be 36 bytes, which would cause every other instance in an array or contiguous span to be aligned on 4 bytes instead of 8.

    • bluetomcat 3 days ago |
      > but I don’t understand why these extra paddings are present and why one is gone

      In the example from the article, the "s_arc" member is a pointer (8 bytes) and must be aligned on an 8-byte boundary. The ints are 4 bytes in size. The whole struct therefore needs to be aligned on an 8-byte boundary to preserve the alignment requirements of its pointers. The trailing "s_accept" member needs additional padding after it to make the size of the struct divisible by 8, which preserves the alignment of the next struct when you have an array of these structs.

    • vardump 3 days ago |
      One day you might have to patch a binary with a hex editor, for one reason or another.
      • tjalfi 3 days ago |
        At my last job, I had to patch a binary for one of our internal .NET applications. The application was hardcoded to connect to a specific database, but we needed it to work with a different one. Since the original developer was unavailable, I disassembled the application, updated the configuration, and then reassembled it.
        • mystified5016 2 days ago |
          .NET is the very obvious exception here, yes. .NET very deliberately makes this kind of tampering quite trivial by comparison.
  • bsenftner 3 days ago |
    Justine Tunney just blows my mind; I think she is one of the most important software developers / computer scientists alive today.
    • james-bcn 3 days ago |
      It's all extremely clever, but is it useful?
      • bsenftner 3 days ago |
        You're kidding, right? You don't see the value?
        • xmodem 3 days ago |
          Personally I see the value in exploring the limits of what systems are capable of, and exploring ways to use them outside of the parameters for which they were designed.

          I would also generally like to avoid being on-call for a system that is being pushed to its limits or used outside the parameters it was designed for.

          I am very curious to hear if anyone is shipping Cosmopolitan Libc / Actually Portable Executable binaries, either internally or for consumption by end users. I would love to hear more about the experience!

      • devnullbrain 3 days ago |
        Small binaries can improve performance; it's not just data that needs caching, code does too.
      • widdershins 3 days ago |
        The article explains the author's justification right at the very top, so you can decide if the given reasons apply to you or not.
      • zoenolan 3 days ago |
        One of my previous jobs involved coding on a media processor. That processor had a direct-mapped cache, so code size and layout mattered. Ideally, you wanted the performance-critical code to fit in the cache and be in different cache lines to avoid thrashing.
      • tjalfi 3 days ago |
        As a concrete example, Carlos Bueno's Mature Optimization Handbook[0] describes how the HHVM team got substantial performance wins by reducing instruction cache misses in rarely executed code.

        [0] https://carlos.bueno.org/optimization/

    • loxias 3 days ago |
      Knuth, Bellard, Tunney.
    • johndough 3 days ago |
      Not to belittle Justine's achievements, but the role of the most important software developer probably goes to the maintainer of some hugely important infrastructure project that we barely know about.

      https://xkcd.com/2347/

      If Justine didn't optimize struct padding, binaries would be a bit larger, but software would keep working. However, when a trivial library like left-pad disappeared, it triggered global chaos of such monumental proportions that it warrants its own Wikipedia article: https://en.wikipedia.org/wiki/Npm_left-pad_incident

      Or there might be some unsung hero responsible for fixing a year 2038 bug in a bunch of ICBMs who prevented worldwide nuclear annihilation (or who caused it, if you have a more pessimistic view of the future).

      • bsenftner 3 days ago |
        She's created a compatibility layer that makes a huge amount of software portable between operating systems, giving a great many other developers a path onto those operating systems and the hardware they run on.
  • secondcoming 3 days ago |
    Here's one person trying to reduce binary bloat, while there's also a push to statically link everything. Who will win?

    Also, don't go rearranging the members of your structs if they're exposed to third parties!

    • bieganski 3 days ago |
      > while there's also a push to statically link everything

      could you elaborate? who is pushing and what?

      Dynamic linking has its pitfalls and is often a pain, but it has big benefits as well.

      • dspillett 3 days ago |
        > > while there's also a push to statically link everything

        > could you elaborate? who is pushing and what?

        I don't know of a more general movement, but Go developers seem very eager about (and proud of) the single-binary approach. It can make deployments, particularly updates, much less issue-prone.

        • makapuf 3 days ago |
          Well, if you're deploying in a container where the only useful userspace program is your HTTP server / web API, static linking is smaller and simpler to deploy than embedding the whole C library and C++ standard library just for a few functions.
          • dspillett 2 days ago |
            Aye, a container with the binary and the right versions of the supporting libraries is essentially static linking with extra steps.

            Containers offer some tooling for resource management and such, but that is basically wrappers and other syntactic sugar dressing up OS facilities like control groups (cgroups), so it isn't anything you can't do with a statically linked binary too.

      • pajko 3 days ago |
        And dynamic libraries can be prelinked to regain some performance: https://linux.die.net/man/8/prelink
  • nokeya 3 days ago |
    Does anyone know of any tools (static or dynamic analysers) that automatically detect structures/classes whose members could be reordered to improve their packing? I think that could be quite useful, or at least interesting.
    • ComputerGuru 3 days ago |
      clang-tidy has a check that does just that: https://clang.llvm.org/extra/clang-tidy/checks/altera/struct... If I recall correctly, there are other padding-related clang-tidy checks that apply here as well.

      Also, not what you asked but certainly related: some lower-level languages (not just JIT-compiled ones) can and do automatically rearrange struct members (unless the layout is explicitly fixed); e.g. by default Rust will reorder members as needed to minimize padding.

    • pajko 3 days ago |
      BOLT, LLVM's post-link binary optimizer, can do quite a lot of optimizations: https://github.com/llvm/llvm-project/tree/main/bolt
    • throw16180339 3 days ago |
      clang has -Wpadded to warn about added alignment padding, but it's really noisy.