Don't Clobber the Frame Pointer
77 points by felixge 8 days ago | 41 comments
  • malkia 6 days ago |
    GoLang assembly boggles my mind - I understand why it's there, but having looked at it a few times, I wonder if it could've been prevented somehow (I guess not; cryptographic primitives would be way too slow, redirecting them through some kind of FFI would require a shared lib, yada yada yada)...
    • nimish 6 days ago |
      They could have added crypto primitives via intrinsics, or had some other way of including the edge-case functionality assembly covers.

      But it's good enough, and I guess it compiles quickly, which was a major goal for golang.

      • tptacek 6 days ago |
        Major Rust cryptography libraries (see for instance Ring) use assembly, too. It's a pretty normal thing to do.
        • kibwen 6 days ago |
          It's kinda weird that languages (or at least languages with pretensions to cryptography) are still forcing people to resort to asm directly rather than offering some sort of first-class support for constant time operations and not leaving secrets lying around in memory. It doesn't need to be super high level, it just needs to clear the infinitely low bar of assembly language. Does any language offer such a dedicated facility?
          • dpifke 6 days ago |
            Go does have first class support for this in the standard library, e.g. https://pkg.go.dev/crypto/subtle.

            It's not at all weird that the language authors needed assembly to implement such a thing. They figured out the tricky bits so you don't have to.
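
            To make that concrete, here's a minimal sketch of the package in use (the wrapper name `tokenEqual` is mine; `subtle.ConstantTimeCompare` is the real standard-library API):

            ```go
            package main

            import (
            	"crypto/subtle"
            	"fmt"
            )

            // tokenEqual reports whether two secrets match, taking time that
            // depends only on the slices' lengths, never on their contents.
            func tokenEqual(a, b []byte) bool {
            	return subtle.ConstantTimeCompare(a, b) == 1
            }

            func main() {
            	fmt.Println(tokenEqual([]byte("s3cr3t"), []byte("s3cr3t"))) // true
            	fmt.Println(tokenEqual([]byte("s3cr3t"), []byte("guess1"))) // false
            }
            ```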

          • matheusmoreira 6 days ago |
            As someone who's made a very simple language, I would say there are far too many moving parts involved to guarantee anything of the sort. It's probably better to just integrate libsodium.

            Interpreters will literally switch on the type of things in order to figure out what to do with the value. They've lost the side channel battle before it even began. Compilers? Who knows what sort of code they will generate? Who knows how many of your precautions they will delete in an effort to "optimize"? Libsodium has its own memory zeroing function because compilers were "optimizing" the usage of the standard ones.

            If you're writing anything cryptography related, you probably want to be talking directly to the processor which will be running your code. And only after you've studied the entire manual. Because even CPUs have significant gaps in the properties they guarantee and the conditions they guarantee them in.

            Cryptographers might even consider lowering the level even further. They might want to consider building their own cryptoprocessor that works exactly like they want it to work. Especially if you need to guarantee things like "it's impossible to copy keys and secrets". I own three yubikeys for the sole purpose of guaranteeing this.
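
            The interpreter point can be sketched in Go (the function `eval` is a hypothetical toy, not from any real interpreter): the branch taken depends on the value's dynamic type, which is exactly the kind of data-dependent control flow that leaks timing.

            ```go
            package main

            import "fmt"

            // eval dispatches on the dynamic type of its argument, the way a
            // simple interpreter dispatches on value tags. Which branch runs
            // depends on the data itself - a built-in timing side channel.
            func eval(v any) string {
            	switch v.(type) {
            	case int:
            		return "integer path"
            	case string:
            		return "string path"
            	default:
            		return "fallback path"
            	}
            }

            func main() {
            	fmt.Println(eval(42))      // integer path
            	fmt.Println(eval("hello")) // string path
            }
            ```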

            • wbl 6 days ago |
              Interpreters don't need to have dynamic typing: see, for example, the JVM, and the interpreters that predate JIT compilation. Even with dynamic types there are some spectacularly clever tricks people use; Smalltalk VMs are where many of them were invented and refined.

              In crypto code, branching is exactly what you don't want to do if you need to guarantee security. Branches go both ways when an attacker can force a misspeculation, and the microarchitectural state is not rolled back, because it can't be.

          • wbl 6 days ago |
            That's not why cryptographers use assembly. We use assembly because performance often requires instructions the compiler will never emit, which the CPU maker provides for us. Intrinsics invite all sorts of spilling issues and aren't quite as good.
          • the8472 6 days ago |
            Most crypto wants constant-time execution, but optimizing compilers are not designed with that in mind. They have optimization passes that will happily turn your carefully crafted constant-time code back into branches when their heuristics deem that profitable.

            Currently the most reliable way to get exactly the assembly you want is to write the assembly you want.
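
            The kind of code in question looks roughly like this branchless select, sketched in Go (the helper `ctSelect` is mine, not from any library). A sufficiently clever optimizer is free to recognize the mask arithmetic and reintroduce a branch, which is the hazard described above:

            ```go
            package main

            import "fmt"

            // ctSelect returns x when v == 1 and y when v == 0, without
            // branching: the mask is all-ones or all-zeros depending on v,
            // so the instruction stream is independent of the secret selector.
            func ctSelect(v, x, y uint32) uint32 {
            	mask := -(v & 1) // 0xFFFFFFFF if v == 1, 0x00000000 if v == 0
            	return (x & mask) | (y &^ mask)
            }

            func main() {
            	fmt.Println(ctSelect(1, 7, 9)) // 7
            	fmt.Println(ctSelect(0, 7, 9)) // 9
            }
            ```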

        • nimish 6 days ago |
          Sure, but golang has its own special assembly flavor rather than using standard gcc-flavor inline assembly. Probably because it's a soup-to-nuts compiler, but still.
          • tptacek 6 days ago |
            The point of this article is that Go-specific assembler generators (Avo in particular) are better than standard assembly for this purpose.
            • nimish 6 days ago |
              That doesn't preclude syntactic compatibility, does it?
          • citizenpaul 6 days ago |
            >soup to nuts compiler

            Any chance you can explain that to rubes like me?

            • tptacek 6 days ago |
              Go isn't built on an existing compiler framework like LLVM. It does its own code generation, has its own assembler.
              • nwokon 6 days ago |
                There is an accident of history here. Go was developed with the Plan 9 C compiler suite as a starting point. Most notably, those compilers did not generate assembly -- they emitted object code directly. This is described here: https://9p.io/sys/doc/compiler.html. The assembler existed to transform hand-written assembly into object code. And here the Plan 9 folks chose a new syntax, probably because it was simpler to start afresh than to adopt the existing "AT&T" or "Intel" syntax.
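
                That syntax survives as today's Go assembler. A small, hypothetical amd64 example (an exported Add(a, b int64) function; note the src, dst operand order, the middle dot in the symbol name, and the FP pseudo-register for arguments):

                ```asm
                // func Add(a, b int64) int64
                TEXT ·Add(SB), NOSPLIT, $0-24
                	MOVQ a+0(FP), AX   // first argument, at frame pointer offset 0
                	MOVQ b+8(FP), BX   // second argument
                	ADDQ BX, AX        // src, dst order: AX += BX
                	MOVQ AX, ret+16(FP)
                	RET
                ```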
                • nimish 5 days ago |
                  Typical Plan 9: change for the sake of change. Second-system effect writ large.
          • Veserv 6 days ago |
            This is not at all comparable to inline assembly which interleaves two different languages into the same source file.

            The presented examples are just a distinct assembly language associated with the golang ecosystem, used in its own dedicated source files and called via an FFI. This is comparable to writing a pure assembly file and linking it into your program, which is actually a much more reasonable thing to do than the insanity of inline assembly.

            The problems being highlighted are just cases of people who do not understand ABIs and ABI compatibility. This is extremely common when crossing a language boundary due to abstraction mismatches and is made worse, not better, by doing even more magic behind the scenes to implicitly paper over the mismatches.

    • kristianp 6 days ago |
      Intrinsics would be a great quality-of-life improvement for low-level optimisations. They don't require understanding register allocation, but obviously they would add complexity to the compiler, and they aren't cross-architecture. I have tried some tools that convert a C function with intrinsics to Go assembly, but they were buggy for my use case [1], [2].

      [1] github.com/minio/c2goasm (no longer updated)

      [2] https://github.com/gorse-io/goat
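
      For what it's worth, Go does ship a few compiler-recognized functions that behave like intrinsics: the gc compiler lowers `math/bits` calls such as these to single machine instructions on architectures that have them, with no assembly or register-allocation knowledge required (a sketch):

      ```go
      package main

      import (
      	"fmt"
      	"math/bits"
      )

      func main() {
      	// These calls are intrinsified by the gc compiler: on amd64 they
      	// compile down to POPCNT-, ROL-, and LZCNT-style instructions
      	// rather than ordinary function calls.
      	fmt.Println(bits.OnesCount64(0xFF))  // 8
      	fmt.Println(bits.RotateLeft32(1, 4)) // 16
      	fmt.Println(bits.LeadingZeros16(1))  // 15
      }
      ```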

  • userbinator 6 days ago |
    Speaking as an Asm programmer for several decades: calling conventions are stupid. They are the result of mindless stupid-compiler-oriented thinking from a time when compilers produced horrible copy-paste-replace code. The CPU itself couldn't care less which registers you use for what. So many bytes wasted on moving values between registers, just because the calling convention wanted them there, and for no other reason. The only time you need to pay attention to calling conventions is when you're interfacing with compiler-generated code. Modern CPUs are fast, but there's still a ton of inefficiency in compiler output.
    • lmz 6 days ago |
      > The only need to pay attention to calling conventions is when you're interfacing with compiler-generated code.

      So, the vast majority of code out there in the wild?

      • almostgotcaught 6 days ago |
        "Vast majority" doesn't even begin to describe it - I would wager 10 years of my salary that the fraction of all currently running CPU instructions that were handwritten is so small that it's within the margin of error (i.e., random bit flips) for whatever computer you use to perform the count.
        • RandomBK 6 days ago |
          Depending on how you count, the ratio might not be that small. A lot of hot code is hand-written assembly, so in terms of CPU cycles run it's probably non-negligible.

          e.g. take a look at the glibc implementation of `strcmp` [0]

          [0] https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/m...

          • almostgotcaught 6 days ago |
            > A lot of hot code are written in hand-coded inline assembly

            I know... I write GPU assembly for a living... And still I make that wager. It's not a lot. It's not even a little. It's an epsilon (overall). And it gets smaller over time.

          • lionkor 6 days ago |
            Now how much of that doesn't interface with compiler generated code?
      • userbinator 6 days ago |
        I mean it only matters at the interface.
      • Spivak 6 days ago |
        If you're not interfacing with it, say linking it as a library, then it doesn't matter what you do.
        • lmz 5 days ago |
          Sure, but this sort of limits the kinds of thing you can realistically build unless you want to build everything from the ground up. Even in the case of code reuse with statically linked assembly files there would be some sort of "convention" about how to call and be called.
    • almostgotcaught 6 days ago |
      Do people think this is insightful? Do you?

      > conventions are stupid

      All conventions are stupid when examined through the lens of an isolated island dweller. You might as well be saying something like "you only need to drive on the left-hand side of the road when you're driving on public roads".

      • userbinator 6 days ago |
        Compilers were stupid, and that's how we ended up with this constant overhead of inefficiency long after they could've done better; it's only within the last decade or so that "custom calling conventions" even started being considered.
        • almostgotcaught 6 days ago |
          Calling conventions have no more to do with dumb or smart compilers than driving on the left-hand side of the road has to do with dumb or smart urban planners.
          • userbinator 6 days ago |
            Of course they do. An Asm programmer will naturally use the appropriate registers to minimise data movement (see also: the PC BIOS interface - no stupid stack shit) depending on the circumstances; a stupid compiler will just push everything onto the stack. A more intelligent compiler will behave more like the human programmer and decide how to pass parameters and save or restore registers on a case-by-case basis.
    • timewizard 6 days ago |
      > The CPU itself couldn't care less which registers you use for what.

      Not all registers encode as operands equivalently (implicit rdx:rax, implicit [rbx+al], limited [rbp/r13+imm8]). Some have other encoding restrictions or special purposes (rdi, rsi, rcx). When segmentation was a thing, there were different default segments for each. Some are destroyed when certain opcodes are used (syscall: rcx, r11).

      > So many wasted bytes on moving values between registers [...] Modern CPUs are fast

      Well, they've special cased this anyways, as these will often be caught in the rename stage and not even occupy an execution slot. We've long recognized that passing these values in registers instead of on the stack is far more efficient, which is why the `fastcall` convention came about and got its name way back in the x86 days.

      > but there's still tons of inefficiency in compiler output.

      Which is also why the 'inline' heuristic exists: when a call is inlined, the calling convention is abandoned entirely. I mean, things like ELF dynamic symbol tables and Linux thread-local storage annoy me far more than calling conventions ever have.

      • userbinator 5 days ago |
        > Well, they've special cased this anyways, as these will often be caught in the rename stage and not even occupy an execution slot

        They still need to be fetched and decoded, and take up space in caches and RAM that could be used for more purposeful instructions.

        > Which is also why the 'inline' heuristic exists.

        Inlining has its own problems too.

        > I mean, things like ELF dynamic symbol tables and Linux thread-local storage annoy me far more than calling conventions ever have.

        Don't get me started on the whole ELF and dynamic linking situation...

    • antics 6 days ago |
      Since no one seems to be pushing back I'll add my 2¢ here as a former compilers engineer. Calling conventions are just like any other style guide. Yes, any particular coding style is stupid, but it's still useful to have one that you are more or less committed to, especially if everyone else in the ecosystem is committed to it too.

      Frame pointers are a great example. Having a well-known and generic representation of %rbp is helpful when you go to use or integrate with existing tools like debuggers, link editors, or (say) most of the existing LLVM/GCC/whatever toolchain. Or when you want to expose a stable ABI to consumers for whatever reason (as, e.g., the Linux kernel famously does). Or, or, or.

      I think it's reasonable to say this has been mostly uncontroversial since at least the 90s. The discussion has changed a bit since (apparently) Go needed none of these things to succeed—not the LLVM compiler toolchain infrastructure, and also not the user-facing things like the debuggers. To hear Russ Cox tell the tale, this is mostly because they required flexibility, and I suppose they were right, since they did rewrite their linker 3 times, and sure enough, 15 years later, in the year of our lord 2024, most debugging in Go seems to happen by writing ASCII-shaped bytes to some kind of a file, somewhere, and then using the world's most expensive full text search engine to get those bytes so you can physically read them on a screen. A debugger does seem limited in use for that specific workflow, so maybe that was the right call, who knows.

      Anyway, I don't think there's ever been any real doubt that something like LLVM imposes a serious integration cost, but now that we have Go, the discussion has mostly shifted to "is it worth it", and seemingly the answer is "mostly yes" since nearly everyone building a new and hip native language uses LLVM or something like it. Every language is different, YMMV, etc., but I personally don't hear a lot of complaining about what a bummer it is that all these tools work pleasantly together instead of secretly sabotaging each other by loading up FP with whatever cursed data Go wanted to use it for. And why would they?

      What is more mysterious to me is how an actual assembly programmer came to defend Go's stance on ... assembly. Perhaps I'm the only one who reads these things, but at various points in the ASM docs[1] (which I will henceforth call "Mr Pike's wild ride") the author expresses a view that I think is reasonably well described as "a tempered but pretty much open contempt for the practice as a whole". cf.,

      > Instructions, registers, and assembler directives are always in UPPER CASE to remind you that assembly programming is a fraught endeavor. (Exception: the g register renaming on ARM.)

      Even if your feelings are hard to hurt though, if you ever crack open the toolchain and attempt to read the golang kind-of-IL-kind-of-x86, it is hard to walk away thinking "these people really get me and my profession". DI and FP are both normal registers! It uses the unicode interpunct instead of the plain old dot operator! It uses NIL instead of NULL! It is one thing to say calling conventions are stupid, but it's another thing entirely to give a great big hug to an almost-but-not-quite-assembly-code that is convenient neither for humans to type nor for tools to consume.

      [1]: https://go.dev/doc/asm

    • x-shadowban 5 days ago |
    For functions that don't escape the current compilation unit (`static` functions, anonymous-namespace functions), can/do compilers ignore calling conventions and do the faster thing? Of course, they can just inline, and that makes this moot.
  • nj5rq 5 days ago |
    That color scheme is nice for my eyes.