But it's good enough, and I guess it compiles quickly, which was a major goal for golang.
It's not at all weird that the language authors needed assembly to implement such a thing. They figured out the tricky bits so you don't have to.
Interpreters will literally switch on the type of things in order to figure out what to do with the value. They've lost the side channel battle before it even began. Compilers? Who knows what sort of code they will generate? Who knows how many of your precautions they will delete in an effort to "optimize"? Libsodium has its own memory zeroing function because compilers were "optimizing" the usage of the standard ones.
If you're writing anything cryptography related, you probably want to be talking directly to the processor which will be running your code. And only after you've studied the entire manual. Because even CPUs have significant gaps in the properties they guarantee and the conditions they guarantee them in.
Cryptographers might even consider going a level lower still. They might want to consider building their own cryptoprocessor that works exactly the way they want it to work. Especially if they need to guarantee things like "it's impossible to copy keys and secrets". I own three yubikeys for the sole purpose of guaranteeing this.
In crypto code, branching is exactly what you don't want to do if you need to guarantee security. Branches go both ways if an attacker can force a misspeculation, and microarchitectural state is not rolled back, because it can't be.
Currently the most reliable way to get exactly the assembly you want is to write the assembly you want.
Any chance you can explain that to rubes like me?
The presented examples are just a distinct assembly language associated with the golang ecosystem, written in its own dedicated source files and called via an FFI. This is comparable to writing a pure assembly file and linking it into your program, which is actually a much more reasonable thing to do than the insanity of inline assembly.
The problems being highlighted are just cases of people who do not understand ABIs and ABI compatibility. This is extremely common when crossing a language boundary due to abstraction mismatches and is made worse, not better, by doing even more magic behind the scenes to implicitly paper over the mismatches.
[1] github.com/minio/c2goasm (no longer updated)
So, the vast majority of code out there in the wild?
e.g. take a look at the glibc implementation of `strcmp` [0]
[0] https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/m...
I know... I write GPU assembly for a living... And still I make that wager. It's not a lot. It's not even a little. It's an epsilon (overall). And it gets smaller over time.
> conventions are stupid
all conventions are stupid when examined through the lens of an isolated island dweller. you might as well be saying something like "you only need to drive on the left-hand side of the road when you're driving on public roads".
Not all registers encode as operands equivalently (implicit rdx:rax, implicit [rbx+al], limited [rbp/r13+imm8]). Some have other encoding restrictions or special purposes (rdi, rsi, rcx). When segmentation was a thing there were different default segments for each. Some are destroyed when certain opcodes are used (syscall: rcx, r11).
> So many wasted bytes on moving values between registers [...] Modern CPUs are fast
Well, they've special-cased this anyway, as these moves will often be caught in the rename stage and not even occupy an execution slot. We've long recognized that passing these values in registers instead of on the stack is far more efficient, which is why the `fastcall` convention came about and got its name way back in the x86 days.
> but there's still tons of inefficiency in compiler output.
Which is also why the 'inline' heuristic exists. In which case all of the calling conventions are fully abandoned. I mean, things like ELF dynamic symbol tables, and linux thread local storage annoy me far more than calling conventions ever have.
They still need to be fetched and decoded, and take up space in caches and RAM that could be used for more purposeful instructions.
> Which is also why the 'inline' heuristic exists.
Inlining has its own problems too.
> I mean, things like ELF dynamic symbol tables, and linux thread local storage annoy me far more than calling conventions ever have.
Don't get me started on the whole ELF and dynamic linking situation...
Frame pointers are a great example. Having a well-known and generic representation of %rbp is helpful when you go to use or integrate with existing tools like debuggers, link editors, or (say) most of the existing LLVM/GCC/whatever toolchain. Or when you want to expose a stable ABI to consumers for whatever reason (as, e.g., the Linux kernel famously does). Or, or, or.
I think it's reasonable to say this has been mostly uncontroversial since at least the 90s. The discussion has changed a bit since (apparently) Go needed none of these things to succeed—not the LLVM compiler toolchain infrastructure, and also not the user-facing things like the debuggers. To hear Russ Cox tell the tale, this is mostly because they required flexibility, and I suppose they were right, since they did rewrite their linker 3 times, and sure enough, 15 years later, in the year of our lord 2024, most debugging in Go seems to happen by writing ASCII-shaped bytes to some kind of a file, somewhere, and then using the world's most expensive full text search engine to get those bytes so you can physically read them on a screen. A debugger does seem limited in use for that specific workflow, so maybe that was the right call, who knows.
Anyway, I don't think there's ever been any real doubt that something like LLVM imposes a serious integration cost, but now that we have Go, the discussion has mostly shifted to "is it worth it", and seemingly the answer is "mostly yes" since nearly everyone building a new and hip native language uses LLVM or something like it. Every language is different, YMMV, etc., but I personally don't hear a lot of complaining about what a bummer it is that all these tools work pleasantly together instead of secretly sabotaging each other by loading up FP with whatever cursed data Go wanted to use it for. And why would they?
What is more mysterious to me is how an actual assembly programmer came to defend Go's stance on ... assembly. Perhaps I'm the only one who reads these things, but at various points in the ASM docs[1] (which I will henceforth call "Mr Pike's wild ride") the author expresses a view that I think is reasonably well described as "a tempered but pretty much open contempt for the practice as a whole". cf.,
> Instructions, registers, and assembler directives are always in UPPER CASE to remind you that assembly programming is a fraught endeavor. (Exception: the g register renaming on ARM.)
Even if your feelings are hard to hurt though, if you ever crack open the toolchain and attempt to read the golang kind-of-IL-kind-of-x86, it is hard to walk away thinking "these people really get me and my profession". DI and FP are both normal registers! It uses the unicode interpunct instead of the plain old dot operator! It uses NIL instead of NULL! It is one thing to say calling conventions are stupid, but it's another thing entirely to give a great big hug to an almost-but-not-quite-assembly-code that is convenient neither for humans to type nor for tools to consume.