Lessons learned from a successful Rust rewrite
145 points by broken_broken_ 8 days ago | 42 comments
  • steveklabnik 8 days ago |
    Incidentally, the first code sample can work, you just need to use the new raw syntax, or addr_of_mut on older Rusts:

        fn main() {
            let mut x = 1;
            unsafe {
                let a = &raw mut x;
                let b = &raw mut x;
        
                *a = 2;
                *b = 3;
            }
        }
    
    The issue is that the way that the code was before, you'd be creating a temporary &mut T to a location where a pointer already exists. This new syntax gives you a way to create a *mut T without the intermediate &mut T.

    That said, this doesn't mean that the pain is invalid; unsafe Rust is tricky. But at least in this case, the fix isn't too bad.
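
    For the "older Rusts" route mentioned above, a minimal sketch using `std::ptr::addr_of_mut!` could look like this. The function name `write_twice` is made up for illustration; the macro itself has been stable since Rust 1.51.0.

    ```rust
    use std::ptr::addr_of_mut;

    // Hypothetical helper: writes through two raw pointers to the same local
    // and returns the final value.
    fn write_twice() -> i32 {
        let mut x = 1;
        // addr_of_mut! yields a *mut i32 directly, without creating an
        // intermediate &mut i32.
        let a = addr_of_mut!(x);
        let b = addr_of_mut!(x);
        unsafe {
            *a = 2;
            *b = 3;
        }
        x
    }

    fn main() {
        println!("{}", write_twice());
    }
    ```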

    • daghamm 8 days ago |
      Every time I think I am getting good at Rust, the core team adds a new keyword to send me back to square one.

      (All right, I don't actually know if this is old or new. But you get my point?)

      • steveklabnik 8 days ago |
        This was stabilized thirteen days ago, in Rust 1.82.0. addr_of_mut! was stabilized three and a half years ago, in Rust 1.51.0.

        > you get my point?

        I don't think Rust adds keywords very often. But I can acknowledge this is a subjective point. Additionally, in this case, it is literally a one sentence explanation: `&raw mut` is how you can create a `*mut T`, and `&raw const` is how you can create a `*const T`. That's it. You can safely ignore this whole thing until you're writing some unsafe code. It doesn't feel like a large burden to me, though of course I am biased.
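
        One place where the raw syntax matters is a field that may be misaligned, since a normal reference to it is rejected by the compiler. The sketch below assumes Rust 1.82 or later; the `Header` struct and `read_len` function are invented for illustration.

        ```rust
        // Hypothetical packed struct: `len` sits at offset 1, so it is not
        // 4-byte aligned.
        #[repr(C, packed)]
        struct Header {
            tag: u8,
            len: u32,
        }

        fn read_len(h: &Header) -> u32 {
            // `&h.len` would not compile because the field may be misaligned.
            // `&raw const` produces a *const u32 without forming a reference.
            let p: *const u32 = &raw const h.len;
            // SAFETY: p points into a live Header, and read_unaligned does not
            // require the pointer to be aligned.
            unsafe { p.read_unaligned() }
        }

        fn main() {
            let h = Header { tag: 7, len: 1234 };
            let tag = h.tag; // copying a field out by value is allowed
            println!("tag={} len={}", tag, read_len(&h));
        }
        ```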

        • daghamm 8 days ago |
          You are biased, more complex Rust means more book sales :)

          Joke aside, my main issue with Rust is that it is already more complicated than C++. And it is still growing.

          • steveklabnik 8 days ago |
            I don't think Rust is anywhere near as complex as C++.

            I agree that it seems like the team has an appetite for change that's larger than I personally would agree is appropriate. We'll see what they end up shipping.

          • orf 8 days ago |
            In what ways is it more complicated than C++?
            • daghamm 7 days ago |
              The basic C++ you need to know to survive is quite small. Not as small as C, but let's say twice as large. There are some crazy complex parts in C++ but most people don't see those parts very often.

              Rust on the other hand has a pretty huge base part that everyone needs to learn. Because certain normally simple things can get very complex in Rust and you start approaching the dark corners of the language very very quickly.

              So yes, I believe Rust is more complicated than C++. And it is still growing.

              • orf 7 days ago |
                Interesting - I see your point, but in my experience the complexity most people reference in this context are things that are implicitly present in C++, like lifetimes.

                That being said, they do definitely make it hard to do certain types of things (data structures with cyclic references, etc). But that’s not quite complexity, that’s more of a constraint on expressing complexity.

                To me complexity comes from implicit lifetimes, operator overloading, turing-complete templates, unhygienic macros, lack of a blessed package manager/build system, many different types of implicit constructors/destructors used in different contexts, easy to hit language footguns, etc.

                IMO lifetimes (and downstream interactions with things like async) are the most complex bit of Rust.

                But for a fairly significant number of places in Rust projects you can avoid them. The same isn’t true for the complexities of C++ I listed above.

      • rectang 8 days ago |
        I thought myself a pretty decent C coder, certainly someone who was conscientious and took safety seriously. I am nevertheless often humbled when programming unsafe Rust as I discover aspects and errors I had not anticipated or thought through. I don't attribute this to Rust, but instead to the deceptively difficult problem domain.
        • WesolyKubeczek 8 days ago |
          Are things you have learned so far transferable back to C? Can you with your new experience consider yourself a better C coder?
          • ChristianJacobs 8 days ago |
            Not the parent poster, but in my experience - yes.

            I mainly program in C++, and the past 5 years or so of writing Rust has drastically changed the way I write code (in general). Sometimes, if the problem is particularly hairy, I'll even write the code in a similar way in Rust first and then port it back to C++ after the compiler has helped me iron out my mistakes.

        • vacuity 7 days ago |
          > but instead to the deceptively difficult problem domain.

          It's definitely a big part of it, but I think sometimes it is a Rust thing. Of course, motivated by the essential complexity, but accidental complexity is still complexity. I don't think or hope it's an existential issue for Rust, which is good. The sharp edges around pointer-reference nuances in unsafe, for instance, are being addressed by things like raw references (mentioned upthread). I think languages that explore this problem domain more might provide interesting alternative designs for what Rust handles through unsafe.

  • ubj 8 days ago |
    I've recently seen a lot of Rust rewrite projects that have talked about how much they've been required to use unsafe blocks. I'm currently in process of my first C++-to-Rust rewrite, and I haven't needed to reach for unsafe at all yet.

    What kinds of projects or C++ features are requiring such high usage of unsafe? I'm not implying that this is bad or unnecessary--I'm genuinely curious as to what requires unsafe to be used so frequently. Since by all accounts unsafe Rust can be harder to use than C++, this may help inform me as to whether I attempt using Rust in future rewrites.

    • lsbehe 8 days ago |
      He mentioned FFI into and out of his code, which has been my main encounter with unsafe rust too. Often enough I could limit the use to the entry/exit code but that's not always possible.
    • steveklabnik 8 days ago |
      I agree that in my experience, little unsafe is needed. However, (from an earlier article in this series):

      > This project is a library that exposes a C API but the implementation is C++, and it vendors C libraries (e.g. mbedtls) which we build from source. The final artifacts are a `libfoo.a` static library and a `libfoo.h` C header.

      In this case, this project is doing a lot of FFI, both exposing C, as well as calling into C libraries. That's unsafe. Which is a good example of a project that may use unsafe more than the average Rust project.
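
      As a rough illustration of why the exposing side is unsafe, here is a sketch of one exported function. The name `foo_name_len` is hypothetical and not taken from the article, and a real library would also add `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition) so the symbol is visible to C.

      ```rust
      use std::ffi::{c_char, CStr};

      /// Hypothetical exported function: byte length of a C string, or -1 for NULL.
      ///
      /// # Safety
      /// If non-null, `name` must point to a valid NUL-terminated string.
      pub unsafe extern "C" fn foo_name_len(name: *const c_char) -> isize {
          if name.is_null() {
              return -1;
          }
          // SAFETY: non-null was checked above; NUL termination and validity
          // are the caller's promise, which the compiler cannot verify.
          let s = unsafe { CStr::from_ptr(name) };
          s.to_bytes().len() as isize
      }

      fn main() {
          let s = std::ffi::CString::new("mbedtls").unwrap();
          println!("{}", unsafe { foo_name_len(s.as_ptr()) });
      }
      ```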

      • ubj 8 days ago |
        Ah, that makes sense. Thanks for the clarification.
      • kelnos 8 days ago |
        My feeling was that this rewrite is not actually "done". Sure, all their C++ has been converted to Rust, but it seems like there's a lot of unsafe that they could rewrite in safe Rust.

        And for the C libraries they vendor in, assuming none of them are exposed directly in their public API, it's likely they can replace them with Rust libraries with equivalent behavior. mbedtls seems like a good example of that; certainly it wouldn't be a small effort to switch to rustls, but it might be worth it to do so. And even if they didn't choose to do that, I just did a quick search on crates.io, and it looks like there are safe wrappers for mbedtls.

    • myworkinisgood 7 days ago |
      Anything where you are doing massive and complex parallelism will require a tonne of unsafe. Think co-operative thread groups on 120x4 = 480 core machines.
  • mmastrac 8 days ago |
    This post is subtly wrong: "multiple read-only pointers XOR one mutable pointer" is actually "multiple read-only references XOR one mutable reference".

    It _is_ valid to have multiple mutable pointers, just as C and C++ allow. It's when you have multiple live, mutable references (including pointers created from live mutable references) that you end up in UB territory.

    • GrantMoyer 8 days ago |
      Smaller follow up, it's multiple read-only references NAND one mutable reference, since it's also safe to have no references.
    • vacuity 7 days ago |
      They might mean in the sense of data races, which can happen in other languages with pointers. In unsafe code, Rust exchanges that (kinda) for upholding Rust's reference rules, which is its own flavor of UB. Although if you have long-lived raw pointers, the data races come back too. Free UB!
  • happyweasel 8 days ago |
    The only real comparison would be a rewrite in modern c++ and then compare that to the rewrite in rust. Also the author mentioned that the original code had no tests at all. Well, good luck.
    • jerf 8 days ago |
      Well, the great Software Engineering As Science conundrum is that nobody can afford to run studies like that, let alone enough of them to get some sort of statistically significant sample.

      So we just have to do our best.

  • hyperman1 8 days ago |
    As someone who likes what Rust brings to the table, I am pleasantly surprised with the honesty of this review.

    Interfacing with the C world, both as caller and callee, happens a lot in real world code. All the C bugs come right back at that point.

  • WhatIsDukkha 8 days ago |
    This seems like a weird use of Rust.

    There is no mention of how much of the codebase is even in safe Rust after all this work so no clear value to the migration?

    Frequently when people get their code ported they then begin a process of reducing the unsafe surface area but not here.

    The author seems to place little or no value on safe Rust? It doesn't seem evident from reading/skimming his 4 articles on the process.

    Interesting mechanical bits to read for sure though so it's still a useful read more broadly.

    It's unsurprising that the author would go use Zig next time since they didn't seem to have any value alignment with Rust's core safety guarantees.

    • empath75 8 days ago |
      I don't really understand why they chose to rewrite this in rust if they're just going to spend their time writing unsafe C code in rust.
      • hermanradtke 8 days ago |
        Unsafe Rust is much safer than C
        • bsder 8 days ago |
          > Unsafe Rust is much safer than C

          That is not at all an obvious axiom.

          I am willing to concede that "Rust" is safer than "C".

          However, in "unsafe Rust" it is super easy to violate a Rust API precondition that the compiler takes advantage of. Even the Rust intelligentsia have pointed out that writing correct "unsafe Rust" is significantly harder than writing correct C.
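
          One concrete instance of such a precondition, as a sketch: `std::slice::from_raw_parts` requires a non-null, aligned pointer even when the length is 0, whereas C callers routinely pass NULL with length 0. The helper name `bytes_from_raw` is hypothetical.

          ```rust
          use std::slice;

          /// Hypothetical helper: borrow `len` bytes starting at `ptr`.
          ///
          /// # Safety
          /// If non-null, `ptr` must be valid for reads of `len` bytes for all of
          /// `'a`, and that memory must not be mutated during `'a`.
          unsafe fn bytes_from_raw<'a>(ptr: *const u8, len: usize) -> Option<&'a [u8]> {
              // These two preconditions can be checked at runtime, so check them.
              if ptr.is_null() || len > isize::MAX as usize {
                  return None;
              }
              // SAFETY: null and size were checked above, and u8 has alignment 1.
              // Validity of the memory remains the caller's promise.
              Some(unsafe { slice::from_raw_parts(ptr, len) })
          }

          fn main() {
              let data = [1u8, 2, 3];
              println!("{:?}", unsafe { bytes_from_raw(data.as_ptr(), data.len()) });
              println!("{:?}", unsafe { bytes_from_raw(std::ptr::null(), 0) });
          }
          ```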

    • whatshisface 8 days ago |
      >Doing an incremental rewrite from C/C++ to Rust, we had to use a lot of raw pointers and unsafe{} blocks. And even when segregating these to the entry point of the library, they proved to be a big pain in the neck.

      That's how the project ended up in such a bad state by the end. Instead of having rust-rust linkages that the compiler could check, they designed every function boundary to be an uncheckable rust-* linkage. This would be like porting a C library to C++, but only moving one function at a time, such that every single function had to comply with extern C.

      Here is an important warning:

      The difficulty of setting up boundaries to unsafe languages is the hidden reason for people wanting to rewrite every C library in Rust. Do not choose a design pattern that requires more than one of these boundaries to exist within your own code!

    • bsder 8 days ago |
      > It's unsurprising that the author would go use Zig next time since they didn't seem to have any value alignment with Rust's core safety guarantees.

      I don't think that's true. The end application talks to smart cards. See also one of the links: https://gaultier.github.io/blog/how_to_rewrite_a_cpp_codebas...

      However, the codebase has the possibility to use multiple memory allocators, and Rust is simply actively bad when faced with that.

      It just seems like the codebase has a set of idioms that really lean into the areas that Zig is actively good at and where Rust is weak. Memory allocators, lexical (not RAII) defer, C interop, C++ interop, and cross-compilation are Zig's raison d'être, after all.

      The one thing I disagree with is the complaint about "repr(C)". Sorry, but I've become convinced that if we want our compilers and languages to work well on modern hardware, we're going to have to allow the compilers to do lots of struct-of-array to array-of-struct (and vice versa) transformations depending upon the actual access patterns. That means that a struct or an array will not be locked to a specific memory layout unless you specifically request it.
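
      To make the "request it" part concrete, here is a small sketch of what `#[repr(C)]` pins down. The struct names are invented, the offsets assume a common target where `u32` is 4-byte aligned, and `offset_of!` needs Rust 1.77 or later.

      ```rust
      use std::mem::{offset_of, size_of};

      // Layout requested explicitly: fields stay in declaration order with
      // C-style padding.
      #[allow(dead_code)]
      #[repr(C)]
      struct Record {
          flag: u8,
          id: u32,
          port: u16,
      }

      // Default representation: the compiler may reorder fields, and the
      // layout is unspecified.
      #[allow(dead_code)]
      struct RustRecord {
          flag: u8,
          id: u32,
          port: u16,
      }

      fn main() {
          println!(
              "repr(C): flag@{} id@{} port@{} size={}",
              offset_of!(Record, flag),
              offset_of!(Record, id),
              offset_of!(Record, port),
              size_of::<Record>()
          );
          // No particular value is guaranteed for the default representation.
          println!("default repr: size={}", size_of::<RustRecord>());
      }
      ```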

  • tharne 8 days ago |
    That's a confusing title. I was under the impression that on Hacker News, every Rust rewrite is a successful rewrite.
    • layer8 8 days ago |
      They are just emphasizing the tautology, not sure why you are confused. ;)
      • tharne 8 days ago |
        It was just a joke, and perhaps a bad one, poking fun at the propensity of Rust fans to insist on rewriting anything and everything in Rust :)
  • kelnos 8 days ago |
    I feel like some of the "what didn't go so well" sections were essentially because their rewrite was incomplete:

    > I am still chasing Undefined Behavior. Doing an incremental rewrite from C/C++ to Rust, we had to use a lot of raw pointers and unsafe{} blocks. And even when segregating these to the entry point of the library, they proved to be a big pain in the neck.

    These sound like an artifact of the rewrite itself, and I suspect many of these unsafe blocks can be rewritten safely now that there is no C++ code left.

    > I am talking about code that Miri cannot run, period: [some code that calls OpenSSL (mbedtls?) directly]

    This should be replaced by a safe OpenSSL (mbedtls?) wrapper, or if it wouldn't change the behavior of their library in incompatible ways, rustls.

    > I am still chasing memory leaks. Our library offers a C API, something like this: [init()/release() C memory management pattern]

    Not sure what this has to do with Rust, though. Yes, if you're going to test your library using the exposed C API interface, your tests may have memory leaks. And yes, if your users are expected to use the library using the C API, they will have to be just as careful about memory as they were before.

    The benefit of this rewrite in Rust would be about them not misusing memory internally inside the library. If that benefit isn't useful enough, then they shouldn't have done this rewrite.
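
    For readers who have not seen the pattern, an init()/release() pair on the Rust side might look roughly like this. The names `Foo`, `foo_init`, and `foo_release` are placeholders rather than the article's actual API, and the `LIVE` counter exists only to make a leak observable in this sketch.

    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counts live Foo objects so a forgotten release shows up as a nonzero value.
    static LIVE: AtomicUsize = AtomicUsize::new(0);

    #[allow(dead_code)]
    pub struct Foo {
        buf: Vec<u8>,
    }

    impl Drop for Foo {
        fn drop(&mut self) {
            LIVE.fetch_sub(1, Ordering::SeqCst);
        }
    }

    /// Hands ownership to the caller; Rust will not free this on its own.
    pub extern "C" fn foo_init() -> *mut Foo {
        LIVE.fetch_add(1, Ordering::SeqCst);
        Box::into_raw(Box::new(Foo { buf: vec![0; 64] }))
    }

    /// # Safety
    /// `foo` must be NULL or a pointer from `foo_init` that was not yet released.
    pub unsafe extern "C" fn foo_release(foo: *mut Foo) {
        if foo.is_null() {
            return;
        }
        // SAFETY: per the contract above, this is a unique pointer from Box::into_raw.
        drop(unsafe { Box::from_raw(foo) });
    }

    fn main() {
        let foo = foo_init();
        println!("live after init: {}", LIVE.load(Ordering::SeqCst));
        unsafe { foo_release(foo) };
        println!("live after release: {}", LIVE.load(Ordering::SeqCst));
    }
    ```

    If a test calls `foo_init` and never calls `foo_release`, the `Foo` is leaked exactly as it would be in C, because ownership left Rust's control at `Box::into_raw`.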

    > Cross-compilation does not always work

    I've certainly run into issues with cross-compilation with Rust, but it is always so much easier than with C/C++.

    > Cbindgen does not always work. [...] Every time, I thought of dumping cbindgen and writing all of the C prototypes by hand. I think it would have been simpler in the end.

    I'm skeptical of the idea that an automated tool is going to generate something that you'll want to use as your public API. I would probably use cbindgen to get a first draft of the API, modify and clean up the output, and use that as the first version, and then manually add/change things from there as the API changes.

    I don't want to silently, accidentally change the API (or worse, ABI) of my library because a code generator changed behavior in a subtle way based on either me upgrading it, or me changing my code in a seemingly-innocuous way.

    > Unstable ABI

    This is a bummer, but consider that they are not exposing a Rust API to their customers: they're exposing a C API. Why would they expect to be able to expose Rust types through the API?

    And they actually can do this: while it is correct that standard Rust types could have a different layout depending on what version of rustc is used to build it, that doesn't actually matter for a pre-built, distributed binary, as long as access to those types from the outside code (that is, through the C API) is done only through accessors/functions and never through direct struct member access. Sure, that requires some overhead, but I would argue that you should never expose struct/object internals in your public API anyway.
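
    A sketch of that accessor approach, with made-up names (`Session`, `session_retries`, and so on): the struct keeps Rust's default layout, and C code would only ever hold a pointer to it and call functions.

    ```rust
    // Deliberately not #[repr(C)]: the C side never looks inside this struct.
    pub struct Session {
        name: String,
        retries: u32,
    }

    /// # Safety
    /// `s` must point to a live Session.
    pub unsafe extern "C" fn session_retries(s: *const Session) -> u32 {
        unsafe { (*s).retries }
    }

    /// # Safety
    /// `s` must point to a live Session that nothing else is accessing.
    pub unsafe extern "C" fn session_set_retries(s: *mut Session, retries: u32) {
        unsafe { (*s).retries = retries }
    }

    /// # Safety
    /// `s` must point to a live Session.
    pub unsafe extern "C" fn session_name_len(s: *const Session) -> usize {
        unsafe { (*s).name.len() }
    }

    fn main() {
        let mut s = Session { name: String::from("card"), retries: 0 };
        unsafe { session_set_retries(&mut s, 3) };
        println!("{} {}", unsafe { session_retries(&s) }, unsafe { session_name_len(&s) });
    }
    ```

    In the C header this would appear as a forward-declared `struct Session;` plus the three prototypes, so the field layout never becomes part of the ABI.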

  • showsomerespect 8 days ago |
    The "arena allocator" hyperlink links to localhost:8000
  • nesarkvechnep 8 days ago |
    What do we expect from someone who says C/C++?
  • IshKebab 8 days ago |
    Hmm yeah I'm not surprised that interfacing safe Rust with an existing unsafe C/C++ API is painful. That's really true in every language. (Although I haven't tried Zig tbf.)

    I'm also not totally convinced that rewrite from scratch is always the wrong thing. For small projects the total work rewriting can be much less than dealing with this kind of FFI.

  • pjmlp 7 days ago |
    > Many, many hours of hair pulling would be avoided if Rust and C++ adopted, like C, a stable ABI.

    What people mistakenly take for the C ABI is in reality the OS ABI when written in C.

    Two C binary libraries might fail to link, or reveal strange behaviours/crashes, when compiled with different C compilers, for anything that isn't clearly defined as part of the OS ABI.