struct MyForeignPtr(*mut c_void);
impl Drop for MyForeignPtr {
    fn drop(&mut self) {
        unsafe { my_free_func(self.0); }
    }
}
Then wrap the foreign pointer with MyForeignPtr as soon as it crosses the FFI boundary into your Rust code, and only ever access the raw pointer via this wrapper object. Don't pass the raw pointer around.
So, an extended version of your solution would be to initialize the static parts on the stack using `MaybeUninit`, and call `assume_init` on that after the call to get an `OwningArrayC<T>` out, which can then have a Drop impl defined for it.
EDIT: can this be done/automated using macros?
If you've been doing C for five decades, it's a shame not to have noticed that it's totally fine to pass a NULL pointer to free().
FWIW, I don't otherwise agree with the thesis. I've written probably ten FFI wrappers around C libraries now, and in every case I was able to store C pointers in some Rust struct or other, where I could free them in a Drop implementation.
I also think it's not actually that unusual for C allocators (other than the truly ancient malloc(3C) family) to require you to pass the allocation size back to the free routine. Often this size is static, so you just use sizeof on the thing you're freeing, but when it's not, you keep track of it yourself. This avoids the need for the allocator to do something like intersperse capacity hints before the pointers it returns.
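A minimal sketch of that "store the C pointer in a struct and free it in Drop" approach. `my_alloc_func` and `my_free_func` are hypothetical stand-ins written in Rust so the example runs on its own; in real code they would be declared in an `extern "C"` block and implemented by the C library. The `FREED` counter only exists to make the effect of `Drop` observable.

```rust
use std::ffi::c_void;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the stand-in free function (illustration only).
static FREED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the library's "create" function.
unsafe extern "C" fn my_alloc_func() -> *mut c_void {
    Box::into_raw(Box::new(0u64)) as *mut c_void
}

// Hypothetical stand-in for the library's matching "destroy" function.
unsafe extern "C" fn my_free_func(ptr: *mut c_void) {
    if !ptr.is_null() {
        // Release the allocation made by `my_alloc_func`.
        unsafe { drop(Box::from_raw(ptr as *mut u64)) };
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Owns a pointer handed out by the library and releases it exactly once.
pub struct MyForeignPtr(NonNull<c_void>);

impl MyForeignPtr {
    /// Wraps the pointer the moment it crosses the FFI boundary.
    /// Returns `None` if the library signalled failure with a null pointer.
    pub fn new() -> Option<Self> {
        NonNull::new(unsafe { my_alloc_func() }).map(MyForeignPtr)
    }

    /// Borrows the raw pointer, e.g. for passing it back into the library.
    pub fn as_ptr(&self) -> *mut c_void {
        self.0.as_ptr()
    }
}

impl Drop for MyForeignPtr {
    fn drop(&mut self) {
        // The wrapper is the only owner, so this runs once per allocation.
        unsafe { my_free_func(self.0.as_ptr()) };
    }
}
```

Using `NonNull` here is a design choice, not a requirement: it pushes the null check to the point where the pointer enters Rust, so the rest of the code never has to think about it.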
> The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf#p...
50 years ago non-compliance with the standard was the norm; even today it is quite common (to use non-compliant extensions)
and there is also stuff like caches, arena allocations, jemalloc etc. which might not be linked against libc/free and might require manual free function usage; external APIs providing their own free/destruction functions is really, really normal
Which runtimes would this be?
[0] https://lists.gnu.org/archive/html/bug-gnulib/2020-12/msg001...
Seriously, those jobs typically don't pay more than peanuts anyhow.
if you did 50 years of development and haven't realized that in projects crossing languages (or just separately compiled C libs) you have to free things where you create them, and can't use the many, many libraries which require you to use their own free/destruction functions, and don't even understand that free is flat, so that you also need to "destruct" each item in a struct manually, and that if you don't have a custom function for this it can easily become a huge, bug-prone mess
.. then I'm really not sure how this happened tbh.
but if you are not a senior C/C++/FFI developer, maybe not a C/C++/FFI developer at all, it's not hard to see how you might have missed these things tbh.
Or that calling one's FFI function inside of an `assert` means it will be compiled out if the `NDEBUG` macro is defined.
No, you are not; simple as that. Miri is right. Rust using malloc/free behind the scenes is an internal implementation detail you are not supposed to rely on. Rust used to use a completely different memory allocator, and this code would have crashed at runtime if it were still the case. Since when is undocumented information obtained from strace a stable API?
It's not like you can rely on Rust references and C pointers being identical in the ABI either, but the sample in the post blithely conflates them.
> It might be a bit surprising to a pure Rust developer given the Vec guarantees, but since the C side could pass anything, we must be defensive.
This is just masking bugs that otherwise could have been caught by sanitizers. Better to leave it out.
"Currently the default global allocator is unspecified. Libraries, however, like cdylibs and staticlibs are guaranteed to use the System by default.", however:
"[std::alloc::System] is based on malloc on Unix platforms and HeapAlloc on Windows, plus related functions. However, it is not valid to mix use of the backing system allocator with System, as this implementation may include extra work, such as to serve alignment requests greater than the alignment provided directly by the backing system allocator."
https://doc.rust-lang.org/std/alloc/index.html https://doc.rust-lang.org/std/alloc/struct.System.html
Surely the system allocator provides memalign() or similar? Does Windows not have one of those?
I don't think fixing that mess was ever a priority for Microsoft because it's mainly an issue for C, and their focus has long been on C++ instead. C++ new/delete knows the alignment of the type being allocated for so it can dispatch to the appropriate path automatically with no overhead in the common <=16 byte case.
And the same is true in reverse: rust struct might be temporarily owned and used by C code, but it should always be destroyed from rust.
A C library returning a pointer to allocated memory and then expecting the caller to free that memory with a function outside the library (like calling stdlib free()) is just bad API design, because you can't know, and shouldn't need to know, whether the library is actually using the stdlib alloc functions under the hood, or whether the library has been linked with the same C stdlib as your own code (for instance when the library resides in a DLL), or whether the library has decided to bypass malloc and directly use lower-level OS calls for allocating memory.
If you have a 'create' function in a C library, also always have a matching 'destroy' function.
On top of that it's also usually a good idea to let the library user override things like memory allocation or file IO functions.
...and of course 'general purpose' global allocators are a bad idea to begin with :)
And on the same side of the API. Especially in Rust, he who allocates is responsible for deallocation. Rust's model likes that symmetry.
This example looks like someone going to considerable trouble to create a footgun, then shooting themself with it. More likely, this example exists because they're using some C library with terrible memory semantics, and this is a simple example to illustrate a problem they had with a badly designed real library.
You absolutely can rely on that: https://doc.rust-lang.org/reference/type-layout.html#pointer...
In fact, it's almost best practice to do so, because it reduces the unsafe needed on Rust's side. E.g. if you want to pass a nullable pointer to T, declare it as e.g. a parameter x: Option<&T> on the Rust side and now you can do fully safe pattern matching as usual to handle the null case.
Want to use some Rust type (not even repr(C)) from C? Just do something like this:
#[no_mangle] pub extern "C" fn Foo_new() -> Box<Foo> { Box::new(Foo::new()) }
#[no_mangle] pub extern "C" fn Foo_free(_: Box<Foo>) { /* automatic cleanup */ }
#[no_mangle] pub extern "C" fn Foo_do_something(this: &mut Foo) { this.do_something() }
all fully safe. Unfortunately for the OP, Vec<_> isn't FFI safe, so that's still one of those cases that are on the rough side.
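Spelled out a little more, the three functions above could look like the sketch below. `Foo`'s contents and the `Foo_counter` accessor are made up for illustration. `#[no_mangle]` is left off so the snippet compiles the same way in every edition (edition 2024 spells it `#[unsafe(no_mangle)]`); a real cdylib or staticlib needs it so C can link against the symbols.

```rust
/// Ordinary Rust type, not even `repr(C)`; C only ever sees a pointer to it.
pub struct Foo {
    counter: u32,
}

impl Foo {
    fn new() -> Self {
        Foo { counter: 0 }
    }

    fn do_something(&mut self) {
        self.counter += 1;
    }
}

/// C view: `Foo *Foo_new(void);`
#[allow(non_snake_case)]
pub extern "C" fn Foo_new() -> Box<Foo> {
    Box::new(Foo::new())
}

/// C view: `void Foo_free(Foo *foo);`
/// The box is dropped when the function returns. C must not pass NULL here;
/// `Option<Box<Foo>>` would be the nullable variant.
#[allow(non_snake_case)]
pub extern "C" fn Foo_free(_foo: Box<Foo>) {}

/// C view: `void Foo_do_something(Foo *foo);`
#[allow(non_snake_case)]
pub extern "C" fn Foo_do_something(this: &mut Foo) {
    this.do_something()
}

/// Hypothetical accessor, C view: `uint32_t Foo_counter(const Foo *foo);`
#[allow(non_snake_case)]
pub extern "C" fn Foo_counter(this: &Foo) -> u32 {
    this.counter
}
```

There is no `unsafe` on the Rust side, but the C caller still has to uphold the usual rules: every pointer it passes must have come from `Foo_new` and must not be used after `Foo_free`.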
> The layout of a type is its size, alignment, and the relative offsets of its fields.
So “layout” only constrains the memory addresses where a value and its constituents are stored; it does not cover niches. Pointers are allowed to be null, references are not. There is talk of making it illegal to have a reference be unaligned, or even point to very low addresses: <https://github.com/rust-lang/rfcs/pull/3204>. At one point, there was even talk of certain kinds of references not even being stored as memory addresses at all: <https://github.com/rust-lang/rfcs/pull/2040>. And Box<_> is not #[repr(transparent)] either. To put it more generally, the only semantics of a Box<_> or a reference is that it grants you access to a value of a given type and is inter-convertible with a pointer. Only *const _ and *mut _ have a guaranteed ABI.
Just because you write fewer “unsafe” keywords does not mean your code is more safe.
Some abstractions are meant to be treated as a black box. It's bloody trivial to realize and point out that references are represented as pointers internally. The actually intelligent step is to figure out how to use that knowledge correctly. Do we pass small bittable types around by-value because we skip the dereference? Hell yes (pending profiling). Do we treat references as pointers? No.
As a plausible scenario, rustc could use tagged pointers for some purpose in refs. Your code could seem to be working in tests, but segfault when rustc sets the MSB for some reason.
https://doc.rust-lang.org/nightly/std/primitive.fn.html#abi-...
> The following types are guaranteed to be ABI-compatible:
> *const T, *mut T, &T, &mut T, Box<T> (specifically, only Box<T, Global>), and NonNull<T> are all ABI-compatible with each other for all T.
Honestly, this seems like something where the docs need to be clarified, but it's definitely intended that, if you transmute a reference into a pointer (explicitly, or implicitly through FFI as documented in that link), then you get a valid pointer to the type, not just an unknown blob that happens to have the same size and alignment. Likewise, any _valid_ pointer can be transmuted to a reference. It is already UB for a reference to be null or unaligned, but those are both cases of invalid pointers. (Note: rustc only uses null as a niche right now, not unaligned values, but unaligned values still cause UB because rustc tells LLVM's optimizer to assume alignment.)
If you want more evidence, here is a test for the improper_ctypes lint:
https://github.com/rust-lang/rust/blob/b91a3a05609a46f73d23e...
You can see that it's quite conservative about what can be passed through extern "C" functions without a warning, but &[u8; 4 as usize] is allowed, as is TransparentRef (a repr(transparent) struct containing a reference).
It is also guaranteed that you can transmute between pointers and optional references (i.e. Option<&T>), which are the same except that they do allow null:
https://github.com/rust-lang/rust/pull/60300
Thus, if you are writing FFI bindings for a C library, you are allowed to just declare pointer arguments as references, or as optional references, as long as the semantics line up. In practice this is not terribly common (outside of fn() types, which are treated like references), but it is allowed.
The RFC you cited as "certain kinds of references not even being stored as memory addresses at all" was rejected as a breaking change. If it had been adopted, it would have violated even the more-strict interpretation of the rule that pointers and references have the same layout, because it would have changed the size of references to zero-sized types (to zero).
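For illustration, here is a small sketch of the pointer/optional-reference equivalence described above. The function names are invented for the example, and `call_like_c` merely simulates what a C caller would do, since C only ever sees the raw-pointer signature.

```rust
use std::mem::transmute;

/// Callable from C as `int32_t value_or_default(const int32_t *x);`
/// A null argument arrives as `None`, so the body needs no `unsafe`.
pub extern "C" fn value_or_default(x: Option<&i32>) -> i32 {
    match x {
        Some(v) => *v,
        None => -1,
    }
}

/// Simulates the C side by starting from a raw pointer.
///
/// # Safety
/// `p` must be null or point to a live, properly aligned `i32`.
pub unsafe fn call_like_c(p: *const i32) -> i32 {
    // Reinterpret the raw pointer as an optional reference: null becomes `None`.
    let as_ref: Option<&i32> = unsafe { transmute(p) };
    value_or_default(as_ref)
}

/// The reverse direction: `None` turns into a null pointer.
pub fn as_raw(x: Option<&i32>) -> *const i32 {
    unsafe { transmute(x) }
}
```

Note that the `unsafe` has not disappeared; it has moved to whoever produces the pointer. The declaration with `Option<&i32>` is only correct if every non-null pointer the C side passes really is a valid reference.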
That only works when calling into C code, and even then it assumes the C code implements allocation using malloc()/free(), with proxies if alternative memory allocators are used.
This behaviour can't be relied upon, but for many programs glueing themselves together with C libraries that's not stopping anyone. This is why I'm not a big fan of sharing pointers between languages (or even libraries, to be honest); I'd much prefer intermediate identifiers (file descriptors, Windows HANDLEs) to track instances that pass in and out of a library, although sometimes those cause too much overhead.
Either the caller is responsible and the function just uses the provided buffers/structures, or there is a clear convention that the caller has to call a cleanup function on the returned buffers/structures.
Mixing allocation and deallocation responsibilities between caller and callee almost always leads to pain and sorrow, in my experience.
Of course, if you're using a shared library or similar which can be written in something else entirely, mixing is a complete no-go as GP mentions.
1: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.int...
2) Do not use Vec<T> to pass arrays to C. Use boxed slices. Do not even try to allocate a Vec and free a Box... how can it even work?!
3) free(ptr) can be called even if ptr is NULL
4) ... I frankly stopped reading. I will never know if Rust actually needs Go's defer or not.
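To make point 2 concrete, a rough sketch of the boxed-slice approach. `SliceC`, `numbers_new` and `numbers_free` are hypothetical names. The appeal is that a `Box<[T]>` has no spare capacity, so pointer plus length is everything needed to free it again.

```rust
use std::ptr;

/// Pointer-plus-length pair handed to C (hypothetical type).
#[repr(C)]
pub struct SliceC {
    pub data: *mut u32,
    pub len: usize,
}

/// Allocates on the Rust side and gives C a view of the buffer.
pub extern "C" fn numbers_new() -> SliceC {
    let boxed: Box<[u32]> = vec![1, 2, 3].into_boxed_slice();
    let len = boxed.len();
    // `into_raw` gives up ownership; the cast drops the length metadata.
    let data = Box::into_raw(boxed) as *mut u32;
    SliceC { data, len }
}

/// Takes the pair back and frees it on the Rust side. Null is ignored.
///
/// # Safety
/// `s` must come from `numbers_new` and must not have been freed already.
pub unsafe extern "C" fn numbers_free(s: SliceC) {
    if s.data.is_null() {
        return;
    }
    // Rebuild the fat pointer, then the Box, and let it drop.
    let slice = ptr::slice_from_raw_parts_mut(s.data, s.len);
    unsafe { drop(Box::from_raw(slice)) };
}
```

Allocation and deallocation both go through `Box<[u32]>`, so there is no Vec-versus-Box mismatch to worry about.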
Too many words wasted online on this, instead of working on an equivalent ecosystem to replace UNIX/C which currently runs the world, and will continue to do so until the Rust community (or another community which is more organised and less populated by drama queens) gets their shit together.
Every community of advocates has extremists in it. The idea that everyone who uses rust is a weirdo is itself pretty strange.
Most people who use language x are like most people for all languages x which have reached any sort of critical mass.
Rust is not even my primary or secondary language. Instead of knee-jerk reaction, maybe consider that I'm just advocating for 1. using languages correctly (asking for defer is a clear indication of the opposite) and 2. using modern and expressive languages that solve entire classes of problems for good instead of struggling with cognitive complexity of doing everything by hand (which is what C and Go are about).
Option<Box<T>> is allowed too, and it's a nullable pointer. Similarly &mut T is supported.
Using C-compatible Rust types in FFI function declarations can remove a lot of boilerplate. Unfortunately, Vec and slices aren't among them.
Unsafe Rust is indeed very hard to write correctly. Rustonomicon is a good start to learn unsafe Rust.
In these cases I can imagine the caller passing in a pointer/reference to uninitialised stack memory, which is also UB in the last version of the allocating code! A `&mut T` must always point to a valid `T` and must not point to uninitialised memory.
It seems to me like it'd be best to take a `&mut MaybeUninit<T>` parameter instead, and write through that. A further upside is that now if the caller _is_ Rust code, you can use MaybeUninit to reserve the stack space for the `OwningArrayC<T>` and then after the FFI call you can use `assume_init` to move an owned, initialised `OwningArrayC<T>` out of the `MaybeUninit` and get all of Rust's usual automatic `Drop` guarantees: This is the defer you wanted all along.
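Roughly, and assuming a (pointer, length, capacity) layout for `OwningArrayC<T>` (the blog's actual definition may differ), the `MaybeUninit` idea could look like this. `make_numbers` and `sum_numbers` are made-up names for the callee and the Rust caller.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Assumed layout for illustration only.
#[repr(C)]
pub struct OwningArrayC<T> {
    data: *mut T,
    len: usize,
    cap: usize,
}

impl<T> Drop for OwningArrayC<T> {
    fn drop(&mut self) {
        // Rebuild the Vec so the allocator that made the buffer frees it.
        unsafe { drop(Vec::from_raw_parts(self.data, self.len, self.cap)) };
    }
}

/// Callee: writes through the slot without ever reading it, so the caller
/// may pass uninitialised memory.
pub extern "C" fn make_numbers(out: &mut MaybeUninit<OwningArrayC<u32>>) {
    // ManuallyDrop keeps the Vec's buffer alive after this function returns.
    let mut v = ManuallyDrop::new(vec![10u32, 20, 30]);
    out.write(OwningArrayC {
        data: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    });
}

/// Rust caller: reserve stack space, call, then take ownership.
pub fn sum_numbers() -> u32 {
    let mut slot = MaybeUninit::<OwningArrayC<u32>>::uninit();
    make_numbers(&mut slot);
    // Sound only because `make_numbers` always initialises the slot.
    let arr = unsafe { slot.assume_init() };
    let items = unsafe { std::slice::from_raw_parts(arr.data, arr.len) };
    items.iter().sum()
    // `arr` is dropped here, which frees the buffer.
}
```

If the callee can fail without writing the slot, it needs to report that (say, via a return code) so the caller knows not to call `assume_init`.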
It _really_ isn't; it's actually exactly how C (or C++) works. If you have a library allocating something for you, you also need to use that library to free it, because, especially in the context of linked libraries, you can't be sure how something was allocated: whether it used some arena, whether some of your libraries use jemalloc and others do not, etc. So it's IMHO a very basic 101 fact of using external C/C++ libraries, fully unrelated to Rust (though I guess it is common for this not to be taught well).
Also it's normal even if everything uses the same alloc because of the following point:
> So now I am confused, am I allowed to free() the Vec<T>'s pointer directly or not?
no, and again this is not Rust specific: free is always a flat free, and `Vec<T>`s aren't guaranteed to be flat (depending on T). And even if they are, some languages have small-vec optimizations (though Rust, I think, guarantees that this is not done with `Vec`, even in the future, for FFI compatibility reasons)
so the go-to solution for most FFI language boundaries (not Rust specific) is that you create the "C external" (here Rust) types in their own language, hand out pointers which are sometimes opaque (C doesn't know the layout), and then _hand them back for cleanup_, cleaning them up in their own language.
i.e. you would have e.g. a `drop_vec_u8` extern C function which just does "create vec from ptr and drop it" (which should get compiled to just a free in the case of e.g. `Vec<u8>`, but will also work properly for `Vec<MyComplexType>`).
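A sketch of such a pair, under the assumption that the C side keeps the pointer, length and capacity together and hands all three back. `RawVecU8` and `make_vec_u8` are hypothetical names; only `drop_vec_u8` is taken from the text above.

```rust
use std::mem::ManuallyDrop;

/// The three values needed to rebuild the Vec later (hypothetical type).
#[repr(C)]
pub struct RawVecU8 {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Creates a Vec on the Rust side and hands its parts to C.
pub extern "C" fn make_vec_u8() -> RawVecU8 {
    // ManuallyDrop stops Rust from freeing the buffer at the end of scope.
    let mut v = ManuallyDrop::new(Vec::<u8>::with_capacity(16));
    v.extend_from_slice(b"hello");
    RawVecU8 {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    }
}

/// "Create vec from ptr and drop it."
///
/// # Safety
/// `raw` must come from `make_vec_u8`, unmodified, and must be passed here
/// at most once.
pub unsafe extern "C" fn drop_vec_u8(raw: RawVecU8) {
    unsafe { drop(Vec::from_raw_parts(raw.ptr, raw.len, raw.cap)) };
}
```

The same shape works for element types with destructors, because dropping the rebuilt `Vec` runs each element's `Drop` before the buffer is released.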
> Box::from_raw(foos);
:wut_emoji:??
in many languages memory objects are tagged and treating one type as another in context of allocations is always a hazard, this can even happen to you in C/C++ in some cases (e.g. certain arena allocators)
again, this is more a matter of missing knowledge about cross-language C FFI in general than anything Rust specific (maybe someone should write a "Generic Cross Language C FFI" knowledge web book; I mean, while IMHO it is basic/foundational knowledge, it is very often not taught well at all)
> OwningArrayC > defer!{ super::MYLIB_free_foos(&mut foos); }
the issue here isn't Rust missing defer or goto, or the borrow checker, but trying to write C in Rust, while OwningArrayC as used in the blog is an overlap of anti-patterns: wanting to use but also not use Rust's memory management at the same time, in an inconsistent way
If you want to "free something except not if it has been moved", Rust has a mechanism for it: `Drop`, i.e. the most fundamental part of Rust's (memory) resource management.
If you want to attach drop behavior to an existing type there is a well-known pattern called a drop guard, i.e. a wrapper type that impls Drop: `struct Guard(OwningArrayC); impl Drop for Guard {...}`, maybe also `impl DerefMut for Guard`. (Or `Guard(Option<..>)` or `Guard(&mut ...)` etc. depending on needs, like e.g. wanting to be able to move it out conveniently.)
In Rust it is a huge anti-pattern to have a guard for a resource and not need to access the resource through the guard (though you will have to do it sometimes), as it often conflicts with the borrow checker and, for RAII-like languages in general, is more error prone. This is also why `scopeguard` provides a guard which wraps the data you need to clean up, that is, if you use `scopeguard::guard` and similar instead of the `scopeguard::defer!` macro, which is for convenience when the cleanup is on global state. I.e. you can use `guard(foos, |mut foos| super::MYLIB_free_foos(&mut foos))` instead of `defer!` and it would work just fine.
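To show what "a guard which wraps the data" means without pulling in a crate, here is a hand-rolled stand-in. It is written for this comment, is not scopeguard's actual implementation, and only mimics the general shape of `scopeguard::guard`.

```rust
use std::ops::{Deref, DerefMut};

/// Owns a value and runs a cleanup closure on it when dropped.
pub struct Guard<T, F: FnOnce(T)> {
    value: Option<T>,
    cleanup: Option<F>,
}

/// Wraps `value` so that `cleanup(value)` runs when the guard goes out of scope.
pub fn guard<T, F: FnOnce(T)>(value: T, cleanup: F) -> Guard<T, F> {
    Guard {
        value: Some(value),
        cleanup: Some(cleanup),
    }
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    /// Defuses the guard: returns the value and skips the cleanup.
    pub fn into_inner(mut self) -> T {
        self.cleanup = None;
        self.value.take().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        self.value.as_mut().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // Both are still present unless `into_inner` defused the guard.
        if let (Some(value), Some(cleanup)) = (self.value.take(), self.cleanup.take()) {
            cleanup(value);
        }
    }
}
```

Because the resource is only reachable through the guard, the borrow checker never sees a second path to it, and moving the value out is an explicit `into_inner` call rather than a flag somebody has to remember to set.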
Though there is also a design issue with `super::MYLIB_free_foos(&mut foos)` itself. If you want `OwningArrayC` to actually (in Rust terms) own the array, then passing `&mut foos` is a problem, as after the function returns you still have foos with a dangling pointer. So again it shows that the way `OwningArrayC` is done is like trying to both use and not use Rust's memory management mechanics at the same time (also, who owns the allocation of OwningArrayC itself is not clear in this API).
I can give following recommendations (outside of using `guard`):
- if Vec doesn't get modified use `Box<[T]>` instead
- if the Vec is always accessed through Rust, consider passing a `Box<Vec<T>>` around instead and always converting to/from `Box<Vec<T>>`/`&Vec<T>`/`&mut Vec<T>`. Box/&/&mut have some de facto memory representation compatibility with pointers, so you can place them directly at an FFI boundary (I think it's guaranteed for Box/&/&mut T and de facto for `Option<Box/&/&mut T>`, where null pointer == None)
- if that is not desirable performance-wise and you can't pass something like `OwningArrayC` by value either, specify that the caller should always (stack) allocate the `OwningArrayC` itself, then only use `OwningArrayC` at the boundary, i.e. directly convert it to `Vec<T>` as needed (though this can easily be less clear about `Vec<T>`, and `&mut Vec<T>` and `&mut [T]` dereferenced)
- In general, if `OwningArrayC` is just for passing parameter bundles with a convention of it always being stack allocated by the caller, then you also really should only use it for the transfer of the parameters and not for automatic resource management, i.e. you should directly convert it to `Vec` at the boundary (and maybe in some edge cases use scopeguard::guard, but then converting it to a Vec is likely faster/easier to do). Also specify exactly what you do with the callee-owned pointer in `OwningArrayC`, i.e. do we always treat it as dangling even if there are errors, do we set it to an empty vec/no capacity as part of the conversion to Vec so that it's only moved if that was done, etc. Also write a `From<&mut OwningArrayC> for Vec` impl (I recommend setting cap+len to zero in it).
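The last recommendation could be sketched like this, again assuming a (pointer, length, capacity) layout for `OwningArrayC<T>`. `from_vec` and `len` are helper methods added for the example. The conversion also nulls the pointer, which goes slightly beyond "set cap+len to zero" but makes a second conversion harmless.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Assumed layout for illustration; used purely as a transfer struct.
#[repr(C)]
pub struct OwningArrayC<T> {
    data: *mut T,
    len: usize,
    cap: usize,
}

impl<T> OwningArrayC<T> {
    /// Packs a Vec into the transfer struct without freeing its buffer.
    pub fn from_vec(v: Vec<T>) -> Self {
        let mut v = ManuallyDrop::new(v);
        OwningArrayC {
            data: v.as_mut_ptr(),
            len: v.len(),
            cap: v.capacity(),
        }
    }

    /// Number of elements currently described by the struct.
    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T> From<&mut OwningArrayC<T>> for Vec<T> {
    fn from(arr: &mut OwningArrayC<T>) -> Vec<T> {
        let v = if arr.data.is_null() {
            Vec::new()
        } else {
            // Relies on the fields having come from a Rust `Vec<T>`
            // (here: `from_vec`); anything else is undefined behaviour.
            unsafe { Vec::from_raw_parts(arr.data, arr.len, arr.cap) }
        };
        // Leave the source visibly empty so it cannot be converted or
        // freed a second time.
        arr.data = ptr::null_mut();
        arr.len = 0;
        arr.cap = 0;
        v
    }
}
```

After the conversion, ownership lives in the `Vec` and ordinary `Drop` takes over; the `OwningArrayC` left behind is inert.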
And yes, FFI across languages is _always_ hard, good teaching material is often missing, and in Rust it can be even harder as you have to comply with C soundness on one side and Rust soundness on the other (but I mean that's also true for Python, Java etc.). Though not necessary for any of the problems in the article IMHO. And even if we just speak about C-to-C FFI between programs built separately, the number of subtle, potentially silent footguns is pretty high (like all the issues in this article and more, plus a bunch of other ABI incompatibility risks and potentially also link-time-optimization-related risks).
It's probably not quite the teaching material you seek, but it's something close to that I think.
I have a lot of FFI code paths that end up easier to understand if you use the C-style instead of the Rusty approach.
Readability and comp time should be important even in FFI code paths
I can only hope it 'infects' other languages.
What we want everywhere is block-scoped defer, right?
If we're going to have that kind of semi-manual resource management, anyway, it would be better to have labelled blocks and the corresponding defer-like statement that can reference them. Something like:
outer: {
...
inner: {
defer close(x) after outer;
}
}
Not always. Block-scoped defer makes allocating inside "if blocks" a pain.
The flip side is that function-scoped is a pain to implement in a non-GC language.
But I do recognize that the code in the post was a simplified example, and it's possible that the flexibility of `Vec` is actually used; perhaps elements are pushed into the `Vec` dynamically or something, and it would be inconvenient to simulate that with `libc::malloc` et al. But even then, in an environment that's not inherently memory starved, a viable approach might be to build up the data in a `Vec`, and then allocate a properly-sized region using `libc::malloc` and copy the data into it.
Another option might be to maintain something like a BTreeMap indexed by pointer on the Rust side, keeping track of the capacity there so it can be recovered on free.
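A rough sketch of that side-table idea, with invented names (`EXPORTED`, `export_vec`, `free_exported`). The pointer's address is the key and the value is the (length, capacity) pair needed to rebuild the `Vec` on free.

```rust
use std::collections::BTreeMap;
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::Mutex;

// Address of each exported buffer -> (len, cap) of the Vec it came from.
static EXPORTED: Mutex<BTreeMap<usize, (usize, usize)>> = Mutex::new(BTreeMap::new());

/// Hands the Vec's buffer to C as a bare pointer and remembers its size.
/// Vecs without an allocation are returned as null; there is nothing to free.
pub fn export_vec(v: Vec<u8>) -> *mut u8 {
    if v.capacity() == 0 {
        return ptr::null_mut();
    }
    let mut v = ManuallyDrop::new(v);
    let p = v.as_mut_ptr();
    EXPORTED
        .lock()
        .unwrap()
        .insert(p as usize, (v.len(), v.capacity()));
    p
}

/// Frees a buffer previously returned by `export_vec`.
/// Returns false for null, unknown or already-freed pointers.
pub extern "C" fn free_exported(p: *mut u8) -> bool {
    let entry = EXPORTED.lock().unwrap().remove(&(p as usize));
    match entry {
        Some((len, cap)) => {
            // The table entry proves these parts came from a Rust Vec.
            unsafe { drop(Vec::from_raw_parts(p, len, cap)) };
            true
        }
        None => false,
    }
}
```

The cost is a lock and a map lookup per export and free, plus the map's own memory. In exchange the C side only has to carry a single pointer, and a double free turns into a `false` return instead of heap corruption.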
> Rust + FFI is nasty and has a lot of friction.
...then I would say you have been living a little bit of a charmed life. FFI in most languages has improved beyond all recognition in the last 15 years. When I look at FFI in Rust I can't believe how ergonomic and clean it is compared to, say, PerlXS (which was the old way of doing FFI in Perl).
A possible "defer" keyword has nothing to do with any of this, and will not help you.
Rust doesn't need "defer".