We use a lot of async Rust internally, and created this library out of a need for an async-aware concurrent hashmap since there weren’t many available in the Rust ecosystem.
Whirlwind is a sharded HashMap with a fully asynchronous API. Just as dashmap is a replacement for std::sync::RwLock<HashMap>, whirlwind aims to be a replacement for tokio::sync::RwLock<HashMap>. It has similar design and performance characteristics to dashmap, but seems to perform better in read-heavy workloads under tokio's green threading.
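For context, the baseline it replaces looks roughly like this (a minimal sketch using only tokio and std; the key/value types are placeholders):

    use std::collections::HashMap;
    use std::sync::Arc;
    use tokio::sync::RwLock;

    // The pattern whirlwind aims to replace: one async RwLock guarding
    // the whole map, so every writer serializes behind a single lock.
    type SharedMap = Arc<RwLock<HashMap<String, u64>>>;

    async fn bump(map: &SharedMap, key: &str) {
        let mut guard = map.write().await; // one lock for the entire map
        *guard.entry(key.to_string()).or_insert(0) += 1;
    }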
Benchmarks are in the readme! We used an asyncified version of dashmap's benchmark suite. The project is in a pretty early stage and I'm sure there are flaws, but I'm pretty happy with the performance.
There is some unsafe involved, but we run Miri in CI to (hopefully) catch undefined behavior well before it lands in an actual release.
We'd appreciate any feedback! Thanks in advance :)
I'm curious about the practical benefits of handling a HashMap with an async interface. My long-standing understanding is that `tokio::sync::RwLock<HashMap>` is only useful when you'd want to hold the lock guard across another `await` operation; but when the lock guard is not held across an await, it is always preferable to use the synchronous version.
This would lead me to assume the same applies to dashmap: it should be sufficient for async use cases and doesn't need an async API unless we expect to be blocked. But the benchmarks indicate that whirlwind outperforms dashmap in various situations. Do you have a sense of where this blocking occurs in dashmap?
In all honesty, I was quite surprised by the benchmarks as well. I wouldn't have expected that much of a performance gain, but in high-contention scenarios it definitely makes sense.
By immediately waking itself, the task is rescheduled and will be polled again shortly. This creates a loop in which it repeatedly checks whether the lock is available, which is effectively no different from hint::spin_loop(), just in async form.
A more traditional lock uses a wait queue: the task is not polled again until the lock is likely available, rather than blindly retrying on repeat.
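A minimal sketch of the pattern being described (hypothetical; not the library's exact code):

    use std::future::Future;
    use std::pin::Pin;
    use std::sync::{RwLock, RwLockReadGuard};
    use std::task::{Context, Poll};

    // On a failed try_read, the future wakes itself immediately and
    // returns Pending, so the runtime re-polls it on its next pass.
    // This is the async analogue of a spin loop.
    struct ReadLockFuture<'a, T> {
        lock: &'a RwLock<T>,
    }

    impl<'a, T> Future for ReadLockFuture<'a, T> {
        type Output = RwLockReadGuard<'a, T>;

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            match self.lock.try_read() {
                Ok(guard) => Poll::Ready(guard),
                Err(_) => {
                    cx.waker().wake_by_ref(); // schedule an immediate re-poll
                    Poll::Pending
                }
            }
        }
    }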
- attempt to acquire an RwLock without blocking
- if it succeeds, wake the task waiting for the lock
- if it doesn't, return "not ready yet"
The RwLock is straight from the standard library; there's no loop and no spinning involved. Every single Future you look at will look like this: by specification, it cannot block; it can only return "ready" (and wake the task) or return "not ready".
It really is not much different from a thread blocking in a loop, where the OS thread scheduler preempts the thread and reschedules it for later. In this case it's an async task instead of a thread, and cooperative instead of preemptive, but it's still unfair and not a very effective use of scheduler resources.
That's not true. A Future is supposed to schedule itself to be woken up again when it's ready. This Future schedules itself to be woken immediately. Most runtimes, like Tokio, will put a Future that acts like this at the end of the run queue, so in practice it's not as egregious. However, it's unquestionably a spin lock, equivalent to backing off with thread::yield.
It would be good to see a benchmark where it only touches a _single_ key. If Whirlwind is still faster than the others, I would be far more convinced to use it unconditionally.
EDIT: I also see that you're benchmarking on an M3 Max. Note that this chip has a mix of performance and efficiency cores, which means that at low core counts the benchmark may well start out on the efficiency cores. From my experience it's quite important to do warmup phases in order to get stable results at low thread counts. And even then I find it hard to reason about the results, since you're running on a mixed set of cores…
Regarding your edit: damn I hadn't thought of that. I'll rerun the benchmarks on my Linux desktop with a Ryzen chip and update the readme.
There might be some fairness concerns here? Since we're polling the lock (instead of adding ourselves to a queue), it could be the case that some requests are constantly "too late" to acquire it. It could be interesting to see the min/max/median/P99 of the requests themselves. It seems that Bustle only reports the average latency[1], which honestly doesn't tell us much more than the throughput numbers.
[1]: https://docs.rs/bustle/latest/bustle/struct.Measurement.html
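For what it's worth, those percentiles are cheap to compute from raw per-request samples; a hypothetical helper (my names, not part of bustle):

    use std::time::Duration;

    // Report the tail percentiles that a plain average hides.
    fn percentile(sorted: &[Duration], p: f64) -> Duration {
        let idx = ((sorted.len() - 1) as f64 * p).round() as usize;
        sorted[idx]
    }

    fn report(mut samples: Vec<Duration>) {
        assert!(!samples.is_empty());
        samples.sort();
        println!(
            "min={:?} median={:?} p99={:?} max={:?}",
            samples[0],
            percentile(&samples, 0.50),
            percentile(&samples, 0.99),
            samples[samples.len() - 1],
        );
    }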
Fair point though; there would definitely be some benefit to having some of these things in the stdlib.
Although I guess they do implement Future on their own, so it shouldn't need a specific runtime then.
https://github.com/fortress-build/whirlwind/blob/main/src/sh...
https://github.com/fortress-build/whirlwind/blob/main/Cargo....
Rather than having 3-5 ways to do, say, a GUI in the stdlib, you have 0, and instead you pick a dependency that suits your needs.
That extends even to data structures: there are a bunch of trade-offs where one is better than another, and if Rust had to maintain 30 different types of maps for every use case, that seems like a waste of the team's time. Push that work off to the people who need those maps.
After some pondering though, there is a strength in the Rust team not having to maintain such a large stdlib, as you said. I find myself using external libraries for many things in Go already, so why not extend that to slightly more fundamental things with lots of variance too.
I think the largest downside with this approach, however, is that it's hard for me to know what the de facto, collectively-agreed-upon standard is for a given task.
Naming things is... well, you know...
When will this happen? I imagine a lot of people who want to use it might just forget about it if you say "do not use this until some unspecified point in the future".
Is this actually the case? I can't say I've seen the same.
A second reason is sharding: sharding based on a hash is quite simple to do, but sharding an ordered collection would be quite difficult, since some reads would need to search across multiple shards and thus take multiple locks.
If you mean internally (like for each shard), we're using hashbrown's raw HashTable API because it allows us to manage hashing entirely ourselves, and avoid recomputing the hash when determining the shard and looking up a key within a shard.
[0]: https://github.com/fortress-build/whirlwind/blob/0e4ae5a2aba...
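To make the hash-once point concrete, here's a hypothetical sketch (the names and the modulo shard selection are mine, not whirlwind's actual code):

    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hash};

    use hashbrown::HashTable;
    use tokio::sync::RwLock;

    // Hash the key once, pick a shard from that hash, then reuse the
    // same hash for the in-shard lookup via hashbrown's raw HashTable
    // API, so the key is never hashed a second time.
    struct ShardedMap<K, V> {
        shards: Vec<RwLock<HashTable<(K, V)>>>,
        hasher: RandomState,
    }

    impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
        async fn get(&self, key: &K) -> Option<V> {
            let hash = self.hasher.hash_one(key);
            let shard = &self.shards[(hash as usize) % self.shards.len()];
            let table = shard.read().await;
            table
                .find(hash, |(k, _)| k == key) // no re-hashing here
                .map(|(_, v)| v.clone())
        }
    }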
[0]: https://github.com/tokio-rs/tokio/blob/8897885425bf3d89053f8... [1]: https://github.com/tokio-rs/tokio/blob/8897885425bf3d89053f8...
In the common case, the waker side will run some logic like:

    set_flag();
    atomic_waker.wake();

and the future's poll will look like:

    atomic_waker.register(cx.waker());
    if flag_is_set() {
        Poll::Ready(())
    } else {
        Poll::Pending
    }
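Fleshed out into something compilable, that pattern looks like this (my names; assumes the futures crate's AtomicWaker):

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::task::Poll;

    use futures::task::AtomicWaker;

    struct Signal {
        flag: AtomicBool,
        waker: AtomicWaker,
    }

    impl Signal {
        fn new() -> Self {
            Signal { flag: AtomicBool::new(false), waker: AtomicWaker::new() }
        }

        // The waker side: set the flag, then wake.
        fn notify(&self) {
            self.flag.store(true, Ordering::Release);
            self.waker.wake();
        }

        // The future side: register first, then check the flag, so a
        // wake racing with registration is never lost.
        async fn wait(&self) {
            std::future::poll_fn(|cx| {
                self.waker.register(cx.waker());
                if self.flag.load(Ordering::Acquire) {
                    Poll::Ready(())
                } else {
                    Poll::Pending
                }
            })
            .await;
        }
    }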
Thus, even if `register` decides to "spin", the flag will already be set, and so the future will not spin in reality (it may be polled unnecessarily, but this will only happen once). I can't immediately think of examples where support for spurious waking is necessary.
And, if we want to keep it as a spinlock, I'm curious how the immediate wakeup compares to using `tokio::task::yield_now`: https://docs.rs/tokio/latest/tokio/task/fn.yield_now.html
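A rough sketch of that yield-based variant (hypothetical, tokio-specific):

    use std::sync::{RwLock, RwLockReadGuard};

    // Instead of waking ourselves and returning Pending, explicitly
    // yield back to the scheduler between attempts, so other tasks on
    // this worker get a chance to run (and to release the lock).
    async fn read_with_yield<T>(lock: &RwLock<T>) -> RwLockReadGuard<'_, T> {
        loop {
            if let Ok(guard) = lock.try_read() {
                return guard;
            }
            tokio::task::yield_now().await;
        }
    }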