I’m excited to share Kameo, a lightweight Rust library that helps you build fault-tolerant, distributed, and asynchronous actors. If you're working on distributed systems, microservices, or real-time applications, Kameo offers a simple yet powerful API for handling concurrency, panic recovery, and remote messaging between nodes.
Key Features:
- Async Rust: Each actor runs as a separate Tokio task, making concurrency management simple.
- Remote Messaging: Seamlessly send messages to actors across different nodes.
- Supervision and Fault Tolerance: Create self-healing systems with actor hierarchies.
- Backpressure Support: Supports bounded and unbounded mpsc messaging.
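As a rough illustration of the backpressure point using plain std channels (this is not Kameo's mailbox implementation, just the general mechanism a bounded mpsc gives you):

```rust
use std::sync::mpsc;
use std::thread;

// Toy sketch (not Kameo's API): a bounded mailbox applies backpressure
// by blocking the sender once the queue is full, while an unbounded
// mailbox would accept messages without limit.
fn run_bounded(capacity: usize, messages: usize) -> usize {
    let (tx, rx) = mpsc::sync_channel::<u64>(capacity); // bounded
    let producer = thread::spawn(move || {
        for i in 0..messages as u64 {
            // `send` blocks whenever the mailbox already holds
            // `capacity` messages, so a slow consumer throttles
            // the producer instead of memory growing unboundedly.
            tx.send(i).unwrap();
        }
    });
    let mut processed = 0;
    for _msg in rx {
        processed += 1; // consumer drains the mailbox sequentially
    }
    producer.join().unwrap();
    processed
}

fn main() {
    // All 100 messages arrive despite a mailbox of only 4 slots.
    assert_eq!(run_bounded(4, 100), 100);
    println!("bounded mailbox delivered all messages");
}
```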
I built Kameo because I wanted a more intuitive, scalable solution for distributed Rust applications. I’d love feedback from the HN community and contributions from anyone interested in Rust and actor-based systems.
Check out the project on GitHub: https://github.com/tqwewe/kameo
Looking forward to hearing your thoughts!
There's a limitation mentioned in the docs:
While messages are processed sequentially within a single actor, Kameo allows for concurrent processing across multiple actors.
which is justified with: "This [sequential processing] model also ensures that messages are processed in the order they are received, which can be critical for maintaining consistency and correctness in certain applications."
I agree with this, and it gives the library a well-defined use case. The docs and examples are well made.
It’s a feature.
EDIT: Tangent, but if anyone has experience making deterministic actor model systems that can be run under a property test I'd love to know more. It would make an amazing blog post even if it would have a very narrow audience
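The sequential-per-actor, concurrent-across-actors model can be sketched with plain threads and channels (illustrative only, not Kameo's internals):

```rust
use std::sync::mpsc;
use std::thread;

// Each "actor" here is just a thread draining its own channel, so
// messages to one actor are handled strictly in arrival order, while
// two actors make progress concurrently.
fn spawn_counter() -> (mpsc::Sender<i64>, thread::JoinHandle<i64>) {
    let (tx, rx) = mpsc::channel::<i64>();
    let handle = thread::spawn(move || {
        let mut total = 0;
        for amount in rx {
            total += amount; // processed one at a time, in order
        }
        total
    });
    (tx, handle)
}

fn main() {
    let (tx_a, a) = spawn_counter();
    let (tx_b, b) = spawn_counter();
    for i in 1..=10 {
        tx_a.send(i).unwrap();     // actor A gets 1..=10
        tx_b.send(i * 2).unwrap(); // actor B gets 2, 4, ..., 20
    }
    drop(tx_a); // closing the channel lets each actor finish
    drop(tx_b);
    assert_eq!(a.join().unwrap(), 55);
    assert_eq!(b.join().unwrap(), 110);
    println!("both actors drained their mailboxes independently");
}
```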
Ractor is nice, and I've used it in the past. A couple of things differ between kameo and ractor:
- In ractor, messages must be defined in a single enum – in kameo, they can be separate structs each with their own `Message` implementation. This means messages can be implemented for multiple actors which can be quite useful.
- In ractor, the actor itself is not the state, meaning you must typically define two types per actor – in kameo, the actor itself is the state, which in my opinion simplifies things. As someone mentioned in a comment here, it was a bit of a turn off for me using ractor in the past and I didn't fully agree with this design decision
- Ractor requires the `#[async_trait]` macro – kameo does not.
There may be other obvious differences but I'm not super familiar with ractor besides these points
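To make the first point concrete, here's a hypothetical sketch of the two styles in plain Rust (the trait and type names are made up for illustration, not kameo's or ractor's real APIs):

```rust
// Style 1: one enum per actor (ractor-like) -- every message variant
// lives in a single type owned by that actor.
enum CounterMsg {
    Inc(i64),
    Reset,
}

// Style 2: per-message structs (kameo-like) -- each message is its own
// type, and any actor can implement a handler trait for it.
struct Inc(i64);

trait Handle<M> {
    fn handle(&mut self, msg: M);
}

struct Counter { count: i64 }
struct Gauge { value: i64 }

// The same `Inc` message is reusable across two different actors.
impl Handle<Inc> for Counter {
    fn handle(&mut self, msg: Inc) { self.count += msg.0; }
}
impl Handle<Inc> for Gauge {
    fn handle(&mut self, msg: Inc) { self.value += msg.0; }
}

fn main() {
    let mut c = Counter { count: 0 };
    let mut g = Gauge { value: 100 };
    c.handle(Inc(5));
    g.handle(Inc(5));
    assert_eq!(c.count, 5);
    assert_eq!(g.value, 105);
    // The enum style still works; it just ties every message to one actor:
    let _ = CounterMsg::Inc(1);
    let _ = CounterMsg::Reset;
    println!("Inc handled by two distinct actors");
}
```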
Imagine opening a socket: if you have a mutable self, the caller who is going to spawn that actor needs to open the socket themselves and risk the failure there, instead of the actor who would eventually be responsible for said socket. The motivation for this is outlined in our docs. Essentially, the actor is responsible for the creation of its state and any risks associated with that.
- For the async trait point, we actually do support native async traits without the boxing magic macro. It's a feature you can disable if you wish, but it impacts factories, since you can't box traits with native future returns: https://github.com/slawlor/ractor/pull/202
(For transparency I'm the author of ractor)
Here's the RustConf presentation for anyone interested https://slawlor.github.io/ractor/assets/rustconf2024_present...
I wasn't aware async_trait wasn't needed; that's nice to see.
Also, congrats on it being used at such a big company, that's awesome! I have a lot of respect for ractor and appreciate your response.
Question for you: I was poking around in the codebase, and how do you handle your Signal priorities? If a link died and there are already 1000 messages in the queue, or if it's a bounded mailbox, would the link-died (or even stop) messages be delayed by that much?
Have you looked into prioritization of those messages such that it's not globally FIFO?
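What I'm imagining is something like a two-lane mailbox, sketched here in plain std Rust (this is not ractor's or kameo's actual internals, just the idea): control signals jump ahead of the ordinary FIFO queue instead of waiting behind 1000 user messages.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Event {
    User(u32),
    Signal(&'static str),
}

// A mailbox with two lanes: signals are drained before user messages,
// so ordering is FIFO per lane rather than globally FIFO.
struct Mailbox {
    signals: VecDeque<Event>,
    messages: VecDeque<Event>,
}

impl Mailbox {
    fn new() -> Self {
        Mailbox { signals: VecDeque::new(), messages: VecDeque::new() }
    }
    fn push(&mut self, ev: Event) {
        match ev {
            Event::Signal(_) => self.signals.push_back(ev),
            Event::User(_) => self.messages.push_back(ev),
        }
    }
    fn next(&mut self) -> Option<Event> {
        // Signals bypass the user-message FIFO entirely.
        self.signals.pop_front().or_else(|| self.messages.pop_front())
    }
}

fn main() {
    let mut mb = Mailbox::new();
    for i in 0..1000 {
        mb.push(Event::User(i));
    }
    mb.push(Event::Signal("link_died"));
    // Despite arriving last, the signal is delivered first.
    assert_eq!(mb.next(), Some(Event::Signal("link_died")));
    assert_eq!(mb.next(), Some(Event::User(0)));
    println!("signal delivered ahead of 1000 queued messages");
}
```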
So gracefully shutting down an actor with `actor_ref.stop_gracefully().await` will process all pending messages before stopping. But the actor itself can be forcefully stopped with `actor_ref.kill()`
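A toy model of the difference between the two shutdown modes (not Kameo's internals, just the observable semantics described above): graceful stop drains every pending message before the actor exits, while kill discards whatever is still queued.

```rust
use std::collections::VecDeque;

// Returns (sum of processed messages, messages left unprocessed).
fn shutdown(mut mailbox: VecDeque<i64>, graceful: bool) -> (i64, usize) {
    let mut processed_sum = 0;
    if graceful {
        // Graceful stop: handle everything already enqueued first.
        while let Some(msg) = mailbox.pop_front() {
            processed_sum += msg;
        }
    }
    // Kill: fall through immediately, dropping the remaining queue.
    (processed_sum, mailbox.len())
}

fn main() {
    let pending: VecDeque<i64> = (1..=5).collect();

    let (sum, left) = shutdown(pending.clone(), true);
    assert_eq!((sum, left), (15, 0)); // graceful: all 5 processed

    let (sum, left) = shutdown(pending, false);
    assert_eq!((sum, left), (0, 5)); // kill: nothing processed
    println!("graceful drains, kill drops");
}
```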
But sometimes when I see projects like this in other languages, I think, are you sure you don't want to use Erlang or something else on the BEAM runtime and just call Rust or C via their NIFs?
I used Erlang about a decade ago, and even then it was so robust, easy to use, and mature. Granted you have to offload anything performance-sensitive to native functions but the interface was straightforward.
In the Erlang community back then there were always legends about how Whatsapp had only 10 people and 40 servers to serve 1 Billion customers. Probably an exaggeration, but I could totally see it being true. That's how well thought out and robust it was.
Having said all that, I don't mean to diminish your accomplishment here. This is very cool!
BEAM's benefit 10-20 years ago was that inter-node communication was essentially the same as communicating within the same process, meaning I could talk to an actor on a different machine the same way as if it was in the same process.
These days people just spin up more cores on one machine. Getting good performance out of multi-node Erlang is a challenge and only really works if you can host all the servers on one rack to simulate a multi-core machine. Erlang's built-in distribution doesn't work so well in a modern VPS / AWS setup, although some try.
Well, what I do is think of functions as services, and there are different ways to get that, but BEAM / OTP are surely among them.
I think most software won't need to scale that far. Did you run into any systems like that built on top of BEAM?
So, did you run into any systems that needed to scale to tens of thousands of cores for a reason inherent to the problem they were solving, and were built on top of BEAM?
Although Elixir is a nice language, I struggle to enjoy writing code in a language that lacks static types.
// Bootstrap the actor swarm
if is_host {
ActorSwarm::bootstrap()?
.listen_on("/ip4/0.0.0.0/udp/8020/quic-v1".parse()?)
.await?;
} else {
ActorSwarm::bootstrap()?.dial(
DialOpts::unknown_peer_id()
.address("/ip4/0.0.0.0/udp/8020/quic-v1".parse()?)
.build(),
);
}
let remote_actor_ref = RemoteActorRef::<MyActor>::lookup("my_actor").await?;
match remote_actor_ref {
Some(remote_actor_ref) => {
let count = remote_actor_ref.ask(&Inc { amount: 10 }).send().await?;
println!("Incremented! Count is {count}");
}
    None => println!("actor not found"),
}
It's using libp2p under the hood, with a Kademlia distributed hash table for actor registrations.
`actor.register("name")` works by using a Kademlia DHT behind the scenes. This is implemented thanks to libp2p, which handles all the complications of peer-to-peer connections.
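Roughly, a key-hashed registry behaves like this toy std-only sketch (the real implementation is libp2p's Kademlia, which adds routing, replication, and peer discovery; all names and addresses here are made up for illustration): a key is hashed to pick the node responsible for the record, and lookup hashes the same key to find that node again.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct ToyDht {
    nodes: Vec<HashMap<String, String>>, // one record store per node
}

impl ToyDht {
    fn new(node_count: usize) -> Self {
        ToyDht { nodes: vec![HashMap::new(); node_count] }
    }
    // Hash the key to decide which node owns it.
    fn node_for(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.nodes.len()
    }
    fn register(&mut self, name: &str, addr: &str) {
        let idx = self.node_for(name);
        self.nodes[idx].insert(name.to_string(), addr.to_string());
    }
    fn lookup(&self, name: &str) -> Option<&String> {
        self.nodes[self.node_for(name)].get(name)
    }
}

fn main() {
    let mut dht = ToyDht::new(8);
    dht.register("my_actor", "/ip4/10.0.0.2/udp/8020/quic-v1");
    // Any peer hashing the same key reaches the same node.
    assert_eq!(
        dht.lookup("my_actor").map(String::as_str),
        Some("/ip4/10.0.0.2/udp/8020/quic-v1")
    );
    assert_eq!(dht.lookup("unknown"), None);
    println!("lookup resolved via the same hash as register");
}
```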
With Kameo, an actor running on another node is simply accessed through a RemoteActorRef, and you can message it the same way you would interact with a local actor. This flexibility allows you to avoid the overhead of schema management while still achieving seamless communication across nodes, making the system more dynamic and less rigid compared to gRPC.
From the abstract: The Actor Model is a message passing concurrency model that was originally proposed by Hewitt et al. in 1973. It is now 43 years later and since then researchers have explored a plethora of variations on this model. This paper presents a history of the Actor Model throughout those years. The goal of this paper is not to provide an exhaustive overview of every actor system in existence but rather to give an overview of some of the exemplar languages and libraries that influenced the design and rationale of other actor systems throughout those years. This paper therefore shows that most actor systems can be roughly classified into four families, namely: Classic Actors, Active Objects, Processes and Communicating Event-Loops.
-> http://soft.vub.ac.be/Publications/2016/vub-soft-tr-16-11.pd...
In kameo, all actors run in a `tokio::spawn` task.
They run in separate tasks / threads anyway and they are cpu-bound. So, why would it be necessary to make them async ?
Kameo does have a `spawn_in_thread` function for CPU bound actors if needed.
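To illustrate why that helps, here's a plain-std sketch of a CPU-bound actor on a dedicated OS thread (this is not Kameo's `spawn_in_thread` implementation, just the general shape): the heavy loop runs off the async executor, so it never starves a cooperative scheduler.

```rust
use std::sync::mpsc;
use std::thread;

// Spawns an "actor" on its own thread with a request and reply channel.
fn spawn_cpu_actor() -> (mpsc::Sender<u64>, mpsc::Receiver<u64>) {
    let (msg_tx, msg_rx) = mpsc::channel::<u64>();
    let (reply_tx, reply_rx) = mpsc::channel::<u64>();
    thread::spawn(move || {
        for n in msg_rx {
            // A blocking, CPU-heavy computation: naive sum of squares.
            let result: u64 = (1..=n).map(|i| i * i).sum();
            if reply_tx.send(result).is_err() {
                break; // caller went away; stop the actor
            }
        }
    });
    (msg_tx, reply_rx)
}

fn main() {
    let (tx, rx) = spawn_cpu_actor();
    tx.send(10).unwrap();
    assert_eq!(rx.recv().unwrap(), 385); // 1^2 + 2^2 + ... + 10^2
    println!("CPU-bound work handled off the async runtime");
}
```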
Right. It's not a very widespread use case, to be honest. You'd find that most would be N actors for M threads (where N <= M; an actor in itself is never shared among multiple threads, so `Send` and not `Sync`, in theory. An inner message handler _could_ have parallel processing, but that's up to the user.)
I think you should assume in Kameo that every Actor's message handler is going to be CPU-bound. For example, it means that your internal message dispatch and Actor management should be on a separate loop from the User's `async fn handle`. I don't know if it's already the case, but it's an important consideration for your design.
Nice library, BTW. I think it ticks all the boxes and I like your design. I've tried most of them but could not find one that I liked and/or that didn't have a fatal design flaw (like async_traits, ...) :)
PS: The multi-threaded Tokio runtime should be the default. Nobody wants a single-threaded actor runtime. It should be in capital letters in the README.