One of Codethink's goals is to promote the existence of safe and reproducible software out in the wild. With its numerous safety guarantees and sane package management, Rust is a natural tool to utilise in this endeavour.
We were recently hired by a client to write a 3-node distributed system in Rust, where each node runs a multistage pipeline in parallel and communicates with the other nodes throughout. The project ran to a short and tight schedule, with rolling deadlines by which pre-defined work packages had to be delivered (this was not stressful at all 😅).
The project was successful in the end, and here are some of the lessons we learnt in retrospect.
Rust does not slow down development time
A criticism you may often hear levelled against Rust is that it takes developers longer to write code in it than in less pedantic languages like Python or C. Let me assure you, this was very much not the case here!
This project began with us building libraries and sub-components of the system. Only in the final third did we integrate these components into the final services that would run in production. Due to the project's tight schedule, the pressure was on for us to develop the sub-components quickly and get them right first time.
Rust's design choice to use explicit error return types rather than exceptions significantly reduces the number of hidden points of failure. This made it much easier to reason about how each sub-component could fail and how we wanted to handle those failures. The ability to propagate errors also let us handle failures all in one place, which again made them easier to reason about a priori. Remember, this was all before we were able to set up proper integration testing.
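To make the point about explicit errors concrete, here is a minimal sketch. It is not code from the project, and the names StageError, parse_reading, process_batch and run_stage are hypothetical. Each fallible function returns a Result whose error enum lists every way the stage can fail. The ? operator passes failures upward, converting a ParseIntError via From along the way, and a single caller decides how to handle them.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for one pipeline stage: every way the stage can
/// fail is an explicit variant, visible in the function signatures below.
#[derive(Debug, PartialEq)]
pub enum StageError {
    EmptyInput,
    BadNumber(ParseIntError),
    OutOfRange(i64),
}

impl fmt::Display for StageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StageError::EmptyInput => write!(f, "empty input line"),
            StageError::BadNumber(e) => write!(f, "could not parse number: {e}"),
            StageError::OutOfRange(v) => write!(f, "value {v} outside 0..=1000"),
        }
    }
}

impl std::error::Error for StageError {}

// Lets `?` convert a ParseIntError into a StageError automatically.
impl From<ParseIntError> for StageError {
    fn from(e: ParseIntError) -> Self {
        StageError::BadNumber(e)
    }
}

/// Parses one reading; the return type states exactly how this can fail.
pub fn parse_reading(line: &str) -> Result<i64, StageError> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return Err(StageError::EmptyInput);
    }
    let value: i64 = trimmed.parse()?; // early return on a parse failure
    if !(0..=1000).contains(&value) {
        return Err(StageError::OutOfRange(value));
    }
    Ok(value)
}

/// Sums a batch; a failure on any line propagates straight up via `?`.
pub fn process_batch(lines: &[&str]) -> Result<i64, StageError> {
    let mut total = 0;
    for line in lines {
        total += parse_reading(line)?;
    }
    Ok(total)
}

/// The single place where failures are handled (here: turned into a log line).
pub fn run_stage(lines: &[&str]) -> String {
    match process_batch(lines) {
        Ok(total) => format!("total = {total}"),
        Err(e) => format!("stage failed: {e}"),
    }
}
```

Every failure mode is listed in one enum. As a result, adding a new variant makes the compiler reject the exhaustive match in the Display impl until the new case is handled.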
Of course, we did not get everything right first time - but it was not far off! I dread to think what kind of knots we would have got wrapped in had we tried to follow this approach in an exception-based language.
Navigating async Rust - a pain worth enduring
We used async programming on this project so that nodes could perform CPU-bound work whilst keeping multiple communication channels to other nodes open concurrently. Async in Rust is starting to gain a reputation for being an overly complicated beast. You can see withoutboats' discussion of this for a nice, detailed summary, but the pertinent point for our purposes is that the complexity of async is partly a consequence of Rust's unwillingness to compromise on performance[1].
Admittedly, we too found ourselves stumbling over Rust's async complexity several times, particularly with regard to shutdown and cancellation safety. We wanted to be able to stop and/or restart the services without corrupting data, and this required ensuring that all tasks, regardless of which thread they were on, listened for and correctly handled certain signals. This is not something the compiler gives you, and it required careful thought on our part.
From our perspective, though, this complexity was worth enduring for the sake of performance. Each node in our distributed system could be performing 1500-2000 concurrent operations, which is the scale at which runtime overheads (from stackful green threads, for example) can come into play. We spent the final quarter of this project squeezing every last drop of performance out of our services, so such an overhead may have been intolerable.
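The shutdown requirement described above can be sketched as follows. This is an illustration under stated assumptions, not the project's design. So that it runs without a runtime crate, it uses plain std threads and a shared AtomicBool in place of async tasks and a runtime's cancellation primitives. The names run_until_shutdown and Record are hypothetical. The key idea is that each worker only checks for shutdown between whole units of work, so stopping never leaves a half-committed record. The coordinator also waits for every worker to reach that point before returning.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical record produced by one unit of work: (worker id, sequence number).
pub type Record = (usize, u64);

/// Starts `workers` threads, lets them run for `run_for`, then signals shutdown
/// and waits for all of them. Returns every record that was committed.
pub fn run_until_shutdown(workers: usize, run_for: Duration) -> Vec<Record> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let committed = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let shutdown = Arc::clone(&shutdown);
            let committed = Arc::clone(&committed);
            thread::spawn(move || {
                let mut seq = 0u64;
                loop {
                    // One whole unit of work: simulate processing, then commit
                    // the complete result under the lock in a single step.
                    thread::sleep(Duration::from_millis(1));
                    committed.lock().unwrap().push((id, seq));
                    seq += 1;
                    // Check for shutdown only at this safe point, never mid-unit.
                    if shutdown.load(Ordering::Acquire) {
                        break;
                    }
                }
            })
        })
        .collect();

    thread::sleep(run_for);
    shutdown.store(true, Ordering::Release);
    // Wait for every worker, whatever thread it is on, to reach a safe point.
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    // The workers' Arc clones were dropped when their threads finished.
    Arc::try_unwrap(committed)
        .expect("workers still hold a reference")
        .into_inner()
        .expect("mutex poisoned")
}
```

In an async service, the same structure would typically be expressed by having each task wait on a shutdown signal alongside its work. The compiler does not enforce where those safe points are, which is exactly the careful thought the paragraph above refers to.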
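As a rough illustration of why the stackless model suits this scale, consider the toy sketch below. It is far simpler than a real runtime such as Tokio and is not the project's code. Each async fn call compiles to a state machine whose size is fixed at compile time, so thousands of pending operations cost a small number of bytes each instead of a separately allocated stack each. The executor polls every future round-robin with a no-op waker, whereas a real runtime uses the waker to re-poll only tasks that can make progress. operation and YieldOnce are hypothetical stand-ins for real work.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; this toy executor simply re-polls everything.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns Pending once before completing, forcing tasks to interleave.
pub struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical unit of concurrent work: suspends once, then produces a value.
pub async fn operation(id: u64) -> u64 {
    YieldOnce { yielded: false }.await;
    id * 2
}

/// Drives all futures to completion on the current thread, polling round-robin.
pub fn run_all<F: Future<Output = u64>>(futures: Vec<F>) -> Vec<u64> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // Box::pin gives each future a stable address, as polling requires.
    let mut tasks: Vec<Option<Pin<Box<F>>>> =
        futures.into_iter().map(|f| Some(Box::pin(f))).collect();
    let mut results = vec![0u64; tasks.len()];
    let mut remaining = tasks.len();

    while remaining > 0 {
        for i in 0..tasks.len() {
            let finished = match tasks[i].as_mut() {
                Some(fut) => match fut.as_mut().poll(&mut cx) {
                    Poll::Ready(value) => {
                        results[i] = value;
                        true
                    }
                    Poll::Pending => false,
                },
                None => false,
            };
            if finished {
                tasks[i] = None; // drop the completed future
                remaining -= 1;
            }
        }
    }
    results
}
```

For example, std::mem::size_of_val(&operation(0)) reports the size of one pending operation, and run_all((0..2000u64).map(operation).collect()) drives 2000 of them on a single thread.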
Scientific Rust is not quite there yet
One last thing that struck us when working on this project was the gap in maturity between the scientific/statistical libraries in C++ and Python and those in Rust. There is no real equivalent in Rust yet for the advanced algorithms to be found in scipy or sympy on the Python side, or the boost Maths module on the C++ side. Crates like statrs, for example, certainly provide useful functions, but they are a long way off feature parity with the aforementioned libraries. In our case we noticed this especially for functions relating to probability distributions.
Now, this is not exactly surprising. The Python libraries had a head start of about a decade on the release of Rust 1.0 alone, and boost even more, so of course there will be differences in maturity. Having said this, a concern of ours is that progress will not be made without greater uptake of Rust within the scientific communities.
It will be interesting to watch this space. Scientists could surely benefit from all the usual advantages Rust brings to writing programs with high complexity - but they also have a reputation for wanting their languages to "get out of their way" and just let them do their clever sciencey stuff. Rust, with its opinionated compiler who is desperate to save you from yourself, is certainly not one of those!
Photo credit: Clint Adair on Unsplash
[1] And the rest of it is explained by async Rust still being a work in progress.