One of the greatest challenges in maintaining a stack of software (such as an SDK or a Linux Distribution) is in ensuring that users can update this base without breaking their own tooling. freedesktop-sdk is the base runtime used for applications distributed via flatpak, which means that any application from flathub needs to be able to trust that freedesktop-sdk is stable enough that an update to the runtime won't cause unexpected breakages.
The responsibility for stability in this context is not unlike the great maxim of kernel developers not to break userspace - when the job is done right, the end user shouldn't need to worry about the runtime at all.
One of the chief causes of unexpected breakages is an Application Binary Interface (ABI) break.
What is ABI?
Application Binary Interface (or, more snappily, ABI) is a similar idea to its more famous cousin, API (or Application Programming Interface, to use its Sunday name). An API is a mechanism by which a programmer can have one program talk to another, allowing the reuse of other people's code. For example, one could write code to implement TLS oneself, but it would be much easier to simply use one of the existing TLS libraries via their public API.
If API is the way a programmer can call one program from another, then ABI is how the computer can call one program from another. This covers a whole lot of intricacies such as calling conventions and the object format. Much of this is beyond the scope of this article to talk about in depth, and largely irrelevant to the use case we have at hand. Let's consider that use case now.
freedesktop-sdk provides many, many dynamically linked binary libraries, which are depended upon by applications distributed via flatpak. During a stable release of freedesktop-sdk, we need to be careful to make sure that these binary libraries provide a consistent interface for application binaries to communicate with. If this interface should change during a runtime update, an application built against the old version may miscommunicate and crash!
What happens if ABI is broken?
Let's see what actually happens if we break ABI. For this example, I'll use functions to demonstrate the type of things that can go wrong, but bear in mind that ABI is a much larger surface than just the signatures of the functions. Let's write a simple C library to investigate what can go wrong:
int square (int x)
{
    return x * x;
}

int cube (int x)
{
    return x * x * x;
}
Here we define a pair of functions - one to square an integer, and the other to cube an integer. We can compile this into a dynamically linked library by running:
gcc -o example.o -fpic -c example.c
gcc -shared -o libexample.so example.o
# Let's put this in the /lib1 directory
mkdir -p /lib1
mv libexample.so /lib1
This gives us a library like those shipped in freedesktop-sdk. Now let's write an "application" that uses this library to offload the heavy lifting of squaring or cubing a number. Here we have a header, example.h:
#ifndef EXAMPLE_H
#define EXAMPLE_H
extern int square (int);
extern int cube (int);
#endif
And a main file, main.c:
#include <stdio.h>
#include "example.h"

int main () {
    printf("%d\n", square(3));
    printf("%d\n", cube(2));
    return 0;
}
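We can compile this to an executable by running gcc as follows (the library is named after the source file so that the linker resolves the symbols main.c needs, even on toolchains that link with --as-needed by default):
$ gcc -o main main.c -L/lib1 -lexample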
$ # Let's run it too
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
9
8
So, we now have an application and a library working together. What could go wrong?
To investigate, let's make a couple of changes to the library. We'll make simple changes to the API, as these give dramatic effects, but bear in mind that ABI changes can be much more subtle, especially when a library has symbols that are not part of its public API.
We'll make two changes. First, let's remove both of these functions and replace them with a generic power function:
int power (int x, unsigned int n)
{
    if (n == 0) {
        return 1;
    }
    return x * power(x, n - 1);
}
If we recompile the library but not the application, then run the application, we will discover that we can't:
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
./main: symbol lookup error: ./main: undefined symbol: square
This is an ABI break! When our library was recompiled we removed some symbols that the application depended on, and so now we cannot even run the application! This sort of ABI break breaks backwards compatibility - updating the library means that we can no longer run applications built against an old version. For the most part, people only care about backwards compatibility breaks, but forwards compatibility breaks, where new symbols are added, may also be a problem: an application built against the newer library can end up being distributed to users whose runtime has not yet been updated.
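For comparison, here is a minimal sketch of how the example library could have gained power without removing anything. It is only an illustration for this toy library, not a description of how any component in freedesktop-sdk is maintained: the old symbols stay in place as thin wrappers, so the already-compiled main still finds square and cube at run time.

```c
/* New, generic entry point added in the updated library. */
int power (int x, unsigned int n)
{
    int result = 1;
    for (unsigned int i = 0; i < n; i++) {
        result *= x;
    }
    return result;
}

/* The old symbols are kept with unchanged signatures, so binaries
 * built against the previous version still resolve them. */
int square (int x)
{
    return power(x, 2);
}

int cube (int x)
{
    return power(x, 3);
}
```

Because the change is purely additive, this is a forwards compatibility concern at most: old applications keep working, and only applications that start calling power need the newer library.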
Modifying a symbol can cause more subtle issues. Instead of replacing square with power, what if we modify it to take a different type?
double square (double x)
{
    return x * x;
}
If we recompile the library, leave the application untouched, and run it, this time we get:
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
0
8
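Uh oh, that's giving us 0 instead of 9! The application still passes an int and expects an int back, while the library now reads and returns a double, so the behaviour is undefined and the value printed is essentially arbitrary. It's not hard to see how this could cause a multitude of problems for an application. This sort of change can be particularly hard to spot when it comes from a change to a custom type rather than to a function's own signature.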
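A change to a custom type can break ABI even when no function signature changes. As a minimal sketch, assume a hypothetical point struct (invented for this example, not taken from any library in the SDK) that gains a field between two versions of a library. The two versions are given different names here only so that they can sit side by side in one file:

```c
#include <stddef.h>  /* offsetof, size_t */

/* The type as shipped in version 1 of the hypothetical library. */
struct point_v1 {
    int x;
    int y;
};

/* The same type in version 2: a field was inserted at the front. */
struct point_v2 {
    int id;  /* new field */
    int x;
    int y;
};

/* Offset of y as compiled into an application built against version 1. */
size_t y_offset_v1 (void)
{
    return offsetof(struct point_v1, y);
}

/* Offset of y as the version 2 library expects it. */
size_t y_offset_v2 (void)
{
    return offsetof(struct point_v2, y);
}
```

The two offsets differ, and so do the sizes of the two structs. An application compiled against version 1 has the old offset and size baked into its machine code, so after the update it would read and write the wrong bytes of any point it shares with the library.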
How does freedesktop-sdk mitigate this?
Now that we've seen the kinds of catastrophic failure that breaking ABI can cause, let's take a look at how freedesktop-sdk mitigates the risk of such dreadful things happening.
freedesktop-sdk incorporates hundreds of components, all of which need regular updates with the latest bug fixes and security patches. Checking all of these updates for ABI stability by hand would be too much work even for a much larger team than freedesktop-sdk has. Instead, the project uses an automatic ABI checker, developed by contributor Mathieu Bridon (bochecha), that compares the binaries of two revisions.
The ABI checker leverages an open source project called libabigail to do the heavy lifting: it can output a summary of the differences between the ABIs of two binaries. This is wrapped in a simple Python script, which reports whether there are any differences that require attention.
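libabigail's command-line tool, abidiff, takes two shared libraries as arguments and reports the result of the comparison through its exit code, which is a bit field. The real checker is a Python script whose internals aren't shown here, so the following is only a sketch, written in C to match the earlier examples, of how a wrapper might turn that exit code into a decision. The verdict names and the classify_abidiff_exit function are hypothetical; the bit values are redefined locally to mirror those documented for abidiff.

```c
/* Bit values documented for abidiff's exit code, redefined locally. */
enum {
    ABIDIFF_ERROR                   = 1,  /* the tool hit an error */
    ABIDIFF_USAGE_ERROR             = 2,  /* the tool was invoked incorrectly */
    ABIDIFF_ABI_CHANGE              = 4,  /* the two ABIs differ */
    ABIDIFF_ABI_INCOMPATIBLE_CHANGE = 8   /* they differ incompatibly */
};

/* Hypothetical outcomes a wrapper might report to the CI pipeline. */
enum verdict {
    VERDICT_OK,          /* no ABI difference */
    VERDICT_REVIEW,      /* ABI changed; a human should take a look */
    VERDICT_BREAK,       /* incompatible ABI change */
    VERDICT_TOOL_ERROR   /* the comparison itself failed */
};

enum verdict classify_abidiff_exit (int code)
{
    if (code & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR)) {
        return VERDICT_TOOL_ERROR;
    }
    if (code & ABIDIFF_ABI_INCOMPATIBLE_CHANGE) {
        return VERDICT_BREAK;
    }
    if (code & ABIDIFF_ABI_CHANGE) {
        return VERDICT_REVIEW;
    }
    return VERDICT_OK;
}
```

Tool errors are checked first so that a failed comparison is never mistaken for a clean one. An exit code of 0 means the two ABIs are equal, and the incompatible-change bit is only ever set together with the plain change bit.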
In order to minimise human involvement, the update process is almost entirely automated. BuildStream's track feature is used to check for new git tags in the upstream repositories, and a bot automatically creates branches and merge requests for any updates. In the GitLab CI pipeline for freedesktop-sdk, the ABI checker is run on the updated version, along with several other checks, which tells the team whether it's safe to update. If a break does happen, then a human can inspect the change in more detail and make an informed decision on what to do.
Of course, this isn't a complete solution and there are issues. Firstly, some libraries in the SDK cannot be checked. For example, the LLVM compiler toolkit can't be checked, as doing so causes the GitLab runners to run out of memory! As such, some libraries must be skipped by adding them to a configuration file. The checker also doesn't currently cover interpreted languages, which don't have an ABI to speak of: for example, Python API breaks cannot be detected.
All in all, this tooling allows a small team to keep hundreds of components up to date in two stable releases, without fear of unexpected ABI breakages while updating.