Improving systemd’s integration testing infrastructure (part 2)

This is the second blog post in a two-part series. Click here to read part one.

In Part 1, we discussed the shortcomings of the previous integration test suite. It was clear there was room for improvement, but where to start?

We decided to make better use of the off-the-shelf, reusable components that were already available. As a result, there wasn't a large amount of single-purpose code to maintain.

In Part 2, we examine the architecture of the new integration test suite and summarise the test results.

Introducing vmspawn

systemd-vmspawn may be used to start a virtual machine from an OS image. In many ways, it is similar to systemd-nspawn(1), but it launches a full virtual machine instead of using namespaces.

-- The systemd-vmspawn(1) manpage

For the reasons explained in part 1, it's not ideal for the VM driver code to be a bash script, but the integration tests still need some abstraction over qemu so that common operations don't expose the complexity of the qemu command line.

Since this code has to exist somewhere, it's useful for other purposes if it is cleanly separated into another tool.

Because we are testing systemd, we need rich integration¹ with the guest systemd and, to a lesser extent, the host systemd, so it's advantageous for this tool to be developed in sync with systemd.

A qemu wrapper that supports rich integration with systemd does already exist in the form of mkosi qemu, but it's not practical to use it with images that mkosi has not built itself, and it adds Python and distro packaging tools as dependencies.

libvirt is an existing qemu wrapper, but it's intended to support multiple guest operating systems and hypervisors, which is overkill for our needs and adds many dependencies we wouldn't otherwise need.

For these reasons, it's appealing to have another optional systemd component written in C that doesn't have to compromise on features to support OS images that aren't running a recent version of systemd. Its interface closely matches that of systemd-nspawn, which allows for an easy transition between them. It therefore offers a useful value proposition and isn't Yet Another Qemu Wrapper.

	mkosi qemu	libvirt	systemd-vmspawn
qemu wrapper	✅	✅	✅
rich integration¹	✅	❎	✅
prebuilt image usable	❎	✅	✅
no extra deps²	❎³	❎	✅
nspawn-like interface	❎	❎	✅

The architecture of the new integration test suite

Building images

We previously discussed mkosi because it supported running VMs with rich integration with the guest systemd.

mkosi is a tool for building operating system images, both as disk images and as chroot directories, for a wide range of Linux distributions using pre-existing packages from their repositories. It also allows projects to compile against the ABI of the target distribution, so components can be built without being in the distro package repository.

It uses systemd-repart and user namespaces to avoid having to use root to build the disk images.

Handily, the systemd project already used it for smoke testing systemd-boot and systemd-stub, and to let developers easily build and test changes in a VM without risking damage to their development system.

One downside to the original test suite's approach to building images was that it needed one image per test and often needed per-test customisations.

For the new integration test suite, we preferred to have as few images as possible, ideally just one, so that we can create snapshots to run the tests.

This required extending the configuration in mkosi.images/system to add additional packages for tools the integration tests use and, in some cases, changing the kernel package for one that includes more modules, and adding additional test configuration and data to the test image.

Defining and scheduling tests

Instead of having a shell script that runs make -C ... (which runs test.sh), we define the integration tests as a new part of the meson test suite. Each TEST-??-* directory may contain a meson.build file that defines per-test options, such as test/TEST-21-DFUZZER/meson.build:

# SPDX-License-Identifier: LGPL-2.1-or-later

integration_tests += [
        integration_test_template + {
                'name' : fs.name(meson.current_source_dir()),
                'timeout' : 3600,
                'priority' : 50,
                'slow' : true,
        },
]

These options are used to construct a call to the meson test() function that defines a new test that can be run with meson test.
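The integration_test_template + {...} expression relies on meson's dictionary addition, where keys on the right-hand side override those on the left. As a rough illustration only, the same merge can be modelled in Python. The template defaults and the define_test helper below are made up for this example and are not systemd's real values.

```python
# Hypothetical defaults for illustration; the real integration_test_template
# is defined in systemd's meson build files and has different contents.
integration_test_template = {
    "timeout": 1800,
    "priority": 0,
    "slow": False,
    "cmdline": [],
}


def define_test(name, **overrides):
    """Model meson's `template + {...}`: keys given later win."""
    return {**integration_test_template, "name": name, **overrides}


# Mirrors the options from test/TEST-21-DFUZZER/meson.build shown above.
dfuzzer = define_test("TEST-21-DFUZZER", timeout=3600, priority=50, slow=True)
print(dfuzzer)
```

Options that a test does not override, such as cmdline here, keep whatever value the template provides.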

These tests are skipped by default because they are slow, but can be opted into by enabling the integration-tests meson option with meson configure -Dintegration-tests=true build. The tests can then be run with meson test -C build --suite integration-tests.

Changing the meson configuration just to deselect tests causes meson to re-run its setup, so it's also possible to force those tests to be skipped by setting SYSTEMD_INTEGRATION_TESTS=0. The command then becomes env SYSTEMD_INTEGRATION_TESTS=0 meson test -C build.

The tests require that the test image has been built first. This can be accomplished by running meson compile -C build mkosi && meson test -C build --suite integration-tests.
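Put together, the commands mentioned in this section need to run in a particular order. The following Python sketch only collects and prints them; the helper names are ours, and the build directory name is assumed to be build as in the examples above.

```python
import os
import shlex


def integration_test_commands(build_dir="build"):
    """Return the commands from the text, in the order they need to run."""
    return [
        # Opt in to the integration tests (this re-runs meson's setup).
        ["meson", "configure", "-Dintegration-tests=true", build_dir],
        # Build the test image before running any test.
        ["meson", "compile", "-C", build_dir, "mkosi"],
        # Run only the integration test suite.
        ["meson", "test", "-C", build_dir, "--suite", "integration-tests"],
    ]


def skip_env(base=None):
    """Copy an environment and force the integration tests to be skipped."""
    env = dict(os.environ if base is None else base)
    env["SYSTEMD_INTEGRATION_TESTS"] = "0"
    return env


for cmd in integration_test_commands():
    print(shlex.join(cmd))
```

To actually execute the steps, each list can be passed to subprocess.run(cmd, check=True), optionally with env=skip_env() when the integration tests should be skipped without reconfiguring.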

How tests run

The TEST-??-*/meson.build files define per-test customisations.

For example, TEST-06-SELINUX/meson.build enables SELinux and the units for relabelling files on first-boot by instructing meson to pass additional arguments for mkosi to the integration-test-wrapper.py script.

# SPDX-License-Identifier: LGPL-2.1-or-later

integration_tests += [
        integration_test_template + {
                'name' : fs.name(meson.current_source_dir()),
                'cmdline' : integration_test_template['cmdline'] + ['selinux=1', 'lsm=selinux'],
                # FIXME; Figure out why reboot sometimes hangs with 'linux' firmware.
                'firmware' : 'uefi',
        },
]

testdata_subdirs += [meson.current_source_dir() / 'TEST-06-SELINUX.units']

The cmdline option gets turned into a mkosi --kernel-command-line-extra argument that adds the provided arguments to the kernel command-line.
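As a minimal sketch of that translation, assuming the arguments are simply joined with spaces into one option value (the helper name is hypothetical and the real wrapper may build its arguments differently):

```python
def mkosi_cmdline_args(cmdline):
    """Turn a test's 'cmdline' list into mkosi arguments (hypothetical helper)."""
    if not cmdline:
        # Nothing to add, so do not emit the option at all.
        return []
    return ["--kernel-command-line-extra", " ".join(cmdline)]


# The extra arguments from TEST-06-SELINUX/meson.build shown above.
print(mkosi_cmdline_args(["selinux=1", "lsm=selinux"]))
# prints ['--kernel-command-line-extra', 'selinux=1 lsm=selinux']
```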

In addition to supporting direct kernel boot with a separate kernel and rootfs disk image, mkosi also supports a separate kernel with a virtiofs rootfs, and booting a GPT-partitioned disk image with its own bootloader.

When not using a separate kernel, the command-line parameters are not passed with the -append option. Instead, they are passed as SMBIOS type 11 vendor strings named "io.systemd.stub.kernel-cmdline-extra" when booting an EFI type 2 entry (e.g. a UKI, which is a kernel and initramfs wrapped in systemd-stub), or "io.systemd.boot.kernel-cmdline-extra" when booting an EFI type 1 entry (i.e. a config file pointing to a kernel and initramfs inside the boot partition).
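The choice between the two mechanisms can be sketched as follows. This is an illustration under our own assumptions rather than mkosi's actual code: the function name and the boot mode labels are hypothetical, while -append and -smbios type=11,value=... are existing qemu options.

```python
def qemu_cmdline_args(extra, boot):
    """Return qemu arguments carrying `extra` for the given boot mode.

    boot is "direct" (separate kernel), "type1" or "type2" (EFI boot entries).
    These labels are made up for this sketch.
    """
    if boot == "direct":
        # With a separate kernel, qemu passes the command line itself.
        return ["-append", extra]
    keys = {
        "type1": "io.systemd.boot.kernel-cmdline-extra",  # read by systemd-boot
        "type2": "io.systemd.stub.kernel-cmdline-extra",  # read by systemd-stub
    }
    # qemu splits option values on commas; a literal comma is written ",,".
    value = f"{keys[boot]}={extra}".replace(",", ",,")
    return ["-smbios", f"type=11,value={value}"]


print(qemu_cmdline_args("selinux=1 lsm=selinux", "type2"))
```

An unknown boot mode raises KeyError, which seems preferable to silently dropping the parameters.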

The tests can be run with a separate kernel, but more code is covered when they are run with a GPT disk image. This can be accomplished by adding the following config to mkosi.local.conf:

[Output]
Format=disk

[Figure: diagram of the difference in how the parameters are passed.]

It's possible to run mkosi qemu directly from meson's test functions, but meson.build files are deliberately not a fully featured programming language. Juggling all of the command-line options for running mkosi qemu is significantly more maintainable, with less redundancy, in another language, so the tests start test/integration-test-wrapper.py with appropriate arguments.

This handles high-level test behaviours like:

Specify which test to run
Forward logs to the host and also conditionally store them in the filesystem
Configure systemd to report test success by sending exit status 123 over the vsock notify socket
Shut down on boot or test failure when not running in interactive mode and use exit status 0 to indicate boot failure

These behaviours are then translated into mkosi --credential and --kernel-command-line-extra options which, as previously mentioned, mkosi converts into -append or -smbios qemu arguments depending on whether sd-boot is used.
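On the host side, the exit status convention from the list above could be interpreted roughly as follows. This is a sketch with hypothetical helper names, not the code in integration-test-wrapper.py.

```python
import subprocess

# Exit status the guest reports for a passing test, per the list above.
SUCCESS_STATUS = 123


def is_success(returncode):
    """Only the agreed status counts as a pass.

    Exit status 0 is deliberately not a pass: per the text it indicates
    that the VM failed to boot.
    """
    return returncode == SUCCESS_STATUS


def run_and_translate(cmd):
    """Run `cmd` (e.g. an mkosi invocation) and return a conventional exit code."""
    result = subprocess.run(cmd)
    return 0 if is_success(result.returncode) else 1
```

A caller would typically pass the result of run_and_translate() to sys.exit() so that meson sees a normal pass or fail.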

[Figure: the sequence of operations for running a test.]

Result

UEFI smoke tests

As of this commit, systemd-boot and systemd-stub no longer just have a smoke test that determines whether they fail to boot.

Instead, any integration test can opt into exercising them by configuring 'firmware' : 'uefi' in its meson.build.

Tests complete

The work was initially handed over as Pull Request 30234, but it no longer applied cleanly due to churn, so we worked together with Daan to resolve the issues.

The requirements were that:

Every test reuses the same test image.
The steps for building images match other changes in how images are built.
The complicated test setup is handled in a different way.

The PR included an implementation of a --hook-module option that loaded a Python file and allowed functions defined in it to customise test runner behaviour.

The complex virtual device setup required for TEST-64-UDEV-STORAGE was instead implemented by passing the tests configure scripts that alter the QemuArgs config. TEST-69-SHUTDOWN, instead of booting the VM from a pexpect script that sent commands to the VM console, now runs inside the VM, spawning a separate login console and sending commands to that.

Coverage and Address Sanitizer builds were de-scoped in favour of getting more tests running, but sanitizers were already mostly supported through meson.

Platform coverage

The bash-based integration tests could theoretically run on any system that had the requisite files.

In practice, per-distribution fixups were required, and the only supported distributions were Arch, CentOS, Debian, Fedora, Suse and Ubuntu.

The new mkosi-based integration tests build on top of the platforms supported for smoke testing and support every distribution release in the CI test matrix, rather than just running in CentOS CI and autopkgtest.

mkosi's supported distributions also include RHEL, OpenMandriva, Rocky and Alma Linux, though adding support for these also requires configuring which packages are needed to run the tests and integrating their packaging scripts as submodules.

Performance

In CI it takes approximately 40 minutes per distribution to run all of the tests.

[Figure: CI runtimes for 7 distributions.]

Third-party bug fixes

When developing test suites it is not uncommon to find bugs in the code you are testing rather than in the test suite itself.

We did not maintain an exhaustive list of these cases, but since we are building operating system images from third-party packages, it wasn't unusual to find bugs in other people's software.

Distribution packaging

Building operating system images from packages is more convenient than building them from host system files, since the distribution maintains dependency lists.

These lists can sometimes have missing dependencies though, especially when the missing dependency is a commonly installed package: for example, gdb-headless on Fedora was missing a dependency on which.

Meson test console

A recurring puzzle when running qemu integration tests was that after a test had failed, it was impossible to press Ctrl-C and cancel the remaining tests.

This was because Meson 1.4.0 and earlier forward your standard input to the test, and qemu takes control of your console to forward Ctrl-C to the VM.

Unfortunately, because Meson is capturing output, you can't usefully interact with the VM since you only see the output after the VM exits.

Daan De Meyer has added --interactive as an option to make tests run sequentially with input and output connected to the terminal. This allows failures to be usefully debugged, and ensures that input is not passed through when the option is not given.

This work has now been released in Meson 1.5.0. If you are using an older version of Meson, we recommend running tests as meson test --suite integration-tests </dev/null so that cancelling the test suite is possible.
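When the test run is launched from a script rather than a shell, the same effect as the </dev/null redirection can be obtained with subprocess. A minimal sketch, with a helper name of our own choosing:

```python
import subprocess


def run_detached_from_stdin(cmd):
    """Run `cmd` with /dev/null as stdin so it cannot consume terminal input."""
    return subprocess.run(cmd, stdin=subprocess.DEVNULL)


# Equivalent of: meson test --suite integration-tests </dev/null
# result = run_detached_from_stdin(
#     ["meson", "test", "--suite", "integration-tests"]
# )
```

The call is left commented out so that the snippet can be imported without starting a test run.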

Other systemd contributions

For a full list of Codethink's pull requests, click here.

While working, we often noticed problems that were not specifically related to our goal but that we had a good idea how to fix. These include:

A machine_new invalid edge case
ptyfwd breaking for Kitty, while working on vmspawn
XML fixes in manpages

Some other favourites were discovering race conditions such as:

Many ways to boot VMs

Booting a GPT-formatted disk image with UEFI was our main focus for testing because it has the most moving parts where things can go wrong. However, tests can be run with direct kernel boot or with UEFI by setting 'firmware' : 'linux' or 'firmware' : 'uefi' respectively in the integration test configuration.

Creating a disk image can also be skipped entirely when building the test VM since mkosi supports booting using virtiofs for the root file system.

This allows the test suite to use the fastest applicable method of running each integration test or booting a VM for manual testing.

The first mkosi native integration test

The network tests were previously run in CentOS CI separately from the integration tests, because they are written in Python and the bash test image builder did not support that easily.

Because it's significantly easier to add packages in mkosi, they have been added to a new mkosi-only integration test TEST-85-NETWORK.

Summary

We worked closely with Daan to get our test changes merged, resulting in the addition of a new TEST-85-NETWORK, so we had 64 working integration tests instead of 63.

Sanitizers are enabled in GitHub CI for builds on Fedora, but coverage is currently still using the old integration test suite running in CentOS CI.

mkosi is a lot easier to set up and get running than the old test suite and makes it easy to test on distributions you don't usually run.

This is a great improvement in testing accessibility and should aid in catching all sorts of regressions and corner cases long before they affect users in the wild.

It has been a privilege for Codethink to be part of this work. We'd like to thank the Sovereign Tech Fund for supporting the project.

¹ By this we mean providing credentials, kernel command-line options related to systemd behaviour, forwarding the journal out of the VM and using the notify protocol over vsock to use the target unit's exit status as the VM's exit status.
² This assumes we already have systemd and qemu, and virtiofsd if we want directory mounts.
³ For integration tests we are already using mkosi for building images which means we can use it, but other users may download or build images using other tools.