You may be familiar with the Remote Execution API (REAPI); Codethink's Santiago Gil wrote an excellent article about it. But have you heard of the Remote Asset API (RAAPI)?
The Remote Asset API is related to the REAPI and exists to enhance solutions that use the REAPI for remote execution. It is split into two parts: Fetch and Push.
In this blog post we will talk through the Remote Asset API, describing its components and going into detail about the server-side implementation which has been worked on here at Codethink.
Before we begin, let's start with a quick definition of what we mean by the term "client" in this post. In the context of RAAPI, a client is any piece of software which uses the API calls to request fetching of data or to push data. It will likely be a build client which uses the REAPI, such as Bazel or BuildStream.
Tom Coldrick talks about "bb-remote-asset: A Remote Asset API Server Implementation" at the Build Meetup 2021
What is RAAPI good for?
Before we go into the technical side, we'll quickly look at what this API is for. There are several reasons this API is desirable.
Firstly, it allows a server to download assets, such as source files for builds. This can reduce the client's network usage for a build. Instead of the client needing to download the blob, determine whether it is present in the content-addressable storage (CAS) and upload it if necessary, the client can send a request to the server, which will download the file to the CAS if needed. This is also highly beneficial if the client's own network connection is slow and the server is located somewhere with a faster connection.
It can also act as a cache of sources or even of the results of remotely executed actions. For example, if a client needs a particular git repo, it can benefit from another client having already requested it from the asset server. Again, this can significantly reduce both time and network usage.
Fetch
A Remote Asset client can make a Fetch request to the server, consisting of a set of URIs and additional metadata. For example, there are fields relating to the age of the data and timeouts, so the client may state that it doesn't want data that is more than a day old. The client can also define Qualifiers in the Fetch request, which are arbitrary key-value pairs that give the server further detail about the nature of the request. The client will expect a response from the server that provides either the Digest of the requested data, which is in the CAS, or an indication of failure. We'll look at Qualifiers later. For now, we'll focus on what the server will do with such a request.
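To make the shape of such a request more concrete, here is a minimal sketch in Python. The class names (`FetchRequest`, `FetchResponse`, `Qualifier`, `Digest`), the example URI and the example content are illustrative stand-ins, not the generated protobuf classes, and the field names only loosely follow the API definition.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Qualifier:
    """An arbitrary key-value pair describing the request."""
    name: str
    value: str


@dataclass(frozen=True)
class Digest:
    """Identifies a blob in the CAS by content hash and size."""
    hash: str
    size_bytes: int


@dataclass
class FetchRequest:
    uris: list[str]
    qualifiers: list[Qualifier] = field(default_factory=list)
    timeout: Optional[timedelta] = None
    # Content older than this timestamp is not acceptable to the client.
    oldest_content_accepted: Optional[datetime] = None


@dataclass
class FetchResponse:
    ok: bool
    uri: Optional[str] = None        # the URI that was actually used
    digest: Optional[Digest] = None  # set on success
    error: Optional[str] = None      # set on failure


# A client that does not want data more than a day old.
request = FetchRequest(
    uris=["https://example.com/src.tar.gz"],
    qualifiers=[Qualifier("resource_type", "application/gzip")],
    timeout=timedelta(seconds=30),
    oldest_content_accepted=datetime.now(timezone.utc) - timedelta(days=1),
)

# What a successful response could look like (hypothetical content).
content = b"example source archive"
response = FetchResponse(
    ok=True,
    uri=request.uris[0],
    digest=Digest(hashlib.sha256(content).hexdigest(), len(content)),
)
```

The response carries either a Digest or an error, which mirrors the "Digest or indication of failure" contract described above.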
There is no single method that the server must use to generate a response to an incoming request. The only hard requirement imposed by the API is the format of the response.
The server may attempt to download the data that the client has requested to the CAS using the URIs it has been provided in the request. A simple HTTP-based fetcher could do this, for example, or some other method like a git clone could be used.
Alternatively, it may instead check if the requested data is already cached from a previous Push request. As is easy to imagine, some combination of checking caches and attempting to download data when it is requested seems to be a sensible choice.
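One way to picture that combination is the sketch below. It is not how any particular server is implemented: the in-memory dictionaries `cas` and `asset_cache`, the `cache_key` helper and the injected `download` callable are all hypothetical stand-ins, and SHA-256 is simply an assumed digest function.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical in-memory stand-ins for the CAS and the asset cache.
cas: dict[str, bytes] = {}
asset_cache: dict[tuple, str] = {}


def cache_key(uri: str, qualifiers: dict[str, str]) -> tuple:
    """Build a hashable key from a URI and its qualifiers."""
    return (uri, tuple(sorted(qualifiers.items())))


def fetch_blob(
    uris: list[str],
    qualifiers: dict[str, str],
    download: Callable[[str], Optional[bytes]],
) -> Optional[str]:
    """Return the digest of the asset, or None if it cannot be obtained."""
    # 1. Prefer an association that is already cached and still in the CAS.
    for uri in uris:
        digest = asset_cache.get(cache_key(uri, qualifiers))
        if digest is not None and digest in cas:
            return digest

    # 2. Otherwise try each URI in turn until one download succeeds.
    for uri in uris:
        data = download(uri)
        if data is None:
            continue
        digest = hashlib.sha256(data).hexdigest()
        cas[digest] = data
        asset_cache[cache_key(uri, qualifiers)] = digest
        return digest

    # 3. Nothing cached and nothing downloadable: report failure.
    return None
```

Passing the downloader in as a parameter keeps the sketch independent of any transport, so the same flow would apply to an HTTP fetcher or to something like a git clone.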
Push
If a client makes a Push request to the server, it must provide a set of URIs and optional Qualifiers, as it would for a Fetch request. This time, however, it also gives the Digest of the blob to associate with this identifying data.
The data that the Digest corresponds to must be in the CAS, and the asset server must store the association of URIs/Qualifiers to Digest in the same way that it would if it had fetched the data.
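As a rough illustration of that rule, the sketch below records a pushed association only if the blob is already present. The names `blob_store`, `associations`, `asset_key`, `push_blob` and `lookup` are hypothetical, and a real server would talk to an actual CAS rather than a dictionary.

```python
import hashlib

# Hypothetical in-memory stand-ins for the CAS and the association store.
blob_store: dict[str, bytes] = {}
associations: dict[tuple, str] = {}


def asset_key(uri: str, qualifiers: dict[str, str]) -> tuple:
    """Build a hashable key from a URI and its qualifiers."""
    return (uri, tuple(sorted(qualifiers.items())))


def push_blob(uris: list[str], qualifiers: dict[str, str], digest: str) -> None:
    """Associate every URI (with its qualifiers) with an existing blob."""
    if digest not in blob_store:
        # The pushed Digest must refer to data that is already in the CAS.
        raise KeyError(f"blob {digest} is not in the CAS")
    for uri in uris:
        associations[asset_key(uri, qualifiers)] = digest


def lookup(uri: str, qualifiers: dict[str, str]):
    """Return the digest a later Fetch would see, or None if unknown."""
    return associations.get(asset_key(uri, qualifiers))


# Usage: upload the blob first, then push the association.
data = b"pushed content"
digest = hashlib.sha256(data).hexdigest()
blob_store[digest] = data
push_blob(["https://example.com/asset"], {"vcs.branch": "main"}, digest)
```

Because the association is stored under the same key a fetch would use, a later Fetch with the same URI and Qualifiers finds the pushed Digest.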
Qualifiers
So far, we've mentioned Qualifiers without explaining what they are or what their role is in the API. Qualifiers exist to give specific metadata about the data being fetched and stored to ensure it is what the client needs. They can be used to specify a commit or branch of a version-controlled repo or provide a checksum for the desired data to ensure that the correct thing has been downloaded. Another use is to give the server a hint on how to fetch the data. For example, if it is a git repo, git clone may be useful.
Qualifiers can be any key-value pair of strings. However, there are a few standard qualifiers mentioned in the API at present (although supporting them is still optional for the server):
* resource_type: a description of the type of resource. This is where the client may specify that the resource is a git repo, for example. The API states that values should be an existing media type as defined by IANA.
* checksum.sri: a checksum to verify the fetched data against, as described above.
* directory: a relative path of a subdirectory of the resource. It allows the client to get the Digest of only the subdirectory it is interested in.
* vcs.branch: a version control branch to check out before calculating and returning the Digest.
* vcs.commit: a version control commit to check out before calculating and returning the Digest.
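As an illustration of how a server might act on the checksum.sri qualifier, here is a minimal sketch that checks fetched bytes against a Subresource Integrity string of the form `<algorithm>-<base64 digest>`. The helper names `make_sri` and `matches_sri` are hypothetical, and the sketch assumes a single hash with no options, which is a simplification of the full SRI syntax.

```python
import base64
import hashlib

# Hash algorithms accepted by this sketch.
SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def make_sri(data: bytes, algorithm: str = "sha256") -> str:
    """Compute an SRI string such as 'sha256-<base64 digest>' for data."""
    raw = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(raw).decode('ascii')}"


def matches_sri(data: bytes, sri: str) -> bool:
    """Return True if data hashes to the value recorded in the SRI string."""
    algorithm, _, expected = sri.partition("-")
    if algorithm not in SUPPORTED_ALGORITHMS or not expected:
        raise ValueError(f"unsupported or malformed SRI string: {sri!r}")
    return make_sri(data, algorithm) == sri


# Usage: the client sends the SRI string as a qualifier and the server
# checks the downloaded bytes before returning a Digest.
qualifiers = {"checksum.sri": make_sri(b"release tarball bytes")}
downloaded = b"release tarball bytes"
verified = matches_sri(downloaded, qualifiers["checksum.sri"])
```

A server following this approach would report failure instead of a Digest whenever `matches_sri` returns False.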
Optional Features
Many parts of the API are stated to be optional. This even includes the existence of a Push server.
An optional feature worth mentioning is the server's ability to transform the assets that it fetches. If a directory is requested from a URL that points to a tarball, the server may unpack the tarball and return the directory's Digest. The reverse also applies: if a blob is requested from a URL that points to a directory, the server may pack the directory into an archive and return the Digest of the resulting blob.
What implementations exist?
Currently, there are few implementations of the Remote Asset API. Two examples are bazel-remote and bb-remote-asset, the latter being a solution developed by engineers here at Codethink.
The implementation provided by bazel-remote, as the name suggests, is specific to the needs of Bazel. Currently, Bazel only uses the Fetch side of the API, specifically only FetchBlob. Thus this is the only service from the API implemented in bazel-remote, and it is "very experimental".
As for bb-remote-asset, it is a much more complete implementation of the API, as it is intended to be far more client-agnostic than bazel-remote. We will discuss what bb-remote-asset implements and look at recent additions and potential plans in the next section.
bb-remote-asset
What features does bb-remote-asset implement?
This implementation aims to be versatile and client-agnostic. That is to say that bb-remote-asset can run in different setups depending on what the client may need.
So far, the project implements Pushing to and Fetching from the server. This is done by keeping a record of the assets which have been pushed to it and allowing them to be fetched. It also has limited support for downloading blobs using an HTTP fetcher. The support is limited in the sense that it only works for blobs; directories cannot currently be fetched in this way.
It is also possible to fetch git repositories as directories if the server is configured correctly. This will be covered in a bit more detail in the following section.
Of the five standard qualifiers mentioned previously, bb-remote-asset supports four to some extent. The only one not currently used at all is directory.
Recent additions
Recently there have been a few changes made to the project. Firstly, it has become possible to cache assets using the action cache of a Buildbarn remote execution server. This requires conversion from a representation of an asset to an ActionResult from the REAPI. The benefit of this is that the overall amount of storage being used can be reduced as there is no need for separate storage to be set up.
Hand-in-hand with the previous change, another adjustment allows the use of remote execution workers to fetch blobs. This is where the previously mentioned git repositories come in. Remote execution can be leveraged for two values of the resource_type qualifier: 'application/octet-stream' and 'application/x-git'. The former is handled by wget. This can be combined with some authorisation if required and a checksum to ensure the data's validity. The latter value of resource_type is handled by a call to git clone and can be combined with a branch or commit qualifier to cause the correct revision to be checked out before the Digest is returned to the client.
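The dispatch on resource_type can be pictured with the sketch below. It only illustrates the idea and does not reproduce the commands bb-remote-asset actually runs: the function name `build_fetch_commands`, the output names `out` and `repo`, and the exact command-line arguments are all assumptions.

```python
def build_fetch_commands(uri: str, qualifiers: dict[str, str]) -> list[list[str]]:
    """Return the command lines a worker could run to fetch the asset."""
    resource_type = qualifiers.get("resource_type")

    if resource_type == "application/octet-stream":
        # A plain blob: download it to a fixed (hypothetical) output file.
        return [["wget", "-O", "out", uri]]

    if resource_type == "application/x-git":
        # A git repository: clone it, then check out the requested revision.
        commands = [["git", "clone", uri, "repo"]]
        # Assumption: a commit qualifier takes precedence over a branch.
        revision = qualifiers.get("vcs.commit") or qualifiers.get("vcs.branch")
        if revision:
            commands.append(["git", "-C", "repo", "checkout", revision])
        return commands

    raise ValueError(f"unsupported resource_type: {resource_type!r}")


# Usage: a git repository pinned to a branch.
commands = build_fetch_commands(
    "https://example.com/project.git",
    {"resource_type": "application/x-git", "vcs.branch": "main"},
)
```

Returning argument lists rather than shell strings avoids quoting problems if the commands are later handed to a process runner.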
What might be added in future?
One feature that would be nice to add in the future is the unpacking and packing of assets mentioned earlier in this post. This would make HTTP-based fetching of directories possible, as the archive could be unpacked and the Digest of the directory returned. Currently, an attempt to fetch a directory using HTTP will fail. A mismatching request, which is to say a fetch blob request which causes a directory to be fetched, also fails outright.
How to contribute to bb-remote-asset
The Remote Asset API is still in its infancy. As mentioned, there are only two implementations to our knowledge at this time. There is a lot of potential for change and improvement in both the API and the clients and servers that use it.
As for bb-remote-asset, it is currently actively maintained. Contributions are welcome from anybody, be it opening issues for bugs or feature requests or writing pull requests and being involved in the development. Some familiarity with REAPI, Golang and using Bazel will certainly help if you wish to contribute to the code, but they are by no means required, so don't be discouraged from getting involved if you don't have experience with these things. We'd also love to hear from anyone who is trying out bb-remote-asset and would like to encourage more people to give it a go. We are in #buildbarn in the BuildTeam Slack group and are happy to offer support and answer questions there.