The Docker Official Images project provides and maintains application runtimes packaged in images, such as OpenJDK images, which we will refer to as Docker's OpenJDK build system or DOBS. Due to limitations of Dockerfiles, DOBS relies on Dockerfile templates, bash scripts, as well as jq and awk processing. This serves as a method to conditionally execute instructions, or select between configuration strings. The reliance on ad hoc solutions, such as awk/jq templating, causes maintainability problems. Firstly, developers must learn and be proficient in multiple languages and frameworks to maintain the image build definitions and generation scripts. Secondly, these image build definitions are verbose.
Modus is a language for building OCI container images. Compared to Dockerfiles, Modus makes it easier to define complex, parameterized builds with negligible loss in efficiency w.r.t. build time or image size. Modus provides a cohesive system that replaces the need for Dockerfile templating. Another advantage of Modus is that it encourages developers to explicitly define the ways in which their builds can vary. In contrast, DOBS defines this implicitly through its JSON versions file: the file is not sufficient on its own to understand which configurations are valid, since one also needs to check the other scripts or template files. For example, one would need to check DOBS' templating script to realize that Oracle-based JRE images are unsupported.
A single 267-line Modusfile holds the conditional logic that defines all the varying image builds. In contrast, the templating approach requires a 332-line template file, a 77-line script to apply the template, and a 140-line file that defines some helper functions using awk and jq.
Below are statistics for (variations of) the `linux.Modusfile`, according to `wc`:
Modus Variant | Newlines | Words | Bytes |
---|---|---|---|
Unedited | 234 | 790 | 9922 |
Comments/empty lines removed | 212 | 680 | 9219 |
Comments/empty lines & select tokens removed | 210 | 623 | 7491 |
Below are the combined statistics for (variations of) the files needed for templating, as mentioned above:
DOBS Variant | Newlines | Words | Bytes |
---|---|---|---|
Unedited | 549 | 2209 | 16109 |
Comments/empty lines removed | 441 | 1556 | 10626 |
Comments/empty lines & select tokens removed | 403 | 1326 | 9642 |
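The two tables above report the three counts that `wc` prints by default. As a rough illustration of how the "Comments/empty lines removed" variant could be measured, here is a minimal sketch. It assumes that comments are whole lines starting with `#`; the source does not say how comments were stripped, nor which tokens were dropped for the third variant, so that variant is not reproduced. The function names and the sample text are hypothetical.

```python
def wc_stats(text: str) -> tuple[int, int, int]:
    """Return (newlines, words, bytes), the three counts `wc` reports by default."""
    return text.count("\n"), len(text.split()), len(text.encode("utf-8"))


def strip_comments_and_blanks(text: str, comment_prefix: str = "#") -> str:
    """Drop empty lines and whole-line comments (assumed to start with `#`).

    Trailing comments on a line that also holds code are left untouched.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip() and not line.lstrip().startswith(comment_prefix)
    ]
    return "".join(kept)


# Hypothetical sample input, not taken from the real Modusfile.
sample = "# a comment\nrule one\n\nrule two\n"

unedited = wc_stats(sample)
stripped = wc_stats(strip_comments_and_blanks(sample))
print("Unedited:", unedited)
print("Comments/empty lines removed:", stripped)
```

Applying the same two functions to both the Modusfile and the concatenated DOBS files would keep the comparison like for like.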
Full details on the c5.2xlarge hardware are here.
We compared the performance of DOBS' Dockerfiles and our Modusfile.

- To provide a baseline for our performance tests, we built DOBS' Dockerfiles sequentially using the shell command `time fdfind Dockerfile$ | rg -v windows | xargs -I % sh -c 'docker build . -f %'`.
- We built DOBS' Dockerfiles in parallel using GNU `parallel` (to replicate Modus' approach of parallel builds): `time fdfind Dockerfile$ | rg -v windows | parallel --bar docker build . -f {}`.
- We executed Modus using the command `time modus build . 'openjdk(A, B, C)' -f <(cat *.Modusfile)` to build all available images. This builds the same 40 images[^1] that were built through DOBS.
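The difference between the sequential baseline and the parallel run is only in how many builds are in flight at once. The following sketch shows that distinction in isolation. It does not invoke Docker: `FakeBuilder` is a hypothetical stand-in that sleeps instead of building, the Dockerfile paths are made up, and the job limit of 4 is an arbitrary choice for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeBuilder:
    """Hypothetical stand-in for `docker build`: sleeps and tracks concurrency."""

    def __init__(self, seconds: float = 0.02):
        self.seconds = seconds
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of builds observed running at once

    def build(self, dockerfile: str) -> str:
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(self.seconds)  # pretend to build the image
        with self._lock:
            self._active -= 1
        return dockerfile


def build_sequential(builder: FakeBuilder, dockerfiles: list[str]) -> list[str]:
    """One build at a time, like the `xargs -I %` baseline."""
    return [builder.build(path) for path in dockerfiles]


def build_parallel(builder: FakeBuilder, dockerfiles: list[str], jobs: int) -> list[str]:
    """At most `jobs` builds at a time, similar in spirit to GNU parallel."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(builder.build, dockerfiles))


# Hypothetical paths; the real ones come from `fdfind Dockerfile$`.
dockerfiles = [f"hypothetical/{i}/Dockerfile" for i in range(8)]

seq = FakeBuilder()
seq_result = build_sequential(seq, dockerfiles)

par = FakeBuilder()
par_result = build_parallel(par, dockerfiles, jobs=4)

print("sequential peak concurrency:", seq.peak)
print("parallel peak concurrency:", par.peak)
```

Both functions produce the same set of builds; only the peak concurrency differs, which is why the parallel run is the fairer comparison against Modus.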
We used a local Docker registry that caches base images to avoid rate limiting. This leads to a minor speedup, consistent for any of the approaches (i.e. all approaches use these cached base images). All builds were executed with an empty Docker build cache.
DOBS' Dockerfiles take advantage of neither multi-stage builds nor the caching that multi-stage builds would make easier to implement[^2]. Since these are the primary ways in which Modus improves performance, we decided to extend the existing OpenJDK approach to implement these optimizations without Modus.
These hand-written optimizations did not actually perform better than the naive parallel builds.
This may be due to the non-trivial overhead of copying gigabytes of data, which is a necessary step in this builder pattern.
In addition, it is quite possible that most parallel orderings of image builds (GNU `parallel` does not default to running all the builds at once) already avoid the duplicate fetching of binaries that we are trying to optimize out, so the file copying that we introduce outweighs the benefit of avoiding some network fetches.
This does show that an even more complicated approach would be required for (consistently) better performance, motivating the use of a system like Modus.
Applying the templates to generate the DOBS' Dockerfiles took μ_t = 121.1s, averaged over 10 runs.
Here are the full results, averaged over 10 runs for each approach. The final column simply adds 121.1s where appropriate. We've included the exporting time, which is a subset of the total build time using Modus, since it represents an operation performed by Modus that could be reduced in future versions.
DOBS | μ (s) | μ + μ_t (s) |
---|---|---|
Sequential | 516.3 | 637.4 |
Parallel | 119.8 | 240.9 |
Manual Optimization | 276.7 | 397.8 |
Modus | μ (s) | μ + μ_t (s) |
---|---|---|
Total | 143.1 | 143.1 |
Exporting | 18.0 | N/A |
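Once template processing is counted, Modus performs better overall than DOBS: it took 143.1s in total, against 240.9s for the fastest DOBS approach (parallel builds). The difference comes from DOBS' template processing (μ_t), which took a significant fraction of the total time to build images; excluding μ_t, the parallel DOBS build alone (119.8s) was faster than Modus.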
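The comparison can be recomputed from the tabulated means. The sketch below uses only the numbers in the two tables above; the variable names are illustrative.

```python
MU_T = 121.1  # template processing time in seconds, averaged over 10 runs

# Mean build times in seconds, copied from the DOBS table.
dobs_build = {
    "Sequential": 516.3,
    "Parallel": 119.8,
    "Manual Optimization": 276.7,
}

# Mean times in seconds, copied from the Modus table.
MODUS_TOTAL = 143.1
MODUS_EXPORT = 18.0

# DOBS must generate its Dockerfiles before building, so add MU_T to each mean.
dobs_total = {name: round(mu + MU_T, 1) for name, mu in dobs_build.items()}

# How many times longer each DOBS approach takes than the Modus build.
ratio_to_modus = {name: total / MODUS_TOTAL for name, total in dobs_total.items()}

# Fraction of the Modus build spent exporting images.
export_share = MODUS_EXPORT / MODUS_TOTAL

for name, total in dobs_total.items():
    print(f"{name}: {total:.1f}s total, {ratio_to_modus[name]:.2f}x the Modus time")
print(f"Exporting share of the Modus build: {export_share:.1%}")
```

The exporting share is the part of the Modus total that the text identifies as a candidate for reduction in future versions.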
We used `dive`, which provides an estimate of image efficiency. An example of an 'inefficiency' would be moving files across layers: this is a change that needs to be recorded as part of the layer, yet could be avoided by rewriting the Dockerfile.
Approach | Average Efficiency (%) |
---|---|
Built with our Modusfiles | 98.9 |
DOBS' Images | 98.8 |
DOBS' images score highly on image efficiency (all above 95%), but at a cost to readability and separability.
Nearly half of their Dockerfile is a single `RUN` layer, to avoid the issue of modifications recorded in the layer diffs bloating the image size.
Modus provides a `merge` operator to solve this issue, which helped us achieve high image efficiency scores. `merge` is an operator that merges the underlying commands of an expression into one `RUN` layer.
In this case, if we remove the `merge`, the image efficiency drops to about 75%. One operation that contributes to the inefficiency is updating `cacerts` in a separate `RUN` layer, and there may be other similar operations performed within the body of this `merge` that create a new layer with avoidable diffs.
This demonstrates that `merge` facilitates the best of both worlds: the readability of separating out sections of code without the inefficiency of more layers recording more diffs.
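To see why rewriting a file in a later layer costs space, consider a toy model in which each layer records every file it writes. This is a simplified sketch, not `dive`'s actual metric: the efficiency formula, the file names, and the sizes are all assumptions chosen for illustration, and file deletions are not modelled.

```python
Layer = dict[str, int]  # maps a file path to the bytes written in that layer


def recorded_bytes(layers: list[Layer]) -> int:
    """Total bytes stored in the image: every layer keeps its own diff."""
    return sum(sum(layer.values()) for layer in layers)


def final_bytes(layers: list[Layer]) -> int:
    """Bytes visible in the final filesystem: later layers overwrite earlier ones."""
    final: Layer = {}
    for layer in layers:
        final.update(layer)
    return sum(final.values())


def efficiency(layers: list[Layer]) -> float:
    """Toy efficiency: share of stored bytes that is still visible at the end."""
    return final_bytes(layers) / recorded_bytes(layers)


def merge_layers(layers: list[Layer]) -> list[Layer]:
    """Collapse several layers into one, keeping only the final state of each file."""
    merged: Layer = {}
    for layer in layers:
        merged.update(layer)
    return [merged]


# Hypothetical sizes: the first layer unpacks a JDK including a cacerts file,
# and the second layer rewrites cacerts.
separate = [
    {"jdk/bin": 100, "jdk/cacerts": 20},
    {"jdk/cacerts": 20},
]
merged = merge_layers(separate)

print(f"separate layers: {efficiency(separate):.3f}")
print(f"merged layer:    {efficiency(merged):.3f}")
```

In this model the merged variant stores each file once, so nothing is wasted, while the separate-layer variant stores `cacerts` twice. The real efficiency figures reported above come from `dive`, not from this formula.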
The variables exposed to a user are (a subset of the parameters that can vary for a build):
- Major application version
- Java Type
- Variant
So a user may request a goal of `openjdk(A, "jdk", "alpine3.15")` to build all versions of JDK on Alpine.
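A goal with a free variable such as `A` can be read as a pattern: every configuration that agrees with the goal on its constant arguments is built. The sketch below mimics that reading in Python. It is not how Modus resolves goals internally; the `Var` class and `solve` function are hypothetical, the major versions are placeholders, and only the variant names are taken from the examples in this document. Repeated variables are not checked for consistency.

```python
from itertools import product


class Var:
    """A free variable in a goal, such as A in openjdk(A, "jdk", "alpine3.15")."""

    def __init__(self, name: str):
        self.name = name


def solve(goal: tuple, facts: list[tuple]):
    """Yield every fact that agrees with the goal on all constant positions."""
    for fact in facts:
        if all(isinstance(g, Var) or g == f for g, f in zip(goal, fact)):
            yield fact


# Placeholder configuration space (major versions are hypothetical).
MAJOR_VERSIONS = ["11", "17"]
JAVA_TYPES = ["jdk", "jre"]
VARIANTS = ["bullseye", "buster", "alpine3.15"]
facts = list(product(MAJOR_VERSIONS, JAVA_TYPES, VARIANTS))

A = Var("A")
matches = list(solve((A, "jdk", "alpine3.15"), facts))
for major, java_type, variant in matches:
    print(f"openjdk({major!r}, {java_type!r}, {variant!r})")
```

Replacing all three arguments with variables matches the whole configuration space, which corresponds to the `openjdk(A, B, C)` goal used in the performance tests.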
Below is a complete list that shows the ways in which our OpenJDK configuration can vary, heavily inspired by the DOBS' approach:
- Major application version
- Full version
- Java Type (JDK vs JRE)
- Base image variants (e.g. bullseye, buster, alpine3.15)
- AMD64 Binary URL
- ARM64 Binary URL
- Source
For reference, this diagrams DOBS' build steps at the time of writing.