add some documentation about using sccache with this image
cbeck88 committed Jun 7, 2024
1 parent ec47470 commit 11b0805
Showing 2 changed files with 84 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -34,6 +34,7 @@ ldd target/x86_64-unknown-linux-musl/release/EXECUTABLE
- [Kubernetes reflector with axum using builder pattern](https://github.com/kube-rs/version-rs/blob/main/Dockerfile)
- [Kubernetes controller using cargo-chef for caching layers](https://github.com/qualified/ephemeron/blob/main/k8s/controller/Dockerfile)
- [Github release assets uploaded via github actions](https://github.com/kube-rs/kopium/blob/f554ad9780dec3c76b4cef8a16a02bc82dded2be/.github/workflows/release.yml)
- [Using muslrust with sccache & github actions](./SCCACHE.md)

The binaries and images for small apps generally end up `<10MB` compressed or `~20MB` uncompressed without stripping.

83 changes: 83 additions & 0 deletions SCCACHE.md
@@ -0,0 +1,83 @@
## muslrust + sccache

The `muslrust` image includes `sccache`, so you can use it easily to try to improve build times.

To use it, set `RUSTC_WRAPPER` to the path of the `sccache` binary (`/usr/local/bin/sccache` in this image), and set some environment variables to configure it.

* `SCCACHE_DIR` is the directory that sccache will use to cache build artifacts
* `SCCACHE_CACHE_SIZE` indicates the maximum size of the cache. `sccache` will evict items when the limit is exceeded.
* `SCCACHE_ERROR_LOG` is a path to a text file, which you can inspect if there are errors.
* `CARGO_INCREMENTAL` should be set to `0` whenever using `sccache`, since `sccache` cannot cache incrementally compiled crates. (Modern versions of `sccache` may set this to 0 themselves; I'm not sure.)

`sccache --show-stats` prints stats for cache hits, misses, etc. There is also a command to zero the stats (`sccache --zero-stats`),
but it is usually unnecessary to do so in the context of this image, because `sccache` does not persist the stats to disk,
and the process terminates when your build completes.
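
In the example below, the error log lives in `/tmp` inside the container and disappears along with it, so it can be handy to print it at the end of the build. Here is a minimal sketch of that, assuming you run it from a build script inside the container; `show_sccache_log` is a hypothetical helper name, not something `sccache` or the image provides.

```shell
# Hypothetical helper: print the sccache error log if it exists and is non-empty.
# The default path matches the SCCACHE_ERROR_LOG value used in this document.
show_sccache_log() {
    log="${1:-${SCCACHE_ERROR_LOG:-/tmp/sccache.log}}"
    if [ -s "$log" ]; then
        echo "--- sccache errors ($log) ---"
        cat "$log"
    else
        echo "no sccache errors logged"
    fi
}
```

Calling `show_sccache_log` right after `sccache --show-stats` keeps both pieces of diagnostics in the build output.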

Here's an example `docker run` command:

```
# Default the host-side cache root if the caller did not set one.
if [ -z "$MOUNT_ROOT" ]; then
    MOUNT_ROOT="$HOME/.muslrust"
fi
# Runs inside the container after the build. $(id -u) is expanded on the host,
# so ownership is handed back to the user invoking this script.
POST_BUILD_CMD="chown -R $(id -u) ./target /root/.cargo/registry /root/sccache"
docker run -v "$PWD":/volume \
    -v "$MOUNT_ROOT/cargo/registry":/root/.cargo/registry \
    -v "$MOUNT_ROOT/sccache":/root/sccache \
    --env CARGO_INCREMENTAL=0 \
    --env RUSTC_WRAPPER=/usr/local/bin/sccache \
    --env SCCACHE_DIR=/root/sccache \
    --env SCCACHE_CACHE_SIZE="${SCCACHE_CACHE_SIZE:-5G}" \
    --env SCCACHE_ERROR_LOG=/tmp/sccache.log \
    --rm -t clux/muslrust:stable sh -c "AR=ar cargo build --release --locked && /usr/local/bin/sccache --show-stats && ${POST_BUILD_CMD}"
```

### Mounting and caching directories

In the above example, we're mounting `/root/.cargo/registry` and `/root/sccache` to the host machine, because these are directories we want to cache across invocations.

You could use docker named volumes instead of actually mounting them to the host filesystem if you like, but if you plan to use this in github actions,
it's better to actually mount them, and then cache those directories in github actions.

Note that if you are running this locally, neither of these directories is going to grow without bound, because `cargo` has a gc internally for registry stuff, and `sccache` evicts cached files
in an LRU fashion when the cache exceeds `SCCACHE_CACHE_SIZE`. So storing this in a home directory is a reasonably safe default.

Caching this correctly in github actions is pretty simple.

For "normal" rust builds (invoking cargo from gha directly, not using something like the muslrust image or sccache), it's highly recommended to use
something like the [`rust-cache` action](https://github.com/Swatinem/rust-cache) and not re-invent the wheel, because that is going to do things like try to cache all builds of dependencies intelligently,
check the toolchain and make that part of the cache key, etc.

When using `muslrust` with `sccache`, `sccache` is essentially going to do all that work. The `SCCACHE_DIR` is safe to share across OS's, architectures, toolchains, etc, because all of that data goes
into the hash keys computed by `sccache`.
The `.cargo/registry` is also not dependent on your toolchain or OS or anything like that. Also `rust-cache` will attempt to figure out your `cargo` and `rustc` versions by interrogating whatever is in the path,
but that won't actually pick up the stuff in the `muslrust` image. So `rust-cache` is not the right choice here, and we can and should just use something very simple like

```
- name: Cache muslrust cargo registry and sccache dir
  # https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
  uses: actions/cache@v3
  with:
    path: /tmp/muslrust
    key: v1-sccache
    restore-keys: v1-sccache
```

and set `MOUNT_ROOT` to `/tmp/muslrust` in CI. (`$HOME` may or may not work correctly in gha).
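
One detail worth handling in CI: if the host directories don't exist yet, docker will typically create them itself when bind-mounting, and they end up owned by root. Creating them up front avoids that. A minimal sketch, where `prepare_mount_root` is a hypothetical helper name and the directory layout is the one used in the `docker run` example:

```shell
# Hypothetical helper: create the host-side cache directories before `docker run`,
# so that docker does not create them (owned by root) on first use.
# Falls back to $MOUNT_ROOT, then to $HOME/.muslrust, if no argument is given.
prepare_mount_root() {
    root="${1:-${MOUNT_ROOT:-$HOME/.muslrust}}"
    mkdir -p "$root/cargo/registry" "$root/sccache"
    echo "$root"
}

# In CI one would call it with the path given to actions/cache, e.g.:
#   MOUNT_ROOT="$(prepare_mount_root /tmp/muslrust)"
```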

The only reason to get fancier with the gha cache keys here is if you have lots of jobs using this and you don't expect them to be able to share artifacts.
For example, if you are using `muslrust:stable` and `muslrust:nightly`, probably nothing at all can be shared between these builds, so you might as well use separate github cache keys for those as well.
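
If you do go that route, one way to do it is to derive the key from the image tag in a shell step. This is only a sketch: `cache_key_for_tag` is a hypothetical helper, and the `v1-sccache` prefix is just the key from the example above.

```shell
# Hypothetical helper: build a cache key that includes the muslrust image tag.
# Defaults to "stable" when no tag is given.
cache_key_for_tag() {
    tag="${1:-stable}"
    echo "v1-sccache-${tag}"
}

# In a workflow step, the result could be exposed to later steps like this:
#   echo "key=$(cache_key_for_tag nightly)" >> "$GITHUB_OUTPUT"
```

A later `actions/cache` step can then read that step output as its `key`.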

Note that per the documentation, github has a limit of 10G per repository in total for all caches created this way. I suggest using 5G as the `SCCACHE_CACHE_SIZE` and leaving the remaining space for the `.cargo/registry`, but ymmv.
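
To keep an eye on that budget, you could check the size of the mounted directory at the end of a job. A rough sketch, assuming the `/tmp/muslrust` layout from above; `cache_within_budget` is a hypothetical helper, and it measures the uncompressed on-disk size, so treat it as a conservative estimate of what gets uploaded.

```shell
# Hypothetical helper: succeed if the directory's on-disk size (in KiB)
# is less than or equal to the given limit (also in KiB).
cache_within_budget() {
    dir="$1"
    limit_kb="$2"
    used_kb="$(du -sk "$dir" | awk '{print $1}')"
    [ "$used_kb" -le "$limit_kb" ]
}

# 10G expressed in KiB is 10 * 1024 * 1024 = 10485760:
#   cache_within_budget /tmp/muslrust 10485760 || echo "cache exceeds 10G"
```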

### Post-build command

As described in the main [`README.md`](./README.md), on linux the build is going to run as root in the `muslrust` image and so any files it produces will be owned by root, if they are mounted into the container.
For several reasons that can become annoying, and a quick `chown` fixes it.

Here we're adding a `POST_BUILD_CMD` that changes ownership not only for the `target` directory, but also the cargo registry and sccache directories. This is because the github `actions/cache` action will fail
to save and restore files owned by root.

On a mac, docker works differently, so if you are using the example command there, the files won't actually be owned by root, and also the `chown` command will be very slow. So on mac it is better to either skip
the `POST_BUILD_CMD`, or you could modify it so that it actually tests if we are root before doing the `chown`.
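
A simple variation on this, sketched here, is to pick the command based on the host OS rather than testing for root inside the container. Note that this swaps in a different check from the one suggested above, and `post_build_cmd` is a hypothetical helper name.

```shell
# Hypothetical helper: choose the post-build command from the host OS name.
# On Linux, hand ownership back to the invoking user; elsewhere, do nothing.
# The OS name defaults to `uname -s` but can be passed explicitly.
post_build_cmd() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Linux) echo "chown -R $(id -u) ./target /root/.cargo/registry /root/sccache" ;;
        *)     echo "true" ;;
    esac
}

# Usage, in place of the fixed assignment in the example command:
#   POST_BUILD_CMD="$(post_build_cmd)"
```

Using `true` as the fallback keeps the `&& ${POST_BUILD_CMD}` tail of the example command valid on a mac.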
