Implement S3 Object Storage for Package Repositories #291

Merged (16 commits) on Aug 28, 2024

mmlr
Member

@mmlr mmlr commented Aug 28, 2024

  • Adds object storage based backing for the package repository (S3 API only for now)
  • Reduces the number of volumes that are used to ease deployment
  • Fixes constant reuploads of system-packages to builders
  • Other minor fixes and cleanups

After deployment this should fix #258.

mmlr added 16 commits August 27, 2024 01:51
Packages can come from either packages or system-packages directories
and both need to be checked to see if the cached package should be
kept.

Packages coming from the system-packages directory were not taken into
account and pruned from the builder cache on each buildrun only to then
be re-uploaded.
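
A minimal sketch of the keep-or-prune decision described above, assuming the builder cache is compared by file name against both directories. The function name, its parameters and the package file names in the usage note are hypothetical and not taken from HaikuPorter.

```python
from pathlib import Path


def packages_to_prune(cached_names, packages_dir, system_packages_dir):
    """Return cached package names that exist in neither source directory."""
    known = set()
    # Both directories are valid origins for a cached package.
    for directory in (packages_dir, system_packages_dir):
        directory = Path(directory)
        if directory.is_dir():
            known.update(p.name for p in directory.iterdir() if p.is_file())
    return sorted(name for name in cached_names if name not in known)
```

With only the packages directory in the known set, a cached system package such as a hypothetical "haiku-r1-1-x86_64.hpkg" would be reported for pruning on every run and then uploaded again.
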
As the comment indicated, earlier versions of Paramiko did not provide
a rename/move operation that was compatible with BFS due to the use of
hardlinks. Newer versions provide posix_rename, which can be used
instead of the manual "mv" shell command.
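
A rough sketch of how posix_rename can replace a remote "mv", assuming the file is first uploaded under a temporary name and then moved into place. That upload pattern, the function name and the ".temp" suffix are assumptions for illustration; only putfo and posix_rename are actual paramiko.SFTPClient methods.

```python
import io


def upload_then_rename(sftp, data, remote_path):
    """Upload bytes under a temporary name, then move them into place.

    `sftp` is expected to behave like paramiko.SFTPClient, for example the
    object returned by paramiko.SSHClient.open_sftp().
    """
    temp_path = remote_path + ".temp"  # hypothetical naming scheme
    sftp.putfo(io.BytesIO(data), temp_path)
    # Replaces running a manual "mv" shell command on the builder.
    sftp.posix_rename(temp_path, remote_path)
```
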
Since the host tools are now built as part of the container image build,
the source volume is no longer needed for the backend, and the frontend
never needed that volume in the first place.

Originally the shared source volume was meant to reduce used disk space
when running multiple instances. This is not needed anymore as the image
contains and shares the host tools and the bootstrap process for getting
the system-packages has been externalized.

The only user of the shared sources was the built in licenses in the
Haiku repository. For now, provide these in the image as well. This
could later also be moved to an external archive like what is done for
the system-packages.

Adding the HaikuPorter sources to the image invalidates the cache for
each change that is made. Move that install to the end and into separate
steps so that package installation and minisign build can be cached.

This makes this work more out of the box.

This is only printed when system-packages are missing.

The echo command, introduced to make the output easier to read, was
hiding the return value of the actual package repository creation
command.
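
The masking effect can be reproduced in isolation. This sketch assumes the echo ran after the command within the same shell invocation, which the commit message does not spell out; `false` stands in for a failing repository creation command, and the helper name is hypothetical.

```python
import subprocess


def exit_status(script):
    """Run a POSIX shell snippet and return its exit status."""
    result = subprocess.run(["sh", "-c", script], stdout=subprocess.DEVNULL)
    return result.returncode


# The trailing echo succeeds, so the failure of `false` is lost.
masked = exit_status("false; echo")

# Saving the status before echo and exiting with it keeps the failure visible.
preserved = exit_status("false; status=$?; echo; exit $status")
```
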
This furthers abstraction and will be needed when packages are not
necessarily local anymore.

Read and write are implemented as streaming operations using file
objects to allow for various backends without the need for local
temporary copies of files.
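
A minimal sketch of such a streaming interface, using a local directory as the backing store. The class and method names are hypothetical and do not reflect the actual HaikuPorter abstraction; the point is only that both directions take a file object rather than a local path.

```python
import shutil
from pathlib import Path


class LocalDirectoryBackend:
    """Hypothetical backend that keeps objects as files below a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write_file(self, name, source):
        """Stream the readable binary file object `source` into storage."""
        with open(self.root / name, "wb") as target:
            shutil.copyfileobj(source, target)

    def read_file(self, name, target):
        """Stream a stored object into the writable binary file object `target`."""
        with open(self.root / name, "rb") as source:
            shutil.copyfileobj(source, target)
```

Because callers only hand over file objects, a remote backend can stream the same way. With boto3, for instance, `upload_fileobj` and `download_fileobj` would fit this shape, although the commit message does not say which client library is used.
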
That's what the member variable is called and what that list actually
contains.

These are never used as the obsoletion is handled at the Repository and
PackageRepository level.

The storage backend is used to hold the actual packages while the local
packages directory is only used to keep track of the current package
list.

New packages are spooled to the local packages directory as they are
built and are kept there for adding them to the package repo file
(where package information is needed and the checksum is calculated).
Once added to the repo, the packages are uploaded to object storage and
the local copy is stubbed out to a 0-byte file.
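
The upload-then-stub step could look roughly like this. The `upload` callable and the function name are hypothetical stand-ins for whatever the storage backend provides.

```python
import os


def upload_and_stub(local_path, upload):
    """Upload a package, then truncate the local copy to 0 bytes.

    `upload(name, fileobj)` is a hypothetical callable that stores the
    stream under the given object name.
    """
    name = os.path.basename(local_path)
    with open(local_path, "rb") as source:
        upload(name, source)
    # Opening for writing truncates the file; the name stays in the
    # directory so it still appears in the current package list.
    with open(local_path, "wb"):
        pass
```

The stub is only written after the upload call returns, so an exception during upload leaves the full local copy in place.
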

When dependency packages are needed on the builder (and are not already
cached there), they are streamed directly from object storage without
repopulating the local packages directory.

After the package repo is updated it is uploaded to object storage as
well, along with its info file, sha256 checksum and the package list
file. This allows the object storage to be used as a complete package
repo by pkgman directly.

Finally, packages in object storage are pruned based on the list of
current local stub package files to keep the state in sync.
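
A sketch of the pruning step, assuming object names match the local stub file names and that only ".hpkg" objects are candidates, so that the repo file and its companions are left alone. The function name and the `delete` callable are hypothetical.

```python
def prune_remote_packages(local_stub_names, remote_names, delete):
    """Delete remote packages that have no local stub and return their names.

    `delete(name)` is a hypothetical callable that removes one object.
    """
    keep = set(local_stub_names)
    removed = []
    for name in sorted(remote_names):
        # Only package files are pruned; other objects are not touched.
        if name.endswith(".hpkg") and name not in keep:
            delete(name)
            removed.append(name)
    return removed
```
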

Note that this requires a "package_repo" command that supports the "-t"
argument to the "update" command as only stub packages are available
locally and the package info can therefore not be extracted from them.
Instead the package names are assumed to be canonical and the package
info to be immutable. This is unproblematic, as the buildmaster setup
ensures that packages cannot be overwritten (this would also have failed
previously as the checksums were intentionally not revalidated).

The storage backend config file path is given with a new
"--storage-backend-config" option. It should point to a JSON file with
a "backend_type" string (only "s3" is supported for now). A sample
config is also included. An empty path is allowed and causes no storage
backend to be used.

The S3 storage backend needs an "endpoint_url", "access_key_id",
"secret_access_key" and "bucket_name" to be specified in the config
file. An optional "prefix" can also be supplied to place multiple
instances into the same bucket.
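
A sketch of loading and checking such a config, assuming all keys sit at the top level of the JSON object next to "backend_type". The layout, the function name and the placeholder values are assumptions; the sample config shipped with the change may look different.

```python
import json

REQUIRED_S3_KEYS = ("endpoint_url", "access_key_id", "secret_access_key", "bucket_name")

# Hypothetical example values, not the sample config from the repository.
SAMPLE_CONFIG = {
    "backend_type": "s3",
    "endpoint_url": "https://s3.example.invalid",
    "access_key_id": "EXAMPLE_KEY_ID",
    "secret_access_key": "EXAMPLE_SECRET",
    "bucket_name": "example-bucket",
    "prefix": "x86_64/",
}


def load_storage_backend_config(path):
    """Return the parsed config, or None when no path is given."""
    if not path:
        # An empty path means no storage backend is used.
        return None
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    backend_type = config.get("backend_type")
    if backend_type != "s3":
        raise ValueError(f"unsupported backend_type: {backend_type!r}")
    missing = [key for key in REQUIRED_S3_KEYS if key not in config]
    if missing:
        raise ValueError("missing S3 config keys: " + ", ".join(missing))
    config.setdefault("prefix", "")  # the prefix is optional
    return config
```
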

Include the storage backend config option in the buildmaster scripts fed
from a "STORAGE_BACKEND_CONFIG" environment variable for easy
configuration.

The packages repository never actually needed to be shared or separate
and can just as well be located on the main buildmaster volume. It was
originally shared only so that repositories for multiple architectures
could be served from a single server.

When using object storage as the storage backend, the repository
directories are only used to keep the state and don't provide the
actual repo or package files. In this case a separate volume is even
less useful.

Point the frontend container to the single buildmaster volume instead of
the previously shared instances directory on the packages volume. This
means that the frontend will generally not be shared across architectures
anymore. Since it reduces the scope of the shared volumes, this eases
deployment.

The "repo_consistency.txt" and "report.txt" files, which report the
consistency of the recipe and package repository respectively, are moved
from the packages volume to the output directory, as this makes them
accessible through the normal frontend.
Member

@kallisti5 kallisti5 left a comment

Nice idea on the licenses from the build container! They were always a pain to groom.

This patchset is pretty amazing! It's going to solve a lot of maintenance issues we have had over the years. NICE WORK!

@kallisti5 kallisti5 merged commit 1f7e28e into haikuports:master Aug 28, 2024
3 checks passed
Development

Successfully merging this pull request may close these issues.

buildmaster needs to leverage object storage
2 participants