Bring artefact archive to production level #590

spbnick opened this issue Oct 23, 2024 · 1 comment
spbnick commented Oct 23, 2024

At the moment we have a prototype artifact archive working:

  • We extract lists of files to (potentially) archive from the incoming I/O data and post them to a message queue.
  • We take the lists of files off the queue, download every 256th file under 5 MB in size, and upload each into a private GCS (S3) bucket, using a hash of the file's URL as its object name (see the first sketch after this list).
  • We have a cloud function that responds to HTTP requests specifying the artifact URL to fetch. If it finds the artifact in the archive, it responds with a redirect to a "signed URL", which is only valid for a few seconds. The execution restrictions put on this function let us control the download rate, and thus our costs, indirectly (see the second sketch after this list).
  • We have a client class for operations on the archive.
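
A minimal sketch of the sampling and upload step described above, assuming the requests and google-cloud-storage libraries; the bucket name, constants, and helper names are illustrative, not taken from the actual code:

```python
import hashlib

import requests
from google.cloud import storage

ARCHIVE_BUCKET = "kcidb-artifact-archive"   # hypothetical bucket name
DIVISOR = 256                               # archive every 256th file
MAX_SIZE = 5 * 1024 * 1024                  # skip files over 5 MB


def url_hash(url):
    """Hash an artifact URL into the object name used in the bucket."""
    return hashlib.sha256(url.encode()).hexdigest()


def archive_url(url, bucket):
    """Archive one artifact URL, if sampled in and small enough.

    Return True if the file was uploaded, False if it was skipped.
    """
    # Deterministic sampling: hashing the URL keeps a file's in/out
    # status stable, and raising the load is just lowering DIVISOR
    if int(url_hash(url), 16) % DIVISOR:
        return False
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    if len(response.content) > MAX_SIZE:
        return False
    bucket.blob(url_hash(url)).upload_from_string(response.content)
    return True


# Usage: process one list of URLs taken off the queue
# bucket = storage.Client().bucket(ARCHIVE_BUCKET)
# for url in file_urls:
#     archive_url(url, bucket)
```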
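A sketch of the redirecting cloud function, assuming Python functions-framework and V4 signed URLs; the request parameter name and bucket name are again made up for illustration:

```python
import hashlib
from datetime import timedelta

import functions_framework
from google.cloud import storage

BUCKET = storage.Client().bucket("kcidb-artifact-archive")  # hypothetical


@functions_framework.http
def fetch_artifact(request):
    """Redirect to a short-lived signed URL for the requested artifact."""
    url = request.args.get("url")   # hypothetical parameter name
    if not url:
        return "Missing 'url' parameter", 400
    blob = BUCKET.blob(hashlib.sha256(url.encode()).hexdigest())
    if not blob.exists():
        return "Artifact not archived", 404
    # Signing requires service-account credentials with signBlob access
    signed_url = blob.generate_signed_url(
        version="v4",
        expiration=timedelta(seconds=10),   # valid for a few seconds
        method="GET",
    )
    return "", 302, {"Location": signed_url}
```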

To get to production grade, we need the following:

  • Decide on an archival policy: do one iteration of research to determine which kinds of artifacts we need to (and can) store, and why. Some preliminary research was done before starting this implementation, which could be useful. Implement the corresponding artifact filtering in the code (a possible filter shape is sketched after this list).
  • Decide on a retention policy: how long to store artifacts (depending on their properties), in which storage class, and when to delete them altogether. Both this and the archival policy should be driven by user expectations, the needs of the triaging system, and costs. Implement it in the code by configuring the storage policy on GCS deployment/update (see the lifecycle sketch after this list).
  • Gradually increase the archival load from every 256th file to every 128th, every 64th, and so on, until we archive every applicable file.
  • Re-engineer the deployment to handle the load as necessary.
  • Have Maestro send direct links (not links to folders or the WebUI) to uncompressed (or transport-compressed) files.
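
For the archival-policy item, the filtering could reduce to a predicate over an artifact's URL and size; a hypothetical shape, with every criterion invented for illustration:

```python
# Hypothetical filter: the actual suffixes, size limit, and criteria
# are exactly what the archival-policy research should decide
ARCHIVED_SUFFIXES = (".log", ".txt", "dmesg", "config")
MAX_SIZE = 5 * 1024 * 1024


def should_archive(url, size):
    """Decide whether an artifact is covered by the archival policy."""
    if size is not None and size > MAX_SIZE:
        return False
    return url.lower().endswith(ARCHIVED_SUFFIXES)
```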
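For the retention item, GCS lifecycle rules can express both the storage-class transition and the eventual deletion, applied at deployment/update time; a sketch using the google-cloud-storage client, with made-up ages:

```python
from google.cloud import storage


def configure_retention(bucket_name):
    """Apply a (hypothetical) retention policy to the archive bucket."""
    bucket = storage.Client().get_bucket(bucket_name)
    # Move objects to cheaper storage after 30 days (age is a placeholder)
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=30)
    # Delete objects outright after 180 days (also a placeholder)
    bucket.add_lifecycle_delete_rule(age=180)
    bucket.patch()
```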