Optional gzipping when using porter archive #3083

Closed
Patrick-Clausen opened this issue Apr 15, 2024 · 1 comment · Fixed by #3101
Labels
suggestion Idea for maintainers to consider. Do not take this issue until triaged.

Comments

@Patrick-Clausen

Is your feature request related to a problem? Please describe.
When using porter archive to save an image to a file for distribution, Porter always runs gzip on the archive to produce a tar.gz. Gzip is fairly CPU-intensive, and in our experience the resulting archives are only minimally smaller: for one bundle, a 2.30 GB tgz extracted to a 2.32 GB tar. Saving that space can be relevant, but it may be preferable to make gzipping optional in order to speed up the porter archive and porter publish commands themselves.

Describe the solution you'd like
Depending on preference, a flag enabling or disabling gzipping:
--gzip <- Default false, enable gzip if relevant
--no-gzip <- Default true, disable gzip if relevant

And then handle .tgz as well as .tar in porter publish --archive

Describe alternatives you've considered
Haven't been able to come up with alternatives; currently we just accept the extra time it takes.

Additional context
Let me know if you need more information, I couldn't come up with anything relevant.

@Patrick-Clausen Patrick-Clausen added the suggestion Idea for maintainers to consider. Do not take this issue until triaged. label Apr 15, 2024
@jarnfast
Contributor

From the CNAB spec:

A thick bundle SHOULD be encoded as a gzipped TAR. This specification is neutral as to what compression ratio is used.

Perhaps a CLI flag allowing configuration of the compression level would be better?

  • this will then allow the user to select NoCompression
  • avoid issues when publishing as the archive file is handled (decompressed/unpacked/etc) by the cnabio/cnab-go library (which has some tgz assumptions)
  • cleaner code in archive.go as the gzipWriter would not need to be handled conditionally

That being said, while this will improve the speed of the archive process, the actual data transfer speed appears to be the main limiting factor when archiving a bundle. Below are a few examples of archiving a ~2.3GiB bundle with and without compression:

# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive huge-defaultcomp.tgz --reference <huge bundle ref> --force
real    2m36.773s
user    1m26.772s
sys     0m15.242s

# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive huge-nocomp.tgz --reference <huge bundle ref> --force
real    1m59.890s
user    0m13.060s
sys     0m8.260s

# just tar
$ time ./bin/porter-no-gzip archive huge.tar --reference <huge bundle ref> --force
real    2m0.262s
user    0m11.853s
sys     0m8.895s

# the resulting file sizes
$ du -m huge*
2376    huge-defaultcomp.tgz
2395    huge-nocomp.tgz
2395    huge.tar

A quick test on a bandwidth-constrained network improves the archive time of the same huge bundle from 16m56s to 15m1s 🙀

A similar improvement can be observed when archiving the whalegap bundle:

# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive whalegap-defaultcomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real    0m20.463s
user    0m12.249s
sys     0m2.100s

# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive whalegap-nocomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real    0m14.106s
user    0m1.923s
sys     0m0.906s
