Is your feature request related to a problem? Please describe.
When using porter archive to save an image to a file for distribution, Porter always runs gzip on the archive to produce a tar.gz. Gzip is fairly CPU-intensive, and in our experience the archives end up only minimally smaller: for one bundle, a 2.30GB tgz extracted to a 2.32GB tar. While that space saving can be relevant, it may be preferable to make gzipping optional in order to speed up the porter archive and porter publish commands themselves.
Describe the solution you'd like
Depending on preference, a flag enabling or disabling gzipping:
--gzip <- Default false, enable gzip if relevant
--no-gzip <- Default true, disable gzip if relevant
And then handle .tgz as well as .tar in porter publish --archive
Describe alternatives you've considered
We haven't been able to come up with alternatives; currently we just accept the extra time it takes.
Additional context
Let me know if you need more information, I couldn't come up with anything relevant.
A thick bundle SHOULD be encoded as a gzipped TAR. This specification is neutral as to what compression ratio is used.
Perhaps a CLI flag for configuring the compression level would be better? This would:
allow the user to select NoCompression
avoid issues when publishing, since the archive file is handled (decompressed, unpacked, etc.) by the cnabio/cnab-go library, which makes some tgz assumptions
keep the code in archive.go cleaner, as the gzipWriter would not need to be handled conditionally
That being said, skipping compression does speed up the archive process, but the actual data transfer speed appears to be the main limiting factor when archiving a bundle. Below are a few examples of archiving a ~2.3GiB bundle with and without compression:
# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive huge-defaultcomp.tgz --reference <huge bundle ref> --force
real 2m36.773s
user 1m26.772s
sys 0m15.242s
# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive huge-nocomp.tgz --reference <huge bundle ref> --force
real 1m59.890s
user 0m13.060s
sys 0m8.260s
# just tar
$ time ./bin/porter-no-gzip archive huge.tar --reference <huge bundle ref> --force
real 2m0.262s
user 0m11.853s
sys 0m8.895s
# the resulting file sizes
$ du -m huge*
2376 huge-defaultcomp.tgz
2395 huge-nocomp.tgz
2395 huge.tar
In a quick test on a bandwidth-constrained network, the archive time of the same huge bundle improved from 16m56s to 15m1s 🙀
A similar improvement can be observed when archiving the whalegap bundle:
# gzipped tar with DefaultCompression (default Porter behavior)
$ time ./bin/porter archive whalegap-defaultcomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real 0m20.463s
user 0m12.249s
sys 0m2.100s
# gzipped tar with NoCompression
$ time ./bin/porter-no-comp archive whalegap-nocomp.tgz --reference ghcr.io/getporter/examples/whalegap:v0.2.0 --force
real 0m14.106s
user 0m1.923s
sys 0m0.906s