Support 0-compressed ZIP files #12

Open
nh2 opened this issue Oct 23, 2021 · 3 comments

Comments

@nh2
Contributor

nh2 commented Oct 23, 2021

I'd like to create ZIP files without compression, since I need to provide the format, but the ZIP compression algorithm, even at its lowest level (1), only does ~25 MB/s on modern CPUs, and storing uncompressed is far faster.

The docs say:

It does not (ironically) support uncompressed zip files that have been created as streams, where file sizes are not known beforehand.

I don't quite understand what exactly that means; when zipping with this library, the file sizes are known to it beforehand, aren't they?

See also potentially related (?) #4.

I think it would make sense to use this issue to track the feature of the library being able to decompress its own files.

@dylex
Owner

dylex commented Oct 26, 2021

If you just want to create zip files without compression, setting the compression level to 0 (stored) should work fine. Similarly, uncompressing most level 0 zip files with unzip will work okay. The issue arises only when uncompressing stored+streamed zip files, since the reader has no way of knowing the size of the data without reading the footer TOC at the end of the file. It's really just a zip file format limitation.
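
To make the limitation concrete, here is a minimal Python sketch of the check a streaming reader has to make on each 30-byte local file header. It is an illustration of the ZIP format only, not code from zip-stream; the names `can_stream_extract` and `make_header` are hypothetical helpers invented for this example. When general-purpose flag bit 3 is set, the sizes are deferred to a data descriptor after the data. Deflate data marks its own end, but stored data does not, so the reader cannot tell where the entry stops.

```python
import struct

# Local file header: signature, version needed, flags, method, DOS time,
# DOS date, CRC-32, compressed size, uncompressed size, name len, extra len.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30 bytes
LOCAL_SIG = 0x04034B50
FLAG_DATA_DESCRIPTOR = 0x0008  # bit 3: CRC and sizes follow the data
METHOD_STORED = 0
METHOD_DEFLATE = 8


def can_stream_extract(header: bytes) -> bool:
    """Return True if a forward-only reader can find the end of this entry."""
    (sig, _ver, flags, method, _time, _date,
     _crc, _comp_size, _size, _name_len, _extra_len) = LOCAL_HEADER.unpack(
        header[:LOCAL_HEADER.size])
    if sig != LOCAL_SIG:
        raise ValueError("not a local file header")
    if not flags & FLAG_DATA_DESCRIPTOR:
        return True  # sizes are present in the header itself
    # Sizes are deferred. Only deflate is considered self-terminating here;
    # stored data has no end marker, which is the problematic case.
    return method == METHOD_DEFLATE


def make_header(flags: int, method: int, crc: int, size: int,
                name: bytes = b"a.txt") -> bytes:
    """Hypothetical helper: build a local header for experimentation."""
    # DOS time 0 and date 0x21 (1980-01-01) are placeholders.
    return LOCAL_HEADER.pack(LOCAL_SIG, 20, flags, method, 0, 0x21,
                             crc, size, size, len(name), 0) + name
```

For example, `can_stream_extract(make_header(FLAG_DATA_DESCRIPTOR, METHOD_STORED, 0, 0))` is the stored+streamed case described above and returns `False`.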

@nh2
Contributor Author

nh2 commented Oct 26, 2021

@dylex So are you saying, for zip-stream to support uncompressing level 0 streamed zip files, the Conduit would have to either buffer the entire input in RAM, or have to know that it's a stored file (e.g. file path on disk) with random-access read to its end?

@dylex
Owner

dylex commented Oct 26, 2021

Right, it would have to read the end of the file before being able to extract anything. Since that is antithetical to streaming, and there are many other good libraries for accessing zip files on disk, it doesn't seem worth adding a separate interface to allow it. I would be open to improving the handling of this situation, though (to produce a better error, so that you can fall back to buffering the whole thing somehow and using a different solution). Right now it just fails the conduit, which is not great.
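
The buffering fallback mentioned above might look roughly like the following Python sketch, which uses only the standard `zipfile` module rather than zip-stream. The class `WriteOnly` and the functions `make_streamed_stored_zip` and `extract_by_buffering` are hypothetical names for this illustration. It assumes the whole archive fits in memory, which is exactly the cost that streaming tries to avoid.

```python
import io
import zipfile


class WriteOnly:
    """A sink with write() but no seek()/tell(), standing in for a pipe."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass


def make_streamed_stored_zip(name: str, payload: bytes) -> bytes:
    """Produce a stored (uncompressed) archive written as a stream."""
    sink = WriteOnly()
    # zipfile cannot seek back to patch the header on such a sink, so it
    # writes the CRC and sizes after the data instead.
    with zipfile.ZipFile(sink, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return b"".join(sink.chunks)


def extract_by_buffering(stream) -> dict:
    """Fallback: read everything into RAM, then use the TOC at the end."""
    buffered = io.BytesIO(stream.read())
    with zipfile.ZipFile(buffered) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}
```

A caller could try the streaming path first and, on the specific "stored entry of unknown size" error, hand the remaining input to something like `extract_by_buffering`.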

If you wanted to create a zip, even by streaming, where the size and CRC-32 of all files are known ahead of time, you could do it in a way that would be supported for unzipping. The zip side of the library doesn't currently support this (because there's no optional crc32 field), but that would be fairly easy to add.
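
As a rough illustration of why knowing the size and CRC-32 up front helps, the Python sketch below writes a stored entry whose local header already carries both values, then reads it back strictly front to back. This shows the format idea only; it is not the zip-stream API, and `stored_local_header` and `read_stored_entry` are hypothetical names. It emits just the local header and data, so the result is not a complete archive (the central directory is omitted).

```python
import io
import struct
import zlib

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30-byte local file header
LOCAL_SIG = 0x04034B50
FLAG_DATA_DESCRIPTOR = 0x0008


def stored_local_header(name: bytes, size: int, crc: int) -> bytes:
    """Header with flags=0 and method=0 (stored): sizes and CRC up front."""
    # DOS time 0 and date 0x21 (1980-01-01) are placeholders.
    return LOCAL_HEADER.pack(LOCAL_SIG, 10, 0, 0, 0, 0x21,
                             crc, size, size, len(name), 0) + name


def read_stored_entry(stream):
    """Read one stored entry from a forward-only stream.

    Assumes stream.read(n) returns exactly n bytes, as io.BytesIO does;
    a real pipe would need a read loop.
    """
    (sig, _ver, flags, method, _time, _date, crc, comp_size, _size,
     name_len, extra_len) = LOCAL_HEADER.unpack(stream.read(LOCAL_HEADER.size))
    if sig != LOCAL_SIG or method != 0 or flags & FLAG_DATA_DESCRIPTOR:
        raise ValueError("not a stored entry with sizes in its header")
    name = stream.read(name_len)
    stream.read(extra_len)          # skip the extra field
    data = stream.read(comp_size)   # the header says where the data ends
    if zlib.crc32(data) != crc:
        raise ValueError("CRC-32 mismatch")
    return name, data
```

Because the header states the length, the reader stops exactly at the end of the entry and leaves the stream positioned at whatever follows.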
