Ability to work with zip file in memory #36

Closed
axman6 opened this issue Jun 8, 2017 · 5 comments

axman6 commented Jun 8, 2017

We have an app which downloads many small zip files from the internet and processes them in memory. Currently we're using zip-archive, but I was curious about using a pure Haskell implementation and wondered if we could use zip. Currently it looks impossible without downloading the files to disk and then processing them (making cleanup much harder). The readme for the project lists this as a negative feature of zip-archive, but IMO it's an incredibly useful one, and its absence is stopping us from using zip.

Curious to hear your thoughts on the subject.

Owner

mrkkrp commented Jun 8, 2017

Currently the idea from #20 seems to be most promising. Here is a relevant PR: #22, but it shows no signs of activity since August 2016.

In-memory processing is certainly a very useful feature in some cases. I'll see if I can find the time to implement the ideas from #20 myself and release a new major version.

Owner

mrkkrp commented Jun 8, 2017

Writing to a given Handler instead of a file seems to be quite doable. Would this be useful to you?

Also, what do you mean by "a pure Haskell implementation"? AFAIK zip-archive is quite pure. Both zip-archive and zip use zlib though (in the case of zip it happens indirectly, via conduit), which is C wrapped by Haskell.

Author

axman6 commented Jun 9, 2017

Do you mean "Writing to a given Handle"? I want to avoid handles and actual files as much as possible. I don't understand how the linked issues/PRs address that.

Owner

mrkkrp commented Jun 9, 2017

> Do you mean "Writing to a given Handle"?

Oh yeah, sorry.

@robertLeeGDM proposes to use something like an anonymous file where you can write the archive, then seek to its beginning and make a Source out of it, as shown here.

Not that I like the approach very much, but the package is file-centered at the moment and that's the only way to avoid dealing with files.
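A minimal sketch of that scratch-file idea, using only `base`, `bytestring` and `directory`. The names `withScratchFile` and `roundTrip` are hypothetical and are not part of zip; a named temporary file stands in for a truly anonymous one, and a strict `ByteString` stands in for the archive bytes and the resulting Source.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Run an action with a handle to a temporary binary file that is
-- closed and deleted afterwards, even if the action throws.
-- (Hypothetical helper, not part of the zip package.)
withScratchFile :: (Handle -> IO a) -> IO a
withScratchFile action = do
  tmpDir <- getTemporaryDirectory
  bracket
    (openBinaryTempFile tmpDir "archive.zip")
    (\(path, h) -> hClose h >> removeFile path)
    (\(_, h) -> action h)

-- | Write some bytes, seek back to the beginning, and read them again.
-- In the real use case the write would be done by the archive writer
-- and the read would feed a streaming consumer.
roundTrip :: B.ByteString -> IO B.ByteString
roundTrip bytes = withScratchFile $ \h -> do
  B.hPut h bytes
  hSeek h AbsoluteSeek 0
  -- Strict hGetContents reads everything and closes the handle;
  -- the later hClose in the bracket is then a no-op.
  B.hGetContents h
```

This still touches the file system, which is why it only partly addresses the original request: cleanup is automatic, but the data is not kept purely in memory.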

The main motivation for me was to make it possible to work with very large archives efficiently and in constant memory, so it basically tries to stream everything directly to the resulting file. The problem with just streaming to a given sink is that we need to write the sizes of various blocks, which we do by writing dummy zeroes first, then writing the content, then calculating the size of the content by comparing the positions in the file before and after writing it. Then we return and overwrite the zeroes with the correct value.

We cannot know the size in advance because we often get the contents of a particular entry as a Source which we must consume to discover its size.
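The placeholder-then-patch step can be sketched roughly as follows. This is an illustration of the general technique, not the package's actual code: `writeSizedBlock` is a hypothetical name, a list of strict chunks stands in for a conduit Source, and a single 4-byte little-endian size field stands in for the real header layout.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)
import System.IO

-- | Encode a Word32 as 4 little-endian bytes.
word32LE :: Word32 -> B.ByteString
word32LE = BL.toStrict . BB.toLazyByteString . BB.word32LE

-- | Write a size placeholder, stream the chunks, then seek back and
-- overwrite the placeholder with the real size. The handle must be
-- seekable and opened in binary mode. (Hypothetical helper.)
writeSizedBlock :: Handle -> [B.ByteString] -> IO Word32
writeSizedBlock h chunks = do
  sizePos <- hTell h
  B.hPut h (word32LE 0)            -- dummy zeroes
  start <- hTell h
  mapM_ (B.hPut h) chunks          -- content, one chunk at a time
  end <- hTell h
  -- Size is only known now, after the content has been consumed.
  -- Note: this wraps for content of 4 GiB or more.
  let size = fromIntegral (end - start) :: Word32
  hSeek h AbsoluteSeek sizePos
  B.hPut h (word32LE size)         -- patch the placeholder
  hSeek h AbsoluteSeek end         -- continue after the content
  pure size
```

The two `hSeek` calls are what a plain sink cannot offer: once bytes have gone downstream there is no way to go back and patch them, which is why the writer wants a seekable file.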

Author

axman6 commented Jun 12, 2017

Right, makes sense, though I guess that means this isn't the package for us. Looks good nonetheless, thanks for taking the time to look into this!

axman6 closed this as completed Jun 12, 2017