Blind Archive support. #20
For more possibilities, such as passing handles to and from other processes that may have privileged access to archives that the current process lacks, see: http://blog.varunajayasiri.com/passing-file-descriptors-between-processes-using-sendmsg-and-recvmsg |
Todo: add blindCopyEntry so an open Handle to another archive can be solicited for information. |
And where are the archive contents till you write them into a file? In memory? I'm not sure what is going on in your example, but the approach seems hackish. |
No hack at all. The contents are in the filesystem. As long as at least one open Handle refers to the file, the data remains in the filesystem. No memory is involved. It is no different from any other open file, except that after the unlink (remove) no directory entry refers to it. As soon as the last Handle is closed, or the process exits, the filesystem frees the contents. No cleanup is necessary. This leaves one free to create an archive on the fly in a blind/anonymous file. The file can be read or written by any process or thread that has access to the Handle, which includes other processes on the OS that receive the Handle via sockets. There is nothing new or 'hackish' about this idiom. It has been around for decades. There are other applications for Handle passing via OS sockets that need not involve unlinking the file from the directory structure. For example, a server can pass restricted archives to an unprivileged process by making the Handle available via an OS socket, with no copy of the data required. |
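The open-then-unlink idiom described above can be sketched with nothing but `base`, `directory`, and `bytestring`. This is a minimal illustration, not code from the zip package: `blindRoundTrip` is a hypothetical name, and the sketch assumes a POSIX filesystem (removing an open file typically fails on Windows).

```haskell
import qualified Data.ByteString.Char8 as BC
import System.Directory (doesFileExist, removeFile)
import System.IO

-- | Hypothetical helper: write a payload to a file that no longer has a
-- directory entry, then read it back through the same Handle.
-- Returns whether the path was still visible after the unlink, plus the bytes.
blindRoundTrip :: FilePath -> BC.ByteString -> IO (Bool, BC.ByteString)
blindRoundTrip path payload = do
  h <- openBinaryFile path ReadWriteMode
  removeFile path                  -- drop the directory entry; the data stays reachable via h
  visible <- doesFileExist path    -- expected to be False on POSIX systems
  BC.hPut h payload
  hSeek h AbsoluteSeek 0           -- rewind; hSeek flushes the write buffer first
  bytes <- BC.hGetContents h       -- strict read; closes the Handle when done
  pure (visible, bytes)
```

Calling `blindRoundTrip "scratch.tmp" (BC.pack "hello")` in GHCi should hand back the payload even though no file named `scratch.tmp` is left in the directory.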
Thank you, I'll look into that. |
This is a related and useful technique: |
Haskell has had support for Handle/fd passing via sockets for many years. https://hackage.haskell.org/package/network-2.6.3.1/docs/Network-Socket.html#g:10 |
What follows is a working piece of code that uses createBlindArchive to create an archive from database documents and then streams the archive to the client via Yesod. The directory entry is removed right after the file is opened, so once hClose runs, the archive's storage is released by the filesystem. Had an exception prevented hClose from being reached, the storage would still be reclaimed once the Handle was garbage-collected or the process exited.

    data Document = Document
      { documentName :: FilePath
      , cronos       :: UTCTime
      }

    download :: FilePath -> [(Document, ByteString)] -> Handler TypedContent
    download archivePath documents = do
      h <- liftIO $ do
        h <- openFile archivePath ReadWriteMode
        removeFile archivePath
        hSetBinaryMode h True
        createBlindArchive h $ do
          setArchiveComment "This archive was created by Me!"
          forM_ documents
            (\(doc, payload) -> do
              es <- mkEntrySelector =<< parseRelFile (documentName doc)
              setModTime (cronos doc) es
              addEntry Store payload es
            )
        hSeek h AbsoluteSeek 0
        pure h
      respondSource "application/zip" $ handleToBuild h

    handleToBuild :: Handle -> Source (HandlerT site IO) (Flush DBB.Builder)
    handleToBuild h = sourceHandle h =$= lumps
      where
        lumps = maybeM (liftIO $ hClose h) (\b -> yield (Chunk $ BB.insertByteString b) *> lumps) =<< await

    maybeM :: (Applicative m) => m b -> (a -> m b) -> Maybe a -> m b
    maybeM _ action (Just a) = action a
    maybeM defaultAction _ Nothing = defaultAction
|
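When all the work on the blind file happens inside a single IO action, the cleanup concern raised above can be handled deterministically with `bracket` rather than left to the garbage collector. The following is only a sketch under that assumption: `withBlindFile` is a hypothetical helper, not part of the zip package or of the proposed patch, and it again assumes POSIX unlink semantics.

```haskell
import Control.Exception (bracket, onException)
import System.Directory (removeFile)
import System.IO

-- | Hypothetical helper: run an action against a binary read/write Handle
-- whose file has already been unlinked. The Handle is closed when the action
-- finishes, whether it returns normally or throws.
withBlindFile :: FilePath -> (Handle -> IO a) -> IO a
withBlindFile path = bracket acquire hClose
  where
    acquire = do
      h <- openBinaryFile path ReadWriteMode
      -- If the unlink itself fails, close the Handle before rethrowing.
      removeFile path `onException` hClose h
      pure h
```

This shape does not fit the streaming handler directly, because there the Handle has to outlive the action that created it; in a conduit pipeline something like `bracketP` would be the closer analogue.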
OK, you can go ahead with PR, but please preserve backward-compatibility in API. |
Absolutely! I already have the code and it passes all of the prior
|
Would you like me to delay the PR until I add a set of tests to the
|
@robertLeeGDM, Let's first see what you've got. |
I thought this approach was roughly equivalent to the direct conduit approach of zip-stream, but I now realize that a blind handle might solve the problem of computing the content length for an HTTP header before streaming the zip. ( |
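To illustrate the content-length point: once the archive has been fully written to the blind file, its size can be read from the Handle before any bytes are streamed. The sketch below is an assumption-laden stand-in, where `stageBlind` is a hypothetical name and a plain `hPut` takes the place of `createBlindArchive`, which lives in the author's fork and is not used here.

```haskell
import qualified Data.ByteString.Char8 as BC
import System.Directory (removeFile)
import System.IO

-- | Hypothetical helper: write a payload into an unlinked file and return its
-- size in bytes together with the rewound Handle, ready to be streamed.
stageBlind :: FilePath -> BC.ByteString -> IO (Integer, Handle)
stageBlind path payload = do
  h <- openBinaryFile path ReadWriteMode
  removeFile path            -- POSIX assumption: the open Handle keeps the data alive
  BC.hPut h payload          -- stand-in for building the archive on the Handle
  size <- hFileSize h        -- size of everything written so far
  hSeek h AbsoluteSeek 0     -- rewind so the consumer reads from the start
  pure (size, h)
```

The returned size is what a handler could place in a Content-Length header before streaming from the Handle; the caller remains responsible for closing it.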
I have the code for blind handles, and I have used it in commercial
production for some time without a problem. I had submitted it as a
pull request, but the code was not formatted in accordance with the
standards used in that package. I did say I was going to fix it, but
I'm a bit clueless with GitHub, so if I did reformat it I'd
probably bungle the pull request.
|
https://github.com/robertLeeGDM/zip <<< See the 'blind' branch.
|
Memory usage is great in tests. I would emphasize to future users that the filesystem where the handle is created must have enough space for the largest zip file they expect to produce. In the long run, an approach that doesn't use a filesystem, even a blind one, is probably more compatible with serving streaming zips from a web application. The drawback here is that, for larger zip files, users have to wait a long time before the download actually starts. UPDATE:
Update 2: |
We are stuck with the fact that zip was not created with streaming in mind. Zip is its own worst enemy when it comes to that. |
I wanted to create a Handle independent of the zip module. I believe what I have is working currently. If you want I can create a pull request. If you want to see the code it is in my repo.
I can safely write data to the archive without actually exposing it to the filesystem unless I want to. The hPutStr could just as well be to a socket or a conduit to an HTTP service, etc.