Support streaming for crate output #205

Open
dnlbauer opened this issue Nov 27, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@dnlbauer

The issue

Currently, ro-crate-py can only write a zipped crate directly to disk, via crate.write_zip("exp_crate.zip").

This is a limitation when the library is used in a web service that creates RO-Crates on the fly and serves them as downloads. In such a scenario the files are likely not stored locally but somewhere else (think of a database, cloud storage, ...), and you generally want to stream them from their remote location directly to the user. If you can't stream them, they either need to fit into memory or you have to cache them on your hard drive and stream them from there. Neither is a good solution, because depending on the size of the dataset you need either a lot of memory or a lot of disk space.

Zip files in general already support streaming. It would be nice if ro-crate-py could also write its output into a stream object like io.BytesIO. This would make it compatible with HTTP streaming.
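To illustrate what streaming a zip could look like, here is a minimal sketch that uses only the standard library's zipfile module. It is not part of ro-crate-py: `ChunkSink` and `stream_zip` are hypothetical names, and the sketch assumes each entry is given as a name plus an iterable of byte chunks. Because the sink has no `tell()` or `seek()`, zipfile treats it as non-seekable and writes the sizes after each entry's data, so nothing has to be buffered as a whole.

```python
import io
import zipfile
from typing import Iterable, Iterator, List, Tuple


class ChunkSink:
    """Write-only, non-seekable file-like object (hypothetical helper)."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def write(self, data) -> int:
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self) -> None:
        pass

    def drain(self) -> List[bytes]:
        # Hand over everything written so far and start a fresh buffer.
        chunks, self._chunks = self._chunks, []
        return chunks


def stream_zip(entries: Iterable[Tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    """Yield the bytes of a zip archive while its entries are being written."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, chunks in entries:
            # force_zip64 allows entries whose final size is not known up front.
            with zf.open(name, mode="w", force_zip64=True) as dest:
                for chunk in chunks:
                    dest.write(chunk)
                    yield from sink.drain()
            yield from sink.drain()  # data descriptor of the finished entry
    yield from sink.drain()  # central directory, written when the archive closes


# Small in-memory demonstration; real chunks would come from remote storage.
entries = [
    ("ro-crate-metadata.json", [b'{"@graph": []}']),
    ("data/big.bin", (b"x" * 1024 for _ in range(4))),
]
archive = b"".join(stream_zip(entries))

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
    payload = zf.read("data/big.bin")
    corrupt = zf.testzip()
```

In a Django view, a generator like this could presumably be passed straight to StreamingHttpResponse with content_type="application/zip" instead of being joined in memory as in the demonstration.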

Example

If you follow the link below, there is a code example where I worked around this issue.

In the given example, the RO-Crates become quite large (terabytes). The generated zip would therefore never fit into the server's memory, or even onto its hard drive. As a workaround, I use this library to build a crate object without actually downloading the files. I then create a zip stream and add the written metadata file to it. Finally, I add all data files manually to the stream and serve it to the user as a StreamingHttpResponse.
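The ordering of these steps can be sketched roughly as follows. This is not the code from the linked view: `iter_crate_entries` and `fake_fetch` are hypothetical names, the metadata dict is a placeholder standing in for whatever ro-crate-py generated, and each data file is assumed to be represented by a callable that opens the remote source only when its chunks are actually consumed.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, Iterator[bytes]]


def iter_crate_entries(
    metadata: dict,
    data_files: Dict[str, Callable[[], Iterable[bytes]]],
) -> Iterator[Entry]:
    """Yield (archive name, chunk iterator) pairs, metadata file first."""
    # The metadata is small, so it is serialized in one piece.
    body = json.dumps(metadata, indent=4).encode("utf-8")
    yield "ro-crate-metadata.json", iter([body])
    # Data files are only described here; nothing is fetched yet.
    for name, open_chunks in data_files.items():
        yield name, iter(open_chunks())


# Demonstration with a fake remote source that records when it is read.
fetched: List[str] = []


def fake_fetch(name: str, payload: bytes) -> Callable[[], Iterator[bytes]]:
    def opener() -> Iterator[bytes]:
        fetched.append(name)  # runs only once the first chunk is requested
        yield payload

    return opener


# Placeholder metadata; a real crate would carry the generated JSON-LD.
metadata = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}
entries = iter_crate_entries(metadata, {"data/a.bin": fake_fetch("data/a.bin", b"aaa")})

first_name, first_chunks = next(entries)
metadata_bytes = b"".join(first_chunks)
fetched_before_read = list(fetched)

second_name, second_chunks = next(entries)
data_bytes = b"".join(second_chunks)
```

The resulting pairs can be handed to any streaming zip writer, so each remote file is read exactly when its entry is being written to the response.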

https://github.com/dnlbauer/FAIR-workflow-platform-frontend/blob/5900dd331b6beeb7c4ca6e8ba6c1007247b05b16/cwr_frontend/cwr_frontend/views/DatasetDetailView.py#L168

@dnlbauer dnlbauer added the enhancement New feature or request label Nov 27, 2024
@simleo
Collaborator

simleo commented Dec 2, 2024

I have no experience on this matter. I think it would be best if you opened a PR with the desired changes and related tests.
