The issue
Currently, ro-crate-py can only write a zip archive directly to disk, with crate.write_zip("exp_crate.zip"). This is a limitation when the library is used in a web service that creates RO-Crates on the fly and serves them as downloads. In such a scenario the files are likely not stored locally but elsewhere (a database, cloud storage, ...), so you generally want to stream them from their remote location directly to the user. If you can't stream them, they either need to fit into memory or you have to cache them on your hard drive and stream them from there. Neither is a good solution, because they require a lot of memory or a lot of disk space, depending on the size of the dataset.
Zip files in general already support streaming. It would be nice if ro-crate-py could also write its output into a stream object like io.BytesIO. This would make it compatible with HTTP streaming.
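To illustrate the request: Python's standard zipfile module already accepts any file-like object as its target, not only a path. The sketch below uses only the standard library and is not ro-crate-py API; write_zip_to_stream is a hypothetical helper name, and the two entries are placeholder content.

```python
import io
import zipfile


def write_zip_to_stream(entries, out):
    """Write (name, bytes) pairs as a zip archive into a file-like object.

    Hypothetical helper for illustration; not part of ro-crate-py.
    """
    with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in entries:
            zf.writestr(name, payload)
    return out


# Write into an in-memory stream instead of a file on disk.
buffer = io.BytesIO()
write_zip_to_stream(
    [("ro-crate-metadata.json", b"{}"), ("data/file.txt", b"hello")],
    buffer,
)

# Read the archive back from the same stream to check its contents.
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
```

An io.BytesIO target still holds the whole archive in memory, so for very large crates the target would have to be a stream that forwards data as it is written.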
Example
If you follow the link below, there is a code example where I worked around this issue.
In the given example, the RO-Crates become quite large (terabytes). The generated zip would therefore never fit into the server's memory, or even onto its hard drive. As a workaround, I use this library to build a crate object without actually downloading the files. I then create a zip stream to which I add the written metadata file. Finally, I add all data files manually to the stream and serve it to the user as a StreamingHttpResponse.
https://github.com/dnlbauer/FAIR-workflow-platform-frontend/blob/5900dd331b6beeb7c4ca6e8ba6c1007247b05b16/cwr_frontend/cwr_frontend/views/DatasetDetailView.py#L168
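A workaround of this kind can be sketched with the standard library alone. This is a minimal illustration and not necessarily how the linked view does it: ChunkBuffer and stream_zip are hypothetical names, and the in-memory sources stand in for remote file handles. The generator yields archive bytes piece by piece, so no complete file and no complete archive is ever held in memory.

```python
import io
import zipfile


class ChunkBuffer:
    """Write-only, non-seekable sink that collects bytes until drained."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass

    def drain(self):
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_zip(entries, chunk_size=64 * 1024):
    """Yield zip bytes for (name, readable) pairs, one chunk at a time."""
    sink = ChunkBuffer()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, source in entries:
            # force_zip64 allows members whose final size is unknown
            # up front and may exceed the 32-bit zip size limit.
            with zf.open(name, mode="w", force_zip64=True) as member:
                while chunk := source.read(chunk_size):
                    member.write(chunk)
                    data = sink.drain()
                    if data:
                        yield data
            # Closing the member writes its trailing data descriptor.
            data = sink.drain()
            if data:
                yield data
    # Closing the archive writes the central directory.
    data = sink.drain()
    if data:
        yield data


# Placeholder sources; in a web service these would be remote streams.
entries = [
    ("ro-crate-metadata.json", io.BytesIO(b"{}")),
    ("data/big.bin", io.BytesIO(b"x" * 200_000)),
]
archive = b"".join(stream_zip(entries))
```

In a Django view the generator would not be joined as in the last line; it could instead be passed directly to StreamingHttpResponse, which accepts an iterator of byte strings.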