ZipStreamer is a golang microservice for streaming zip files from a series of web links, on the fly. For example, if you have 200 files on S3 and want to download a zip file of them, you can do so with a single request to this server.
Highlights include:
- Low memory: the files are streamed out to the client immediately
- Low CPU: the default server doesn't compress files, only packages them into a zip, so there's minimal CPU load (configurable)
- High concurrency: the two properties above allow a single small server to stream hundreds of large zips simultaneously
- Easy to host: several deployment options, including Docker images and two one-click deployers
- It includes an HTTP server, but can also be used as a library (see zip_streamer.go)
Each HTTP endpoint requires a JSON description of the desired zip file. Its root object has the following structure:
- suggestedFilename [Optional, string]: the filename to suggest in the "Save As" UI in browsers. Defaults to archive.zip if not provided or invalid. Limited to US-ASCII.
- files [Required, array]: an array describing the files to include in the zip file. Each array entry requires 2 properties:
  - url [Required, string]: the public URL of the file to include in the zip. ZipStreamer will fetch this via a GET request. The file must be publicly accessible via this URL; if your files are private, most file hosts provide query string authentication options which work well with ZipStreamer (example: AWS S3 docs).
  - zipPath [Required, string]: the path and filename where this entry should appear in the resulting zip file. This is a relative path to the root of the zip file.
Example JSON description with 2 files:
{
"suggestedFilename": "tps_reports.zip",
"files": [
{
"url":"https://server.com/image1.jpg",
"zipPath":"image1.jpg"
},
{
"url":"https://server.com/image2.jpg",
"zipPath":"in-a-sub-folder/image2.jpg"
}
]
}
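For Go users, the descriptor maps naturally onto a pair of structs with json tags. The sketch below is illustrative only (ZipDescriptor, ZipEntry and ParseDescriptor are hypothetical names, not the ones in zip_streamer.go): it decodes a descriptor, rejects a missing files array or entries without url or zipPath, and falls back to archive.zip when suggestedFilename is empty or not printable US-ASCII. Treating control characters as invalid is an assumption; the server's exact validity rules may differ.

```go
package main

import (
	"encoding/json"
	"errors"
)

// ZipDescriptor mirrors the JSON zip file descriptor (hypothetical type name).
type ZipDescriptor struct {
	SuggestedFilename string     `json:"suggestedFilename"`
	Files             []ZipEntry `json:"files"`
}

// ZipEntry is one element of the "files" array.
type ZipEntry struct {
	URL     string `json:"url"`
	ZipPath string `json:"zipPath"`
}

// printableASCII reports whether name is non-empty printable US-ASCII
// (an assumed definition of a "valid" suggested filename).
func printableASCII(name string) bool {
	if name == "" {
		return false
	}
	for i := 0; i < len(name); i++ {
		if name[i] < 0x20 || name[i] > 0x7e {
			return false
		}
	}
	return true
}

// ParseDescriptor decodes a descriptor and checks its required fields.
func ParseDescriptor(data []byte) (*ZipDescriptor, error) {
	var d ZipDescriptor
	if err := json.Unmarshal(data, &d); err != nil {
		return nil, err
	}
	if d.Files == nil {
		return nil, errors.New("files: required")
	}
	for _, f := range d.Files {
		if f.URL == "" || f.ZipPath == "" {
			return nil, errors.New("files: every entry needs url and zipPath")
		}
	}
	// Fall back to the documented default for a missing or invalid name.
	if !printableASCII(d.SuggestedFilename) {
		d.SuggestedFilename = "archive.zip"
	}
	return &d, nil
}
```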
POST /download
This endpoint takes an HTTP POST body containing the JSON zip file descriptor, and returns a zip file.
Example usage with curl
# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST /download endpoint, passing json descriptor in body
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/download > archive.zip
GET /download
This endpoint fetches a JSON zip file descriptor hosted on another server, and returns a zip file. This can be preferable to the POST /download endpoint in a few cases:
- You want to hide from the client where the original files are hosted (see the zsid parameter)
- POST requests aren't easy to adopt (for example, traditional static webpages)
- You want to trigger a browser's "Save File" UI, which isn't shown for POST requests. See POST /create_download_link for a client side alternative to achieve this.
This endpoint requires one of two query parameters describing where to find the JSON zip file descriptor:
- zsurl: the full URL to the JSON file describing the zip. Example: /download?zsurl=https://yourserver.com/path_to_descriptors/82a1b54cd20ab44a916bd76a5
- zsid: must be used with the ZS_LISTFILE_URL_PREFIX environment variable. The JSON file will be fetched from ZS_LISTFILE_URL_PREFIX + zsid. This allows you to hide the full URL path from clients, revealing only the end of the URL. Example: ZS_LISTFILE_URL_PREFIX = "https://yourserver.com/path_to_descriptors/" and /download?zsid=82a1b54cd20ab44a916bd76a5
Example usage with curl
# call `GET /download` endpoint with zsurl
curl -X GET "http://localhost:4008/download?zsurl=https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip
# start server with ZS_LISTFILE_URL_PREFIX
ZS_LISTFILE_URL_PREFIX="https://gist.githubusercontent.com/scosman/" ./zipstreamer
# call `GET /download` endpoint with zsid
curl -X GET "http://localhost:4008/download?zsid=f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip
POST /create_download_link
This endpoint takes an HTTP POST body containing the JSON zip file descriptor, stores it in a local cache, and returns a link ID which allows the caller to fetch the zip file via an additional call to GET /download_link/{link_id}.
This is useful if you want to trigger a browser "Save File" UI, which isn't shown for POST requests. See GET /download for a server side alternative to achieve this.
Important:
- These links only live for 60 seconds. They are expected to be used immediately.
- This stores the link in an in-memory cache, so it's not suitable for deploying to a multi-server cluster without extra configuration. If you are hosting on a multi-server cluster, see the deployment section for configuration advice.
Here is an example response body containing the link ID. See the docs for GET /download_link/{link_id} below for how to fetch this zip file:
{
"status":"ok",
"link_id":"b4ecfdb7-e0fa-4aca-ad87-cb2e4245c8dd"
}
Example usage: see the GET /download_link/{link_id} documentation below.
GET /download_link/{link_id}
Call this endpoint with a link_id generated by POST /create_download_link to download that zip file.
Example usage with curl: the POST /create_download_link and GET /download_link/{link_id} endpoints working together
# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST endpoint to create link
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/create_download_link
# call GET endpoint to download zip. Note: copy the link_id from the output of the POST command above into this URL
curl -X GET "http://localhost:4008/download_link/UUID_FROM_ABOVE" > archive.zip
Be sure to enable session affinity if you're using multiple servers and /create_download_link.
Cloud Run is an ideal serverless environment for ZipStreamer, as it routes many requests to a single container instance. ZipStreamer is designed to handle many concurrent requests, and will be cheaper to run on this serverless architecture than on an instance-per-request architecture like AWS Lambda or Google Cloud Functions.
Important:
- The one-click deploy button has a bug and may force you to set the optional environment variables. If the server isn't working, check that ZS_URL_PREFIX is blank in the Cloud Run console.
- Be sure to enable session affinity if you're using /create_download_link. Cloud Run may scale up to multiple containers automatically.
This repo contains a Dockerfile, and an image is published on GitHub Packages.
To build your own image, clone the repo and run:
docker build --tag docker-zipstreamer .
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 docker-zipstreamer
Official packages are published on GitHub Packages. To pull the latest stable release:
docker pull ghcr.io/scosman/packages/zipstreamer:stable
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 ghcr.io/scosman/packages/zipstreamer:stable
Note: stable pulls the latest GitHub release. Use ghcr.io/scosman/packages/zipstreamer:latest for top of tree.
These environment variables can be used to configure the server:
- PORT: defaults to 4008. Sets which port the HTTP server binds to.
- ZS_URL_PREFIX: if set, the server will verify that the url property of every file in the JSON zip file descriptors starts with this prefix. Useful for preventing others from using your server to serve their files.
- ZS_COMPRESSION: defaults to no compression. It's not universally known, but zip files can be uncompressed, used only to combine many files into one file. Set to DEFLATE to use zip deflate compression. WARNING: enabling compression uses CPU, and will reduce the throughput of the server. Note: for files with internal compression (JPEGs, MP4s, etc.), zip DEFLATE compression will often increase the total zip file size.
- ZS_LISTFILE_URL_PREFIX: see the documentation for GET /download.
I was mentoring at a "Teens Learning Code" class, but we had too many mentors, so I had some downtime.
Zipper portion of logo by Kokota from Noun Project (Creative Commons CCBY)