django_docker_engine exists to make web-based visualizations reusable: if you
can wrap your visualization in a container image and adhere to a few
conventions, your tool will remain useful and usable even when you've moved
on to other projects.
We'll outline here what a container image is and how to build one, what
particular conventions django_docker_engine requires, some gotchas to be aware
of, and how to publish your tool.
If you haven't worked with containers before, read this introduction to Docker, and try the examples in this introduction to containers. The container ecology is complicated, but you only need to focus on one part: building your container and pushing it to a repository. Hopefully you can base your work on one of our examples.
A Dockerfile
gives step-by-step instructions for building an image, and
the first line gives the base image to build on top of: Base images are available
for every popular language and framework. After that you will add any additional
libraries you require, and finally the code for your particular project.
Here are samples from some of our projects:
The JavaScript example is different: The visualization is purely client-side, but because a small script to process the input on startup is required, the base image is actually Python.
It is best practice for your Dockerfile to be under version control, and to use
Travis to build and push your images on successful commits. Whether this is the same
repo as the rest of your source code will depend on your situation:
- If your visualization will be used primarily through django_docker_engine, or if the code base and build process are relatively simple, it makes sense to keep it all in one repo.
- If there are other use cases for the visualization, if you are wrapping a third-party tool, or if the build process is already complicated, then it probably makes more sense to set up a separate repo.
When your container starts up, an environment variable will specify the inputs
for the tool. Either INPUT_JSON
will be set to a JSON document,
or INPUT_JSON_URL
will point to the document.
The document will look like this.
- Input files are provided as a list of URLs under file_relationships.
- Detailed metadata for each file in the dataset, not just your input, is provided in node_info.
- User-supplied strings or numbers are available in parameters.
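A start-up script might read its input along these lines. This is a minimal sketch, not code from this project: the function names load_input and summarize_input are hypothetical, and the shapes of the values under file_relationships, node_info, and parameters are assumptions, since only the key names are described above.

```python
import json
import os
from urllib.request import urlopen


def load_input(env=os.environ):
    """Return the parsed input document from INPUT_JSON or INPUT_JSON_URL."""
    if env.get("INPUT_JSON"):
        # The document itself was passed in the environment variable.
        return json.loads(env["INPUT_JSON"])
    if env.get("INPUT_JSON_URL"):
        # Only a pointer was passed, so fetch the document over HTTP.
        with urlopen(env["INPUT_JSON_URL"]) as response:
            return json.loads(response.read().decode("utf-8"))
    raise RuntimeError("Neither INPUT_JSON nor INPUT_JSON_URL is set")


def summarize_input(document):
    """Pick out the three top-level keys described above.

    The default values (list, dict, list) are assumptions about the shapes.
    """
    return {
        "file_relationships": document.get("file_relationships", []),
        "node_info": document.get("node_info", {}),
        "parameters": document.get("parameters", []),
    }
```

Passing the environment in as an argument keeps the function easy to exercise without a running container, for example load_input({"INPUT_JSON": "{}"}).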
To run your container locally, check out this project, add a description of your
tool to tools.py, and start the demo with python ./manage.py runserver.
The image
field can reference images either on DockerHub or in your local cache.
There are a few things to be careful about when wrapping your tool.
- Port 80: django_docker_engine assumes your tool will use port 80. (Other ports can be accommodated, but that requires additional configuration.)
- Relocatable: There will be a prefix in the path when your tool is served, so no URLs you use should begin with /. The prefix is available in the api_prefix of your input.json.
- Keep the cookies: Your tool will ultimately be run inside Django, so the Django session cookie needs to be preserved on any AJAX requests you make. Right now, Firefox and Chrome have slightly different implementations of the HTML5 Fetch API in this regard.
- No WebSockets: WebSockets are a step beyond HTTP, and won't work across plain HTTP proxies like we use. (Shiny apps use WebSockets by default, but when run inside Shiny Server they can be configured to fall back to HTTP.)
- No server state: There are no provisions right now to preserve sessions or to make them available across restarts. Instead, try to keep state in the query portion of the URL.
- Logging: The Docker way to log is simply to output to STDOUT. If any files are updated or created, the directories that contain them should be listed in the extra_directories of your spec. Warning: The container itself has only limited disk space by default. This space can be exhausted if you write large log files or create a database outside extra_directories.
- Friendly error page: Until your tool has fully started, it should either not reply, or reply with a non-200 response to requests for /. Once it has started, a 200 should be returned. If your tool can't start for some reason, please still respond with 200, but give an error message in the body instead. Until requests for / return 200, django_docker_engine will keep a please-wait page up.
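The start-up convention in the last item can be sketched with Python's standard http.server. This is only an illustration under stated assumptions: the STATE dictionary, the respond and serve names, and the choice of 503 as the non-200 status are all hypothetical, and a real tool would serve its visualization rather than a text placeholder.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared state: "starting", "ready", or "failed".
STATE = {"status": "starting", "error": None}


def respond(state):
    """Map start-up state to (status_code, body) per the convention above."""
    if state["status"] == "starting":
        # Any non-200 keeps the please-wait page up; 503 is an assumption.
        return 503, "Starting up"
    if state["status"] == "failed":
        # Still 200, so the error message in the body reaches the user.
        return 200, "Tool could not start: {}".format(state["error"])
    return 200, "Visualization goes here"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = respond(STATE)
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=80):
    """Listen on the port django_docker_engine expects by default."""
    HTTPServer(("", port), Handler).serve_forever()
```

Because serve() blocks, any slow start-up work would need to run in a background thread that sets STATE["status"] to "ready" or "failed" when it finishes.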
DockerHub is the most prominent repository of Docker images. It is our source
for base images, and we push our own containerized tools back there as well. Once
you have registered for an account, you could push images by hand, but it is better
to add encrypted credentials to your .travis.yml and have Travis do it for you
on successful tagged builds. Here's a script that many of our projects use.
Once you've pushed your image to DockerHub, you need to let the community know about it.
PRs to this repo that add to tools.py are welcome, and you should make a similar
addition to the list of Refinery tools.