Bring your own tool dockerized
Virtual Research Environments integrate tools and pipelines to support a research community. We offer the possibility of integrating your application into one of these analytical platforms. Please read through this documentation and contact us with any questions or suggestions you may have.
The open Virtual Research Environment (openVRE) offers a number of benefits for developers willing to integrate their tools into the platform:
- A publicly available, open-access platform
- A full web-based workbench with user support utilities, OIDC authentication service, and data handlers for local and remote datasets or repositories.
- Visibility for your tool and organization, with ownership recognition, tailored web forms, help pages and customized viewers.
- The possibility to add extra value to your tool by complementing it with other related tools already in the platform.
- Complete control of your tool through the administration panel, for monitoring, logging and management.
The application or pipeline to be integrated should:
- Be free and open source
- Be containerized (tested with Docker and Singularity)
- Run in non-interactive mode on a Linux-based operating system
Since the Virtual Research Environment is itself a dockerized system, tool integration follows a similar approach: the tool and its dependencies are encapsulated in a container, which allows for easy sharing, version control, and deployment.
This guide walks through the process of Dockerizing a sequence extraction tool and integrating it into a Virtual Research Environment (VRE) framework.
These are the steps to follow to integrate your application as a new dockerized VRE tool. As a result, the VRE is able to control the whole tool execution cycle. It:
- Automatically builds the job-tune form on the website, with the parameter fields and input files of the tool.
- Validates input files and parameters (format and data type filtering, maximum/minimum values, etc.).
- Stages in the required input files to the tool working directory on the compute host (if required).
- Schedules the tool on the cloud/HPC backend in a scalable manner.
- Monitors and logs tool progress during the execution.
- Stages out output files from the run working directory (if required).
- Registers the output files resulting from the execution on the website.
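To make the validation step more concrete, here is a minimal Python sketch of type and range checking for a single parameter. The `ParamSpec` structure and its field names are hypothetical and used only for illustration; they are not the VRE's actual metadata schema or validation code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

Number = Union[int, float]


@dataclass
class ParamSpec:
    """Hypothetical description of one tool parameter (not the real VRE schema)."""
    name: str
    cast: Callable[[str], Number]      # e.g. int or float
    minimum: Optional[Number] = None
    maximum: Optional[Number] = None


def validate_param(spec: ParamSpec, raw: str) -> Number:
    """Convert a raw form value to the declared type and check its range."""
    try:
        value = spec.cast(raw)
    except ValueError as err:
        raise ValueError(f"{spec.name}: cannot convert {raw!r}") from err
    if spec.minimum is not None and value < spec.minimum:
        raise ValueError(f"{spec.name}: {value} is below minimum {spec.minimum}")
    if spec.maximum is not None and value > spec.maximum:
        raise ValueError(f"{spec.name}: {value} is above maximum {spec.maximum}")
    return value
```

A value such as `"50"` for a parameter declared as `ParamSpec("min_length", int, minimum=1)` is returned as the integer `50`, while `"0"` or `"abc"` raises a `ValueError` before any job is scheduled.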
Within the openVRE environment, you will need to integrate the tool with the openVRE Tool Dockerized framework. To do that, the VRE needs three elements:
- A Docker image for your tool, containing the application
- A Docker image specific to the VRE framework, containing the VRE RUNNER wrapper
- A list of descriptive metadata fields annotating the tool (e.g. input file requirements, arguments, description)
The following guide will help you achieve that:
- (1) Creating a Dockerfile for your tool
In this example we are going to use a SeqIo tool, a sequence extraction tool using Biopython, designed to filter and extract sequences from a FASTA file based on specified IDs and sequence length.
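For readers who want a picture of what such a script does, the sketch below filters FASTA records by ID and minimum length. It is not the actual `extract_sequences.py`: the real tool uses Biopython, whereas this version uses only the standard library so it runs without extra dependencies, and the command-line flags (`--input`, `--output`, `--ids`, `--min-length`) are assumptions made for illustration.

```python
import argparse
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (record ID, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def filter_records(records: Iterable[Record],
                   ids: Optional[Set[str]] = None,
                   min_length: int = 0) -> Iterator[Record]:
    """Keep records whose ID is in `ids` (if given) and whose length is at least `min_length`."""
    for record_id, seq in records:
        if ids is not None and record_id not in ids:
            continue
        if len(seq) < min_length:
            continue
        yield record_id, seq


def main(argv: Optional[List[str]] = None) -> int:
    """Non-interactive entry point: every input comes from command-line flags."""
    parser = argparse.ArgumentParser(description="Extract sequences from a FASTA file")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--ids", nargs="*", default=None)
    parser.add_argument("--min-length", type=int, default=0)
    args = parser.parse_args(argv)
    ids = set(args.ids) if args.ids else None
    with open(args.input) as src, open(args.output, "w") as dst:
        for record_id, seq in filter_records(read_fasta(src), ids, args.min_length):
            dst.write(f">{record_id}\n{seq}\n")
    return 0
```

Taking all inputs from flags, with no prompts, is what lets a script like this run in the non-interactive mode the platform requires. A standalone script would call `main()` under an `if __name__ == "__main__":` guard.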
The Dockerfile sets up the environment by installing dependencies like Biopython and placing the necessary Python script into the container.
Create the Dockerfile in your project directory; it defines the environment and the tool configuration. For the extraction tool mentioned above:
# Use a lightweight Python image
FROM python:3.9-slim
# Set the working directory inside the container
WORKDIR /home/
# Install Biopython for sequence handling
RUN pip install biopython
# Copy your Python script into the container
COPY seqio_tool/extract_sequences.py /home/seqio_tool/extract_sequences.py
# Make the Python script executable
RUN chmod +x /home/seqio_tool/extract_sequences.py
# Define the entry point for the container
ENTRYPOINT ["python", "/home/seqio_tool/extract_sequences.py"]
Make sure the ENTRYPOINT refers directly to the script or software that you want to launch on the platform. The VRE framework will use it as the direct command for the wrapper.
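The reason this matters is that Docker appends any arguments given after the image name in `docker run` to an exec-form ENTRYPOINT. The Python sketch below builds such a command line to show where the tool arguments end up. How the VRE wrapper actually invokes the container is not described here, so the function name, the mounted paths and the `--input` flag are all assumptions.

```python
from typing import Dict, List, Sequence


def build_docker_run(image: str,
                     tool_args: Sequence[str],
                     volumes: Dict[str, str]) -> List[str]:
    """Build a `docker run` argument list (hypothetical helper).

    Everything placed after the image name is appended to the image's
    exec-form ENTRYPOINT, so `tool_args` reach the script directly.
    """
    cmd = ["docker", "run", "--rm"]
    for host_dir, container_dir in volumes.items():
        # Bind-mount a host directory so the tool can read inputs and write outputs
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    cmd += list(tool_args)
    return cmd
```

With the ENTRYPOINT from the Dockerfile above, `build_docker_run("my_tool_image", ["--input", "/data/in.fasta"], {"/tmp/run": "/data"})` describes a container that executes `python /home/seqio_tool/extract_sequences.py --input /data/in.fasta`. The resulting list can be passed to `subprocess.run`.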
Once the Dockerfile is set up, build the Docker image with the command:
docker build -t my_tool_image .
For this example tool, the image would be available on Docker Hub.
- (2) Wrapping the Tool for openVRE
Clone the [vre_template_tool_dockerized](https://github.com/inab/vre_template_tool_dockerized/) repository onto your system:
git clone https://github.com/inab/vre_template_tool_dockerized/
cd vre_template_tool_dockerized/template/
This will be our working directory from this point on.
In the Dockerfile_template, you only need to modify the FROM instruction:
INTEGRATE NEW TOOL CONTAINER
FROM #YOUR IMAGE NAME HERE
Remember to rename the file from Dockerfile_template to Dockerfile so that the image can be built.
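If you prefer to script these two manual steps, the following Python sketch replaces the first FROM instruction in the template and writes the result as `Dockerfile`. It assumes the template contains a single FROM line to be replaced, and both helper functions are hypothetical conveniences, not part of the template repository. Unlike a rename, it leaves `Dockerfile_template` in place.

```python
from pathlib import Path


def set_base_image(template_text: str, image: str) -> str:
    """Return the template with its first FROM instruction pointing at `image`."""
    lines = template_text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            lines[i] = f"FROM {image}"
            break
    else:
        raise ValueError("no FROM instruction found in template")
    return "\n".join(lines) + "\n"


def write_dockerfile(template_dir: str, image: str) -> Path:
    """Read Dockerfile_template in `template_dir` and write the result as Dockerfile."""
    directory = Path(template_dir)
    text = (directory / "Dockerfile_template").read_text()
    target = directory / "Dockerfile"
    target.write_text(set_base_image(text, image))
    return target
```

For example, `write_dockerfile(".", "my_tool_image")` run from the `template/` directory would produce a `Dockerfile` whose base image is the one built in step (1).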