This project wrangles short-read genomic alignments, for example from wastewater sampling, into a format that is easy to import into the SILO sequencing database.
The V-Pipe Docker is designed to process a single .bam file and upload the results to SILO.
silo-input-transformer
: A Rust-based utility that handles the fasta to ndjson transformation; it is imported here as a git submodule.
.github/workflows
: Contains GitHub Actions used for building, testing, and publishing.
.vscode/settings.json
: Contains VSCode settings specific to the project, such as the Python interpreter to use and the maximum line length for auto-formatting.
src
: Place new source code here.
scripts
: Place scripts and temporary or intermediate work here.
tests
: Contains Python-based test cases to validate source code.
pyproject.toml
: Contains metadata about the project and configurations for additional tools used to format, lint, type-check, and analyze Python code.
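To give a rough idea of what a fasta to ndjson transformation involves, here is a minimal Python sketch. It is not the silo-input-transformer implementation (which is written in Rust), and the output keys `id` and `sequence` are hypothetical; the schema SILO actually expects may differ.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_ndjson(lines: Iterable[str]) -> str:
    """Return one JSON object per FASTA record, separated by newlines.

    The keys "id" and "sequence" are illustrative assumptions only.
    """
    return "\n".join(
        json.dumps({"id": header, "sequence": sequence})
        for header, sequence in parse_fasta(lines)
    )
```

Each output line is an independent JSON document, which is what makes ndjson convenient for streaming large sets of reads.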
We use Poetry to build the package and manage dependencies. Install it and become familiar with its basic functionality by reading the documentation.
- Create and activate the conda environment from the environment.yml file:
conda env create -f environment.yml
conda activate sr2silo
- Set up the environment with development tools:
poetry install --with dev
poetry run pre-commit install
Then, you will be able to run tests:
$ poetry run pytest
... or check the types:
$ poetry run pyright
Alternatively, you may prefer to work with the right Python environment using:
$ poetry shell
$ pytest
sr2silo is currently implemented as a script and is under heavy development. Because it relies on other Rust components, we recommend building and running it with Docker Compose.
Edit the docker-compose.env file in the docker-compose directory with the following values:
SAMPLE_DIR=../../../data/sr2silo/daemon_test/samples/A1_05_2024_10_08/20241024_2411515907/alignments/
SAMPLE_ID=A1_05_2024_10_08
BATCH_ID=20241024_2411515907
TIMELINE_FILE=../../../data/sr2silo/daemon_test/timeline.tsv
NEXTCLADE_REFERENCE=sars-cov2
RESULTS_DIR=./results
KEYCLOAK_TOKEN_URL=https://authentication-wise-seqs.loculus.org/realms/loculus/protocol/openid-connect/token
SUBMISSION_URL=https://backend-wise-seqs.loculus.org/test/submit?groupId={group_id}&dataUseTermsType=OPEN
CI=false
KEYCLOAK_TOKEN_URL and SUBMISSION_URL are used for the submission to LAPIS.
CI determines whether sr2silo runs in a Continuous Integration pipeline, in which case it mocks uploads and skips submissions.
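As an illustration of how these variables might be consumed, the sketch below interprets the CI flag and fills the `{group_id}` placeholder in SUBMISSION_URL. The function names `is_ci` and `build_submission_url` are hypothetical, and the set of values treated as true is an assumption; sr2silo's own handling may differ.

```python
import os
from typing import Mapping, Optional


def is_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the CI variable is set to a truthy string.

    Assumption: "1", "true" and "yes" (case-insensitive) count as true;
    a missing variable is treated as false.
    """
    env = os.environ if env is None else env
    return env.get("CI", "false").strip().lower() in {"1", "true", "yes"}


def build_submission_url(template: str, group_id: int) -> str:
    """Fill the {group_id} placeholder that appears in SUBMISSION_URL."""
    return template.format(group_id=group_id)
```

A caller could check `is_ci()` first and, when it returns True, replace the upload and submission steps with mocks instead of contacting the real endpoints.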
Uploading the processed outputs requires S3 storage.
For sensitive information like AWS credentials, use Docker secrets. Create the following files in the secrets directory:
secrets/aws_access_key_id.txt
:
YourAWSAccessKeyId
secrets/aws_secret_access_key.txt
:
YourAWSSecretAccessKey
secrets/aws_default_region.txt
:
YourAWSRegion
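Inside a container, Docker Compose mounts each secret as a file under /run/secrets/ by default. The sketch below shows one way the credentials could be read from there. The helper names are hypothetical, and it assumes the secrets are named after the files listed above without the .txt extension; check the compose file for the actual secret names.

```python
from pathlib import Path
from typing import Dict

# Default mount point for Docker secrets inside a container.
DEFAULT_SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = DEFAULT_SECRETS_DIR) -> str:
    """Return the content of one secret file, stripped of surrounding whitespace."""
    return (secrets_dir / name).read_text(encoding="utf-8").strip()


def load_aws_credentials(secrets_dir: Path = DEFAULT_SECRETS_DIR) -> Dict[str, str]:
    """Read the three AWS secrets into a dictionary.

    Assumption: secret names match the file stems used in the secrets directory.
    """
    names = ("aws_access_key_id", "aws_secret_access_key", "aws_default_region")
    return {name: read_secret(name, secrets_dir) for name in names}
```

Stripping whitespace guards against a trailing newline that editors commonly add to the secret files.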
To process a single sample, run the following command:
docker-compose --env-file .env up --build
The code quality checks that run on GitHub are defined in .github/workflows/test.yml for the Python package CI/CD.
We are using:
- Ruff to lint the code.
- Black to format the code.
- Pyright to check the types.
- Pytest to run the unit tests for the code and workflows.
- Interrogate to check the documentation.
This project welcomes contributions and suggestions. For details, visit the repository's Contributor License Agreement (CLA) and Code of Conduct pages.