Data Management Portal and Pipeline Application for Research Teams
Bioloop is a web-based portal that simplifies the management of large-scale datasets shared among scientific research teams. It reduces overall storage costs by combining hot storage (disk) with cold storage (tape).
Key Features:
- Project-Based Organization: Data is assigned to projects, and collaborators work within their own project environments, which keeps data isolated between projects.
- Data Ingestion: Bioloop offers automated triggers for instrument-based data ingestion and supports manual dataset uploads.
- Data Provenance Tracking: Bioloop tracks data lineage, recording the origin of each raw dataset and the data products derived from it, which promotes transparency and accountability.
- Custom Pipelines: Bioloop supports custom data processing pipelines built on Celery, a Python task queue, so processing workers can be scaled as needed.
- Secure Downloads: Bioloop uses token-based access for data downloads, restricting them to authorized users.
- Microservice Architecture: Bioloop is deployed as Docker containers, so it can run on local infrastructure or in public cloud environments.
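As an illustration of what token-based download access can look like, here is a minimal sketch using only the Python standard library. It is not Bioloop's actual implementation: the function names (`issue_token`, `verify_token`), the payload fields, and the `SECRET_KEY` value are all assumptions made for this example. The idea is that the server signs a short-lived token naming the user and the file, and the download endpoint serves the file only if the signature, path, and expiry check out.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret; a real deployment would load this from secure config.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload: bytes) -> bytes:
    """Return an HMAC-SHA256 signature of the payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def issue_token(user_id: str, file_path: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Create a signed token that grants access to one file until it expires."""
    issued_at = time.time() if now is None else now
    payload = json.dumps(
        {"user": user_id, "path": file_path, "exp": int(issued_at + ttl_seconds)}
    ).encode()
    return (base64.urlsafe_b64encode(payload).decode()
            + "."
            + base64.urlsafe_b64encode(_sign(payload)).decode())


def verify_token(token: str, file_path: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it is authentic, unexpired, and for this file."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, _sign(payload)):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["path"] == file_path and current < claims["exp"]
```

A download handler would call `verify_token(token, requested_path)` before streaming any bytes, and reject the request otherwise. Keeping the expiry short limits the damage if a token leaks.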
Bioloop depends on a few other projects to run:
- SCA OAuth Application
- REST API that lets non-Python languages use the Rhythm Celery workflow library
- Rhythm Python module for Celery workflow patterns