Releases · EBIvariation/eva-pipeline

20 Feb 10:36

cyenyxe

5e00d9d

Better progress tracking, dropping studies and multiple versions of Ensembl annotation supported

Latest

Version 2.0 of the EVA pipeline has changed the workflow manager technology. Instead of tracking the progress of each step as a whole (done / not done), it now splits the work into chunks of configurable size. This way, if a step has processed millions of variants before failing, it is resumed from that point instead of being restarted from scratch.
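To make the chunk-based restartability concrete, here is a minimal, language-agnostic sketch in Python. It is not the pipeline's code and does not use the workflow manager's API; `run_step`, `checkpoint` and `flaky_writer` are hypothetical names chosen for illustration, and the checkpoint is an in-memory dict where a real system would use persistent storage.

```python
from typing import Callable, List


def run_step(items: List[str], chunk_size: int, checkpoint: dict,
             write_chunk: Callable[[List[str]], None]) -> None:
    """Process items chunk by chunk, recording progress after each chunk."""
    start = checkpoint.get("items_done", 0)  # resume point left by an earlier run
    for offset in range(start, len(items), chunk_size):
        chunk = items[offset:offset + chunk_size]
        write_chunk(chunk)  # may raise; progress for this chunk is then not recorded
        checkpoint["items_done"] = offset + len(chunk)


# Demo: the writer fails on its third call, then the step is restarted.
variants = [f"variant_{i}" for i in range(10)]
written: List[str] = []
checkpoint: dict = {}  # hypothetical stand-in for a persistent job repository
calls = {"count": 0}


def flaky_writer(chunk: List[str]) -> None:
    calls["count"] += 1
    if calls["count"] == 3:
        raise RuntimeError("simulated failure")
    written.extend(chunk)


try:
    run_step(variants, 3, checkpoint, flaky_writer)
except RuntimeError:
    pass  # the first six variants are already written and recorded

run_step(variants, 3, checkpoint, flaky_writer)  # resumes at the seventh variant
```

In this sketch, only the chunk that was in flight when the failure happened is processed again; everything recorded in the checkpoint is skipped on restart.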

This version replicates the functionality present in the previous version and adds the following:

Better job parameter management, reporting of user errors, etc.
New job for dropping studies
Multiple versions of functional annotation generated using Ensembl Variant Effect Predictor
Better detection of when the Ensembl VEP process hangs or fails
Displaying estimated progress percentage for several steps

Please note that the database schema has changed to support multiple versions of the annotation. The migration tool can be found in a module of the eva-tools repository.


15 Oct 04:01

cyenyxe

v2.0-beta5

f5e9162

Support for multiple versions of VEP annotation

Pre-release

The database schema has been heavily modified to support multiple versions of the Variant Effect Predictor (VEP) annotation. While loading a VCF file, its variants can be annotated using an existing or a new version. The job that annotates already-loaded studies or a whole database has also been modified accordingly.

The introduced database changes are:

The bulk of the 'annot' subdocument in the variants collection has been extracted to the 'annotations' collection
The variants collection only stores those fields used as indexes for efficient filtering
New 'annotationMetadata' collection introduced, listing the versions of VEP used to annotate the variants in the database
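The following Python sketch illustrates the shape of this split using plain dictionaries. Only the names 'annot', 'annotations' and 'annotationMetadata' come from the notes above; every other field name (`consequences`, `so_terms`, `vep_version`, `cache_version`), the `_id` formats and the function `split_annotation` are hypothetical and should not be read as the actual schema. The real migration scripts live in eva-tools.

```python
from typing import Tuple


def split_annotation(variant: dict, vep_version: str,
                     cache_version: str) -> Tuple[dict, dict, dict]:
    """Split a variant with an embedded 'annot' subdocument into three documents."""
    annot = variant.get("annot", {})
    version_key = f"{vep_version}_{cache_version}"

    # Hypothetical choice: keep only the consequence terms for filtering.
    so_terms = sorted({term
                       for consequence in annot.get("consequences", [])
                       for term in consequence.get("so_terms", [])})

    # Variant document: everything except the bulk of the annotation.
    slim_variant = {key: value for key, value in variant.items() if key != "annot"}
    slim_variant["annot"] = [{"vep_version": vep_version,
                              "cache_version": cache_version,
                              "so_terms": so_terms}]

    # Document for the 'annotations' collection: the full annotation.
    annotation_doc = {"_id": f"{variant['_id']}_{version_key}",
                      "vep_version": vep_version,
                      "cache_version": cache_version,
                      **annot}

    # Document for the 'annotationMetadata' collection: one per VEP version used.
    metadata_doc = {"_id": version_key,
                    "vep_version": vep_version,
                    "cache_version": cache_version}
    return slim_variant, annotation_doc, metadata_doc


example = {
    "_id": "1_1000_A_T",
    "chr": "1",
    "start": 1000,
    "annot": {"consequences": [
        {"gene": "GENE1", "so_terms": ["missense_variant"]},
        {"gene": "GENE2", "so_terms": ["intron_variant", "missense_variant"]},
    ]},
}
slim, annotation, metadata = split_annotation(example, "88", "88")
```

Storing the annotation as a list on the variant is what would let several VEP versions coexist for the same variant in this sketch.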

If you have been using a previous version of this software, please check the eva-tools repository for the database migration scripts.


26 May 13:27

cyenyxe

v2.0-beta4

bbce134

Annotation job and more batch friendly VEP execution

Pre-release

This release includes a new job that only regenerates the annotation. The execution of the steps associated with VEP has also been improved: the pipeline now detects when the external process hangs, and communicates progress and errors better.
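One common way to guard against a hanging external process is an overall timeout. The Python sketch below shows that idea with the standard `subprocess` module; it is an assumption for illustration only, since these notes do not say how the pipeline detects a hang (it could equally watch for a lack of output). The function name `run_with_timeout` and the returned labels are hypothetical.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(command: List[str], timeout_seconds: float) -> str:
    """Run an external command and classify the outcome as ok, failed or hung."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True,
                                   timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "hung"
    return "ok" if completed.returncode == 0 else "failed"


# Demo with the Python interpreter standing in for the external tool.
quick = run_with_timeout([sys.executable, "-c", "print('done')"], 30)
broken = run_with_timeout([sys.executable, "-c", "import sys; sys.exit(3)"], 30)
stuck = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
```

Distinguishing "hung" from "failed" lets the caller report a clearer error and decide whether retrying the affected chunk makes sense.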


22 Mar 12:19

cyenyxe

v2.0-beta3

8595df6

New job for dropping studies, improved progress tracking

Pre-release

This release includes a new job for dropping a study. Users only need to provide the study ID, and the job will take care of the variants and the files in which they were reported.
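As a rough illustration of what dropping a study can involve, the Python sketch below works on in-memory lists rather than a database. It assumes, hypothetically, that a variant seen only in the dropped study is removed while a variant also reported in other studies is kept without the dropped study's entry; the function `drop_study` and the field names `study_id` and `studies` are invented for this example.

```python
from typing import List, Tuple


def drop_study(study_id: str, files: List[dict],
               variants: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return the files and variants that remain after removing one study."""
    remaining_files = [f for f in files if f["study_id"] != study_id]

    remaining_variants = []
    for variant in variants:
        entries = [e for e in variant["studies"] if e["study_id"] != study_id]
        if entries:  # still reported by at least one other study
            remaining_variants.append({**variant, "studies": entries})
    return remaining_files, remaining_variants


files = [
    {"study_id": "STUDY_A", "name": "a.vcf"},
    {"study_id": "STUDY_B", "name": "b.vcf"},
]
variants = [
    {"_id": "1_1000_A_T", "studies": [{"study_id": "STUDY_A"}]},
    {"_id": "1_2000_C_G", "studies": [{"study_id": "STUDY_A"},
                                      {"study_id": "STUDY_B"}]},
]
kept_files, kept_variants = drop_study("STUDY_A", files, variants)
```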

Progress tracking has been improved in the step that loads variants into the system. The completion percentage is now displayed, along with the number of variants read from the VCF, written into the database, and skipped due to any kind of issue. This tracking will be added to other steps in future releases.
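The counters described above can be sketched in Python as follows. This is an illustrative model, not the pipeline's code: the class `LoadProgress`, its fields, and the idea that the expected total is known in advance (for example, estimated from the input file) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LoadProgress:
    """Running counters for a load step (hypothetical structure)."""
    total_expected: int  # assumed to be estimated before the step starts
    read: int = 0
    written: int = 0
    skipped: int = 0

    def record(self, read: int, written: int) -> None:
        """Account for one processed chunk; unwritten items count as skipped."""
        self.read += read
        self.written += written
        self.skipped += read - written

    def percentage(self) -> float:
        if self.total_expected == 0:
            return 100.0
        # Cap at 100 because the expected total is only an estimate.
        return min(100.0, 100.0 * self.read / self.total_expected)

    def summary(self) -> str:
        return (f"{self.percentage():.1f}% | read={self.read} "
                f"written={self.written} skipped={self.skipped}")


progress = LoadProgress(total_expected=200)
progress.record(read=50, written=48)
line = progress.summary()
```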


02 Mar 12:02

cyenyxe

v2.0-beta2

f64d6b8

Usability and software architecture improvements

Pre-release

The initial goal of this beta release was to improve unit/integration tests and the architecture as a whole, to ensure it could be extended more easily in the future. As a side effect, we also managed to improve usability!

These are the outcomes of the last months of work:

Tests are now completely independent from each other, using random test folders and Mongo databases
Job parameters are fully validated using the Spring Batch API
Job parameters can be conveniently read from a properties file, in addition to CLI arguments
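The combination of a properties file, CLI arguments and validation can be pictured with the Python sketch below. It is not the Spring Batch API: the function names, the parameter names (`input.vcf`, `input.study.id`, `chunk.size`) and the rule that CLI arguments override the file are all assumptions made for this example. The parser handles only simple `key=value` lines and `#` comments, a subset of the Java properties format.

```python
from typing import Dict, Iterable, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def parse_cli(args: Iterable[str]) -> Dict[str, str]:
    """Parse arguments of the form --key=value or key=value."""
    params = {}
    for arg in args:
        key, separator, value = arg.partition("=")
        if not separator:
            raise ValueError(f"Expected key=value, got: {arg}")
        params[key.lstrip("-")] = value
    return params


def resolve_parameters(properties_text: str, cli_args: List[str],
                       required: List[str]) -> Dict[str, str]:
    """Merge both sources (CLI wins, by assumption) and check required names."""
    params = {**parse_properties(properties_text), **parse_cli(cli_args)}
    missing = sorted(name for name in required if not params.get(name))
    if missing:
        raise ValueError("Missing job parameters: " + ", ".join(missing))
    return params


properties_text = """
# hypothetical job configuration
input.vcf = /data/study.vcf.gz
chunk.size = 1000
"""
resolved = resolve_parameters(properties_text,
                              ["--input.study.id=STUDY_A", "--chunk.size=500"],
                              required=["input.vcf", "input.study.id"])
```

Failing early with a message that names the missing parameters is the kind of user-error reporting that validation makes possible.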


17 Oct 12:23

cyenyxe

v2.0-beta1

7daefab

Technology migration for better restartability

Pre-release

Version 2.0 of the EVA pipeline will move from Luigi to Spring Batch. Instead of tracking the progress of each step as a whole (done / not done), Spring Batch splits the work into chunks of configurable size. This way, if a step has processed millions of variants before failing, it is resumed from that point instead of being restarted from scratch.

The functionality implemented for this first beta includes:

Normalization of variants reported in a VCF file
Storage of variants in MongoDB
Calculation of allele frequencies and other statistics for all the samples in a VCF file
Annotation using Ensembl Variant Effect Predictor
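As an example of the statistics step, allele frequencies can be derived from the genotype (GT) values of the samples in a VCF file, where alleles are separated by `/` or `|` and `.` marks a missing call. The Python sketch below shows that calculation for a single variant; the function name `allele_frequencies` is hypothetical, and the pipeline's own implementation may compute more statistics or treat missing calls differently.

```python
import re
from collections import Counter
from typing import Dict, List


def allele_frequencies(genotypes: List[str]) -> Dict[int, float]:
    """Compute the frequency of each allele index from VCF GT strings."""
    counts: Counter = Counter()
    for genotype in genotypes:
        for allele in re.split(r"[/|]", genotype):
            if allele != ".":  # missing calls do not contribute
                counts[int(allele)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: count / total for allele, count in sorted(counts.items())}


# Four samples: allele 0 is the reference, allele 1 the first alternate.
frequencies = allele_frequencies(["0/0", "0/1", "1/1", "./."])
```

Here the three called samples contribute six alleles, three of each kind, so both alleles have a frequency of 0.5.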

Future beta releases will include support for population statistics via a PED file and improved usability.


02 Sep 14:05

cyenyxe

v1.1

eb95c43

Multi-step pipeline

Improved restartability by running the pipeline in multiple steps.

Added support for population statistics when combining VCF and PED files, and for loading annotations generated by VEP.


02 Sep 13:59

cyenyxe

v1.0

e7f8e5e

EVA automated pipeline using Luigi

First production version of the European Variation Archive pipeline, implemented using Luigi by Spotify.
