Scripts to download the PacBio, ONT and MiSeq datasets used in https://www.nature.com/articles/s41598-017-03996-z and run the pipelines as described in the paper, or to simply download the final assemblies generated by the authors.
Download repository:
git clone https://github.com/fg6/YeastStrainsStudy.git
Usage:
$ ./launchme.sh <command> <strain>
command: command to be run. Options: install, download, check, deepcheck, clean, nanoclean,
finalfastas, findassembly
strain: Download data for this strain/s, only for command=download, check or deepcheck
Options: s288c,sk1,cbs,n44,all [s288c]
With the script launchme.sh you can download the full datasets used in the analysis of the paper https://www.nature.com/articles/s41598-017-03996-z to run the pipelines yourself, or download only the final assemblies generated by the authors of the paper.
!!! Warning !!!: due to a recent protocol change in the EBI database, this script fails to export
MiSeq cram files to fastqs. If you experience this problem, please use scramble
(https://www.biorxiv.org/content/early/2014/03/28/003640) to export to fastqs,
or download the fastq files directly from ENA.
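As one possible workaround, the sketch below converts a cram file to paired fastq files with samtools instead of scramble. This is a swapped-in tool, not what launchme.sh does: samtools is not mentioned elsewhere in this README, the function name cram_to_fastq and the output file naming are illustrative assumptions, and decoding a cram file may additionally require access to its reference sequence.

```shell
# Hypothetical helper (not part of this repository): convert one cram file
# to paired fastq files using samtools instead of scramble.
# Assumes samtools is installed and available in your PATH.
cram_to_fastq() {
    cram=$1
    prefix=${cram%.cram}    # reads.cram -> reads
    # collate groups read pairs by name, as samtools fastq expects;
    # singletons and reads without a clear pair flag go to /dev/null.
    samtools collate -u -O "$cram" |
        samtools fastq -1 "${prefix}_1.fastq" -2 "${prefix}_2.fastq" \
            -0 /dev/null -s /dev/null -n
}
```

Calling `cram_to_fastq reads.cram` would write reads_1.fastq and reads_2.fastq next to the input file.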
To just look at the assemblies generated by the pipelines:
$ ./launchme.sh finalfastas
$ ./launchme.sh findassembly
!!!!! Warning !!!!!
This script is interactive: it will ask you which strain, assembler or platform you want to focus on.
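Once an assembly is on disk, a quick way to look at it is to count its sequences and bases. The helper below is a minimal sketch and not part of this repository: fasta_summary is a hypothetical name, and the path to the fasta file is left to you, since the download location is not stated here.

```shell
# Hypothetical helper: print the number of sequences and the total number
# of bases in a fasta file (header lines start with '>').
fasta_summary() {
    awk '
        /^>/ { nseq++; next }            # header line: count one sequence
        { gsub(/[ \t\r]/, "")            # drop blanks and carriage returns
          bases += length($0) }          # sequence line: add its bases
        END { printf "%d sequences, %d bases\n", nseq, bases }
    ' "$1"
}
```

Usage: `fasta_summary <path_to_assembly.fasta>`.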
Step 1. Run the installation:
$ ./launchme.sh install
Step 2. Download the data:
$ ./launchme.sh download <strain>
strain= s288c, sk1, n44, cbs or all [s288c]
Step 3. Check the downloaded data:
$ ./launchme.sh check <strain>
strain= s288c, sk1, n44, cbs or all [s288c]
If the check gives you warnings, some files probably failed to download properly;
follow the instructions given in the output.
If the instructions do not help, try:
$ ./launchme.sh deepcheck <strain>
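The download and check commands above can be chained for one strain. The wrapper below is only a sketch: the function names and the LAUNCHME variable are hypothetical, and it assumes launchme.sh exits with a non-zero status when a step fails, which is not stated here. In any case, read the check output for warnings yourself.

```shell
# Hypothetical wrapper (not part of this repository): validate the strain
# name, then run download followed by check.
# LAUNCHME can be overridden; it defaults to ./launchme.sh.
LAUNCHME=${LAUNCHME:-./launchme.sh}

is_valid_strain() {
    case $1 in
        s288c|sk1|cbs|n44|all) return 0 ;;
        *) return 1 ;;
    esac
}

download_and_check() {
    strain=${1:-s288c}    # s288c is the documented default
    if ! is_valid_strain "$strain"; then
        echo "unknown strain: $strain (expected s288c, sk1, cbs, n44 or all)" >&2
        return 2
    fi
    # Assumption: launchme.sh returns non-zero when a step fails,
    # so check only runs after a successful download.
    "$LAUNCHME" download "$strain" && "$LAUNCHME" check "$strain"
}
```

Usage: `download_and_check sk1`.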
Step 4/A. If everything looks ok and there are no warnings from Step 3, you can clean up the data folders, deleting all intermediate files and folders:
$ ./launchme.sh clean <strain>
!!!!! Warning !!!!!
1. Please run this only after Step 3 and only if Step 3 showed no errors or warnings,
otherwise you will have to download everything again!
2. Please do not run this if you intend to run Nanopolish,
as Nanopolish needs the s288c fast5 files; run Step 4/B instead.
Step 4/B. If everything looks ok and there are no warnings, you can clean up the data folders, deleting all intermediate files and folders not needed by Nanopolish:
$ ./launchme.sh nanoclean <strain>
!!!!! Warning !!!!!
Please run this only after Step 3 and only if Step 3 showed no errors or warnings,
otherwise you will have to download everything again!
Disk space required:
Without cleaning up: 1.7TB
After cleaning everything (clean): < 30GB
After cleaning everything except the files needed by Nanopolish (nanoclean): ~700GB
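Given these sizes, it can be worth checking the free disk space before downloading. The helpers below are a minimal sketch and not part of this repository: avail_gb and has_space are hypothetical names, and the result is in GiB (1024-based), so treat comparisons against the figures above as approximate.

```shell
# Hypothetical helpers (not part of this repository).
# avail_gb DIR: free space on the filesystem holding DIR, in whole GiB.
avail_gb() {
    # -P forces one output line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# has_space DIR NEEDED_GB: succeed if at least NEEDED_GB GiB are free.
has_space() {
    [ "$(avail_gb "$1")" -ge "$2" ]
}
```

For example, `has_space . 1700 || echo "less than about 1.7TB free here"` before running the download.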
Python >= 2.7 is required. Please make sure it is available in your PATH, together with virtualenv. A compiler supporting C++11 is also required.
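These requirements can be verified before running the install command. The sketch below is not part of this repository and rests on two assumptions: that the scripts call the interpreter named "python", and that g++ (or whatever $CXX points to) is the compiler in use. The function names are hypothetical.

```shell
# Hypothetical pre-flight check (not part of this repository).
# version_ge HAVE NEED: succeed if HAVE >= NEED, comparing major.minor.
version_ge() {
    have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
    need_major=${2%%.*}; need_rest=${2#*.}; need_minor=${need_rest%%.*}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] &&
          [ "$have_minor" -ge "$need_minor" ]; }
}

check_requirements() {
    status=0
    # Assumption: the interpreter is invoked as "python".
    if command -v python >/dev/null 2>&1; then
        pyver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
        version_ge "$pyver" 2.7 ||
            { echo "python $pyver is older than 2.7"; status=1; }
    else
        echo "python not found in PATH"; status=1
    fi
    command -v virtualenv >/dev/null 2>&1 ||
        { echo "virtualenv not found in PATH"; status=1; }
    # Assumption: g++ or $CXX is the compiler; syntax-check a C++11 construct.
    echo 'int main() { auto x = 0; return x; }' |
        "${CXX:-g++}" -std=c++11 -fsyntax-only -x c++ - 2>/dev/null ||
        { echo "no C++11-capable compiler found"; status=1; }
    return "$status"
}
```

Running `check_requirements` prints one line per missing requirement and returns a non-zero status if any check failed.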
After running 'launchme.sh', you can run the various pipelines from the 'pipelines' folder.
Example:
cd pipelines
./canu.sh <canu_location> <strain> <platform> <cov>
For details on the pipelines, see pipelines/README.md or launch each script with the option "-h".
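To read the help of all pipelines in one go, a small loop is enough. The helper below is a sketch, not part of this repository: show_pipeline_help is a hypothetical name, and it assumes the pipeline scripts are the executable *.sh files in the folder.

```shell
# Hypothetical helper (not part of this repository): print the "-h" help
# of every pipeline script in a folder.
# Assumption: the pipeline scripts are the executable *.sh files there.
show_pipeline_help() {
    dir=${1:-pipelines}
    for script in "$dir"/*.sh; do
        [ -x "$script" ] || continue    # skip non-executables and empty globs
        echo "== $script =="
        "$script" -h
    done
}
```

Usage, from the repository root: `show_pipeline_help pipelines`.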