
TOPMed Workflows

About

The original pipelines were assembled and written by Hyun Min Kang (hmkang@umich.edu) and Adrian Tan (atks@umich.edu) at the Abecasis Lab at the University of Michigan.

See the variant calling pipeline and alignment pipeline repositories.

Installing dependencies on your local system

1. Cloud SDK (gcloud, gsutil)

If you are on Debian / Ubuntu, follow the Cloud SDK installation instructions. When you execute gcloud init, the installer asks you to log in; respond with Y, open the provided URL, copy the code, and paste it into the prompt. It then asks which cloud project to use, so enter your GCP Project ID. Finally, it asks for a default compute region or zone; I picked us-west1-b.
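
If you skipped those prompts or want to change the defaults later, you can set them directly. A minimal sketch (us-west1-b is simply the zone I used; pick whichever is close to your data):

gcloud config set compute/region us-west1
gcloud config set compute/zone us-west1-b
gcloud config list   # confirm the active project, region, and zone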

Configuration and credentials file

# Add the Cloud SDK apt repository and its signing key
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# Install the SDK, then authenticate
sudo apt-get update && sudo apt-get install google-cloud-sdk
gcloud auth login

After that, run gcloud auth application-default --help and follow the instructions. Briefly, run

gcloud iam service-accounts create <pick-a-username>
gcloud iam service-accounts keys create key.json --iam-account=<the-username-you-just-picked>@<your-project-id>.iam.gserviceaccount.com

That should print something like

created key [<some long integer>] of type [json] as [key.json] for [<username-you-picked>@<your-project-id>.iam.gserviceaccount.com]

You can check in the Google Cloud Platform console under IAM & Admin > Service Accounts; the account you just created should be in the list.
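
You can also verify it from the command line, for example:

gcloud iam service-accounts list   # the new account should appear here
gcloud iam service-accounts keys list --iam-account=<username-you-picked>@<your-project-id>.iam.gserviceaccount.com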

Next create an environment variable that points to the file key.json:

export GOOGLE_APPLICATION_CREDENTIALS=key.json

Providing credentials to your application

To run workflows on data stored in Google Cloud you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS, which holds the path to the credentials file.
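
For example (the path below is only an illustration; point it at wherever you actually saved key.json):

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/key.json"
# optionally persist it so new shells pick it up
echo 'export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/key.json"' >> ~/.bashrc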

2. Broad's execution engine cromwell

cromwell is a Java executable and requires a Java Runtime Environment (JRE). Follow the instructions here for a complete installation.
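
A minimal sketch of fetching a release, assuming version 31 (the version the checker workflows below were tested with), that you keep jars in ~/bin, and that the download URL follows the usual pattern of Broad's GitHub releases:

mkdir -p ~/bin
wget -O ~/bin/cromwell-31.jar https://github.com/broadinstitute/cromwell/releases/download/31/cromwell-31.jar
java -jar ~/bin/cromwell-31.jar --version   # requires a working Java runtime on the PATH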

3. Dockstore

Dockstore also requires a Java Runtime Environment. Find installation instructions for Dockstore here.
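
Once both are installed, a quick sanity check looks something like:

java -version          # Dockstore and cromwell both need a Java runtime
dockstore --version    # confirms the Dockstore CLI is on your PATH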

Running workflows

Provisioning reference files

To copy the contents of a Google Cloud Storage bucket to your local system (or a VM), use the following; the -u flag names the project billed when accessing requester-pays buckets:

gsutil -u [PROJECT_ID] cp gs://[BUCKET_NAME]/[OBJECT_NAME] [OBJECT_DESTINATION]
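
For example, to pull a reference FASTA into a local references directory (the project, bucket, and object names below are placeholders, not real paths):

mkdir -p references
gsutil -u my-gcp-project cp gs://example-topmed-refs/hs38DH.fa ./references/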

Checker workflows

A WDL and a JSON file to test checker workflows are in the test_data directory. You need to adjust all paths in the JSON file to the paths on your system before running the checker. It has been tested with cromwell-31.jar. To run the checker workflow for the WDL aligner, navigate to the respective directory (it usually has checker in its name) and run

java -Dconfig.file=<location_to_file> -jar ~/bin/<cromwell_version>.jar run <checker-workflow>.wdl -i <checker-workflow>.json
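
For instance, assuming cromwell 31 sits in ~/bin and using placeholder file names (substitute the actual checker WDL, its JSON, and your cromwell configuration file):

java -Dconfig.file=./google.conf -jar ~/bin/cromwell-31.jar run aligner-checker.wdl -i aligner-checker.json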

Cost estimates for Terra

Please keep in mind that your costs may vary depending on how your data is formatted and what parameters you use. In addition, if you are using preemptible instances, there is some element of randomness: a preemptible may be stopped by Google at any time, causing an in-progress task to restart.

Aligner (WDL)

When running the aligner workflow with its default settings on 10 full-size CRAMs from the PharmaHD study imported from Gen3, the total cost reported by Terra was $80.38. The most expensive of those ten files cost $10.82 and the least expensive cost $5.74.

Aligner Checker (WDL)

As the aligner checker runs the aligner and then simply performs an md5sum, the cost for the aligner checker will be about the same as that of the aligner.