The original pipelines were assembled and written by Hyun Min Kang (hmkang@umich.edu) and Adrian Tan (atks@umich.edu) at the Abecasis Lab at the University of Michigan.
See the variant calling pipeline and alignment pipeline repositories.
If you are on Debian / Ubuntu, follow the Cloud SDK installation instructions:

```bash
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# Install the SDK itself after adding the repository and key.
sudo apt-get update && sudo apt-get install google-cloud-sdk
```

When you execute `gcloud init`, the installer asks you to log in; respond with `Y`, head to the provided URL, copy the code, and paste it into the prompt. It will then ask which cloud project you want to use, so enter your GCP Project ID. I picked `us-west1-b` as the zone. Then authenticate:

```bash
gcloud auth login
```
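To verify that the init and login steps took effect, you can inspect the active configuration:

```bash
# Confirm the active account, default project, and the zone picked during init.
gcloud config list
```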
After that, run `gcloud auth application-default --help` and follow the instructions. Briefly, run:
```bash
gcloud iam service-accounts create <pick-a-username>
gcloud iam service-accounts keys create key.json --iam-account=<the-username-you-just-picked>@<your-project-id>.iam.gserviceaccount.com
```
That should print something like:

```
created key [<some long integer>] of type [json] as [key.json] for [<the-username-you-just-picked>@<your-project-id>.iam.gserviceaccount.com]
```
You can check in the Google Cloud Platform console under IAM & Admin → Service Accounts; the account you just created should be in the list.
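If you prefer the command line, you can confirm the account exists there too; depending on what your workflows read, the account may also need a role binding (the role below is only an illustrative choice):

```bash
# List the service accounts in the current project; the new one should appear.
gcloud iam service-accounts list

# Hypothetical example: grant the new account read access to Cloud Storage
# objects. Adjust the role to whatever your workflows actually need.
gcloud projects add-iam-policy-binding <your-project-id> \
    --member="serviceAccount:<pick-a-username>@<your-project-id>.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
```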
Next, create an environment variable that points to the `key.json` file:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=key.json
```
To run workflows on data stored in Google Cloud, you need to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS`, which holds the path to the credentials file.
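A minimal sketch of setting and sanity-checking the variable, assuming the Cloud SDK is installed and `key.json` lives in your home directory:

```bash
# Use an absolute path so the variable works from any working directory.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/key.json"

# Prints an access token if the credentials file is readable and valid.
gcloud auth application-default print-access-token
```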
`cromwell` is a Java executable and requires a Java Runtime Environment. Follow the instructions here for a complete installation.
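As a quick sketch, assuming you keep jars in `~/bin` and want version 31 (the version the checkers below were tested with; the URL follows Cromwell's GitHub release naming):

```bash
# Verify a Java runtime is on the PATH.
java -version

# Download the Cromwell 31 jar from the Broad Institute's GitHub releases.
mkdir -p ~/bin
wget -O ~/bin/cromwell-31.jar \
    https://github.com/broadinstitute/cromwell/releases/download/31/cromwell-31.jar
```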
Dockstore also requires the Java Runtime Environment. Find installation instructions for Dockstore here.
To copy the contents of a GCS bucket to your local system (or a VM), use:

```bash
gsutil -u [PROJECT_ID] cp gs://[BUCKET_NAME]/[OBJECT_NAME] [OBJECT_DESTINATION]
```
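For example, a hypothetical copy of a single CRAM out of a requester-pays bucket, billed to your project (all names made up):

```bash
# -u names the project to bill, which requester-pays buckets require.
gsutil -u my-gcp-project cp gs://some-topmed-bucket/NWD123456.cram ./data/
```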
A WDL and a JSON file to test the checker workflows are in the `test_data` directory. You need to adjust all paths in the JSON file to the paths on your system before running the checker. It has been tested with `cromwell-31.jar`. To run the checker workflow for the aligner WDL, navigate to the respective directory (usually it has `checker` in its name) and run:

```bash
java -Dconfig.file=<location_to_file> -jar ~/bin/<cromwell_version>.jar run <checker-workflow>.wdl -i <checker-workflow>.json
```
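Filled in, a hypothetical invocation might look like this (jar path, config, and workflow file names are all illustrative):

```bash
# Run the aligner checker under Cromwell with a custom backend configuration.
java -Dconfig.file=google.conf \
     -jar ~/bin/cromwell-31.jar \
     run topmed-alignment-checker.wdl \
     -i topmed-alignment-checker.json
```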
Please keep in mind that your costs may vary depending on how your data is formatted and what parameters you use. In addition, if you are using preemptibles, there is some element of randomness: a preemptible VM may be stopped by Google at any time, causing an in-progress task to restart.
When running the aligner workflow with its default settings on 10 full-size CRAMs from the PharmaHD study imported from Gen3, the cost was $80.38 as reported by Terra. The most expensive of those ten files cost $10.82 and the least expensive cost $5.74.
As the aligner checker runs the aligner and then simply performs an md5sum, the cost of the aligner checker will be about the same as that of the aligner.