* Fixes env gtex issue #290 (#294)
* Change env() to stdout to save sample_name in gen3_drs
* Fix No such property: baseName for class: String
* Gen3-DRS prints md5 "file is good" to log not stdout
* Improves gen3-drs md5 error message
* Changes gtex input to support new manifest file format [#289] (#296)
* Updates ch_gtex_gen3_ids items #289
* Remove duplicate val(obj_id) in input of gen3-drs
  Co-authored-by: cgpu <38183826+cgpu@users.noreply.github.com>
* Comments our fasta requirement for gen3-drs input (#297)
* Comments our fasta requirement for gen3-drs input
* Update usage.md that genome_fasta is only for CRAM
* Update usage.md typo
* Fix missing file from path issue
* change GLS executor from parameter to scope (#305)
* Remove gtex (#299)
* Remove mentions of old GTEX download option from main.nf
* Remove mentions of old GTEX download option from help
* Remove mentions of old GTEX download option from usage.md
* Renames Gen3-DRS into new GTEX download option
* Renames Gen3-DRS into new GTEX download opt in usage.md
* Dev v2.1 #287 - Simplify the Gen3-DRS download option (#304)
* Update usage.md
* Update run_on_sumner.md
* add dockerfile for csvtoolkit
* add process to convert manifest json to csv
* add process to filter manifest by file passed through --reads
* update help message
* fix bug on variable declaration
* Update nextflow.config - fix typo
* Revert "Merge branch 'master' into dev-v2.1-#287" This reverts commit be2c2ab, reversing changes made to 04285ef.
* Update main.nf
* patch projectDir error
* Fix oublishDir path for manifest
* Fix oublishDir path for manifest
* Fix typo
* Update filter_manifest.py
* Update filter_manifest.py
* fix bug on saving filenames that were not in manifest file
* Update filter_manifest.py
* remove logging of samples not found in manifest
* Update filter_manifest.py
* Makes filter_manifest txt output optional
  Co-authored-by: angarb <62404570+angarb@users.noreply.github.com>
  Co-authored-by: Vlad-Dembrovskyi <64809705+Vlad-Dembrovskyi@users.noreply.github.com>
  Co-authored-by: Vlad-Dembrovskyi <vlad@lifebit.ai>
* Rename examples/gen3/README.md to examples/GTEX/README.md Editing folder name to match new "download_from" name.
* Update and rename GEN3_DRS_config.md to GTEX_config.md Updating parameters
* Delete examples/gen3 directory
* Update usage.md Moving this information
* Update README.md
* Update README.md
* Delete PRJNA453538.SraRunTable.txt Not needed
* Delete MCF10_MYCER.datafiles.csv Not needed
* Create reads.csv Adding reads.csv example
* Update README.md
* Create manifest.json Adding example manifest.json
* Update README.md
* Update run_on_cloudos.md
* Update Copying_Files_From_Sumner_to_Cloud.md Made neater
* Create Star_Index_Generation.md
  Co-authored-by: cgpu <38183826+cgpu@users.noreply.github.com>
  Co-authored-by: imendes93 <73831087+imendes93@users.noreply.github.com>
  Co-authored-by: angarb <62404570+angarb@users.noreply.github.com>
1 parent 33ba660 commit 49cd023
Showing 20 changed files with 220 additions and 235 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
import os | ||
import sys | ||
import shutil | ||
import pandas as pd | ||
|
||
def __main__(): | ||
|
||
    manifest = sys.argv[1] | ||
    reads = sys.argv[2] | ||
    print("Input manifest file:", manifest) | ||
    print("Input read file: ", reads) | ||
|
||
    manifest_df = pd.read_csv(manifest, index_col=None, header=0, delimiter=",") | ||
|
||
    if reads != "PASS": | ||
        # process metadata | ||
        reads_df = pd.read_csv(reads, index_col=None, header=0, delimiter=",") | ||
        manifest_df = manifest_df[manifest_df['file_name'].isin(reads_df['file_name'].tolist())] | ||
|
||
    if manifest_df.empty: | ||
        print("Manifest file is empty after filtering.") | ||
        # sys.exit() accepts a single argument; exit with a non-zero status | ||
        sys.exit(1) | ||
    else: | ||
        print("Number of samples in filtered manifest:") | ||
        print(len(manifest_df)) | ||
|
||
    # save final manifest file | ||
    manifest_df.to_csv("filtered_manifest.csv", sep=",", index=False) | ||
|
||
if __name__=="__main__": __main__() |
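
The filtering step in the script above can be sketched on in-memory data, without command-line arguments or files on disk. This is a dependency-free illustration that swaps pandas for Python's standard `csv` module; it assumes, as the script does, that both tables carry a `file_name` column. The helper name `filter_manifest_rows` and the sample values are hypothetical.

```python
import csv
import io


def filter_manifest_rows(manifest_csv, reads_csv, column="file_name"):
    """Return manifest rows whose `column` value is listed in the reads CSV."""
    wanted = {row[column] for row in csv.DictReader(io.StringIO(reads_csv))}
    rows = [
        row
        for row in csv.DictReader(io.StringIO(manifest_csv))
        if row[column] in wanted
    ]
    if not rows:
        # Mirrors the script's behaviour of failing on an empty result.
        raise ValueError("Manifest file is empty after filtering.")
    return rows


# Hypothetical example data, not taken from a real manifest.
manifest_csv = "file_name,object_id\na.bam,id-a\nb.bam,id-b\nc.bam,id-c\n"
reads_csv = "file_name\na.bam\nc.bam\n"

filtered = filter_manifest_rows(manifest_csv, reads_csv)
print([row["file_name"] for row in filtered])
```

Passing the column name as a parameter makes the dependency on a shared header explicit: a reads file whose header differs from the manifest's raises a `KeyError` instead of silently matching nothing.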
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
A minimal set of params needed to run the pipeline when the download option is GTEX. These params were tested in a dev environment. | ||
|
||
```yaml | ||
params { | ||
    reads = 'splicing-pipelines-nf/examples/GTEX/reads.csv' | ||
    manifest = 'manifest.json' | ||
    run_name = 'gtex_gen3' | ||
    download_from = 'GTEX' | ||
    key_file = 'credentials.json' | ||
    gtf = 'gencode.v32.primary_assembly.annotation.gtf' | ||
    star_index = '/mnt/shared/gcp-user/session_data/star_75' | ||
    assembly_name = 'GRCh38' | ||
    readlength = 75 | ||
    stranded = false | ||
    gc_disk_size = 200.GB | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
FROM nfcore/base:1.9 | ||
LABEL authors="ines@lifebit.ai" \ | ||
description="Docker image containing csvkit toolkit, including in2csv" | ||
|
||
COPY environment.yml / | ||
RUN conda env create -f /environment.yml && conda clean -a | ||
ENV PATH /opt/conda/envs/csvkit/bin:$PATH |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
name: csvkit | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- defaults | ||
- anaconda | ||
dependencies: | ||
- python=3.8 | ||
- csvkit=1.0.5 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,32 @@ | ||
- //add singularity to $PATH: | ||
+ # Moving files from HPC to Cloud (particular to JAX Sumner) | ||
|
||
+ #### Add singularity to $PATH: | ||
module load singularity | ||
|
||
- //make some convenience commands to reduce typing (note we changed container name so we can accommodate other cloud providers): | ||
+ #### Make some convenient commands to reduce typing: | ||
alias gcloud="singularity exec /projects/researchit/crf/containers/gcp_sdk.sif gcloud" | ||
alias gsutil="singularity exec /projects/researchit/crf/containers/gcp_sdk.sif gsutil" | ||
|
||
- //login to gcloud; this will return a url that you need to paste into a browser, which | ||
- //will take you through the google authentication process; you can use your jax | ||
- //email as userid and jax password to get in. Once you authenticate, it will display | ||
- //a code that you need to paste into the prompt provided in your ssh session on Sumner: | ||
|
||
+ #### Login to gcloud; this will return a url that you need to paste into a browser, which will take you through the google authentication process; you can use your jax email as userid and jax password to get in. Once you authenticate, it will display a code that you need to paste into the prompt provided in your ssh session on Sumner: | ||
gcloud auth login --no-launch-browser | ||
|
||
- //see which projects you have access to: | ||
+ #### See which projects you have access to: | ||
gcloud projects list | ||
|
||
- //what is the project you are currently associated with: | ||
+ #### What is the project you are currently associated with: | ||
gcloud config list project | ||
|
||
- //change project association: | ||
+ #### Change project association: | ||
gcloud config set project my-project | ||
|
||
- //see what buckets are associated with my-project: | ||
+ #### See what buckets are associated with my-project: | ||
gsutil ls | ||
|
||
- //see contents of a particular bucket: | ||
+ #### See contents of a particular bucket: | ||
gsutil ls -l gs://my-bucket | ||
|
||
- //recursively copy large directory from filesystem accessible on Sumner to your bucket: | ||
+ #### Recursively copy large directory from file system accessible on Sumner to your bucket: | ||
gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp -r my_dir gs://my_bucket/my_dir | ||
|
||
- //recursively copy a directory from your bucket to an existing directory on Sumner: | ||
+ #### Recursively copy a directory from your bucket to an existing directory on Sumner: | ||
gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp -r gs://my_bucket/my_dir my_dir |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
## Generating STAR Indices | ||
|
||
To run the pipeline, you will need STAR indices (preferably ones that match your read length). | ||
|
||
This might be a helpful resource to generate multiple STAR indices: | ||
https://github.com/TheJacksonLaboratory/Star_indices | ||
|
||
This is also a useful resource: https://github.com/alexdobin/STAR |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
## Run with GTEX data | ||
You can run the pipeline on GTEX data obtained directly from Gen3-DRS if you specify the input option: | ||
``` | ||
--download_from 'GTEX' | ||
``` | ||
|
||
You will need two things from https://gen3.theanvil.io/: | ||
|
||
1. [manifest file](https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/dev-v2.1/examples/GTEX/manifest.json) | ||
2. credentials file | ||
|
||
The pipeline converts the downloaded `manifest.json` into `manifest.csv` using csvkit: https://csvkit.readthedocs.io/en/latest/ | ||
|
||
The `manifest.csv` is then subset using the `reads.csv` file provided via the `--reads` param. (This allows you to download a complete manifest and later select the samples of interest.) For example: [gtex.reads](https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/dev-v2.1/examples/GTEX/reads.csv) | ||
|
||
The downloaded `credentials.json` file can be provided via the `--key_file` param. | ||
NOTE: Make sure `credentials.json` is up to date. Credentials have an expiration date that is set when you download them. | ||
|
||
If you are running with AnVIL Gen3-DRS to download CRAM files, you also need to provide a genome FASTA file with `--genome_fasta`, which will be used to convert CRAM files to BAM format. If you are downloading BAM files, you can skip this parameter. | ||
|
||
For a minimal params list, see [gtex.config](../conf/examples/GTEX_config.md) |
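
The JSON-to-CSV conversion described in this section can be sketched as follows. The pipeline itself uses csvkit for this step; the sketch swaps in Python's standard `json` and `csv` modules only to show the shape of the transformation. It assumes the manifest is a non-empty JSON array of flat objects that share the same keys. The function name `manifest_json_to_csv` and the sample records are hypothetical.

```python
import csv
import io
import json


def manifest_json_to_csv(manifest_json):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(manifest_json)
    # Column order follows the key order of the first record.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records using the same field names as the example manifest.
manifest_json = """
[
  {"md5sum": "abc", "file_name": "sample1.bam", "object_id": "dg.ANV0/id-1", "file_size": 10},
  {"md5sum": "def", "file_name": "sample2.bam", "object_id": "dg.ANV0/id-2", "file_size": 20}
]
"""
manifest_csv = manifest_json_to_csv(manifest_json)
print(manifest_csv)
```

The resulting header row carries the `file_name` column that the later subsetting step relies on.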
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
[ | ||
{ | ||
"md5sum":"x1x111xxx1xxxxx1xx1x1x11xxx11111", | ||
"file_name": "GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam", | ||
"object_id":"dg.ANV0/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyy", | ||
"file_size":123321365 | ||
}, | ||
{ | ||
"md5sum":"x2x222xxx2xxxxx2xx2x2x22xxx22222", | ||
"file_name": "GTEX-XXXXX-XXXX-XX-XXXXZ.Aligned.sortedByCoord.out.patched.md.bam", | ||
"object_id":"dg.ANV0/yyyyyyyy-yyyy-yyyy-yyyyyyyyyzzz", | ||
"file_size":123321369 | ||
} | ||
] |
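
Each manifest entry carries an `md5sum` field, and the commit message mentions an md5 check in the Gen3-DRS download step. How the pipeline performs that check is not shown in this diff; the sketch below only illustrates one way to compare downloaded bytes against a manifest entry with Python's `hashlib`. The function names and the sample entry are hypothetical.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(stream, entry):
    """Return True if the stream's MD5 equals the entry's `md5sum` field."""
    return md5_of_stream(stream) == entry["md5sum"].lower()


# Hypothetical entry; the digest below is the MD5 of the bytes b"hello".
entry = {"file_name": "example.bam", "md5sum": "5d41402abc4b2a76b9719d911017c592"}

ok = matches_manifest(io.BytesIO(b"hello"), entry)
bad = matches_manifest(io.BytesIO(b"hello!"), entry)
print(ok, bad)
```

Reading in chunks keeps memory use flat, which matters for BAM or CRAM files of the sizes listed in `file_size`.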
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
sample_id | ||
GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam | ||
GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam | ||
GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam | ||
GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam | ||
GTEX-XXXXX-XXXX-XX-XXXXX.Aligned.sortedByCoord.out.patched.md.bam |
This file was deleted.