Data package for B-cell lymphoma RNA-seq data from PRJNA477352
- Experimental data were generated by Zhao et al. Original citation: Zhao X, Ren Y, Lawlor M, Shah BD et al. BCL2 Amplicon Loss and Transcriptional Remodeling Drives ABT-199 Resistance in B Cell Lymphoma Models. Cancer Cell 2019 May 13;35(5):752-766.e9. PMID: 31085176
- Processing:
  - Sequencing reads were downloaded from SRA under BioProject accession PRJNA477352.
  - Quantification was done by 2 alternative workflows:
    1. Using STAR 2.5.1a to align reads against the GENCODE v27 human genome (GRCh38.p10) plus 92 ERCC sequences, and RSEM to estimate abundance levels for genes and isoforms.
    2. Same as (1), but using STAR 2.7.1a.
- Metadata were downloaded from SRA and cleaned up to use standard field names. GEO metadata were also checked, but no extra information was found.
Install the package, import the library, and load the ExpressionSet of interest, for example:
devtools::install_github('ttdtrang/data-rnaseq-lymphoma')
data(lymphoma.rnaseq.gene.star_rsem1, package='data.rnaseq.lymphoma')
dim(lymphoma.rnaseq.gene.star_rsem1@assayData$exprs)
The package includes 4 data sets: 2 processed with the STAR_2.5-RSEM workflow and 2 with the STAR_2.7-RSEM workflow.
lymphoma.rnaseq.gene.star_rsem1
lymphoma.rnaseq.transcript.star_rsem1
lymphoma.rnaseq.gene.star_rsem2
lymphoma.rnaseq.transcript.star_rsem2
cd data-raw
- Download all necessary raw data files.
- Set the environment variable DBDIR to point to the path containing said files. It is assumed that files are organized into directories corresponding to workflow, e.g.:
├── GSE116129_family.soft
├── make-data-package.nb.html
├── make-data-package.Rmd
├── parse_geo_metadata.nb.html
├── parse_geo_metadata.Rmd
├── PRJNA477352_metadata_cleaned.tsv
├── star_2.5-rsem
│ ├── feature_attrs.transcripts.tsv
│ ├── matrix.gene.expected_count.RDS
│ ├── matrix.gene.tpm.RDS
│ ├── matrix.transcripts.expected_count.RDS
│ ├── matrix.transcripts.tpm.RDS
│ └── starLog.final.tsv
└── star_2.7-rsem
  ├── feature_attrs.rsem.transcripts.tsv
  ├── matrix.gene.expected_count.RDS
  ├── matrix.gene.tpm.RDS
  ├── matrix.transcripts.expected_count.RDS
  ├── matrix.transcripts.tpm.RDS
  └── starLog.final.tsv
- Run the R notebook make-data-package.Rmd to assemble the parts into ExpressionSet objects.
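As a rough sketch of the last two steps, the shell function below checks that the quantification files shown in the layout above exist before the notebook is rendered. It assumes DBDIR points at the directory that contains the star_2.5-rsem and star_2.7-rsem subdirectories. check_dbdir is a hypothetical helper, not something shipped with this package, and it only checks the five file names common to both workflows (the feature_attrs file is named differently in each).

```shell
# check_dbdir: hypothetical helper (not part of the package).
# Verifies that a directory follows the per-workflow layout described above.
# Usage: check_dbdir [dir]   (defaults to $DBDIR)
check_dbdir() {
    dbdir="${1:-${DBDIR:-}}"
    if [ -z "$dbdir" ]; then
        echo "DBDIR is not set" >&2
        return 2
    fi
    missing=0
    for workflow in star_2.5-rsem star_2.7-rsem; do
        for f in matrix.gene.expected_count.RDS \
                 matrix.gene.tpm.RDS \
                 matrix.transcripts.expected_count.RDS \
                 matrix.transcripts.tpm.RDS \
                 starLog.final.tsv; do
            if [ ! -f "$dbdir/$workflow/$f" ]; then
                echo "missing: $dbdir/$workflow/$f" >&2
                missing=1
            fi
        done
    done
    # 0 when every expected file exists, 1 otherwise
    return "$missing"
}

# Example (the path is a placeholder; rendering assumes the rmarkdown
# R package is installed):
#   export DBDIR=/path/to/raw/files
#   check_dbdir && Rscript -e "rmarkdown::render('make-data-package.Rmd')"
```

Running the check first means a missing file is reported by name up front, rather than surfacing partway through the notebook.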