---
title: Stranded RNA-seq libraries.
description: Explains strandedness and where to find info in bcbio.
category: research
subcategory: rnaseq
---

Bulk RNA-seq libraries that retain strand information (stranded libraries) allow more accurate expression quantification for transcripts on opposite strands that overlap or have overlapping UTRs (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1876-7).

The bcbio RNA-seq pipeline has a 'strandedness' parameter: [unstranded|firststrand|secondstrand], see
https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html?highlight=strand#configuration (note: this link is not working).

The terminology was inherited from TopHat; see the detailed description in the Salmon documentation: https://salmon.readthedocs.io/en/latest/library_type.html. Note that firststrand = ISR for paired-end (PE) and SR for single-end (SE) reads.

If the strandedness is unknown, run a small subset of reads with 'unstranded' in bcbio and check what Salmon reports in bcbio_project/final/sample/salmon/lib_format_counts.json:

{
    "read_files": [
        "/dev/fd/63",
        "/dev/fd/62"
    ],
    "expected_format": "IU",
    "compatible_fragment_ratio": 1.0,
    "num_compatible_fragments": 721856,
    "num_assigned_fragments": 721856,
    "num_frags_with_concordant_consistent_mappings": 692049,
    "num_frags_with_inconsistent_or_orphan_mappings": 47441,
    "strand_mapping_bias": 0.9477291347866986,
    "MSF": 0,
    "OSF": 0,
    "ISF": 36174,
    "MSR": 0,
    "OSR": 0,
    "ISR": 655875,
    "SF": 37676,
    "SR": 9765,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}

Here the majority of reads are ISR (655875 of the 692049 concordantly mapped fragments), so the library is firststrand.
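The check on the Salmon counts can be scripted. The following is a minimal sketch, not part of bcbio or Salmon: the function name `strandedness_from_counts` and the 0.8 cutoff are assumptions made for illustration. It pools the forward-strand counts (ISF, SF) and reverse-strand counts (ISR, SR) from lib_format_counts.json and maps the dominant side to the bcbio 'strandedness' value, following the firststrand = ISR/SR convention described above.

```python
import json


def strandedness_from_counts(counts, threshold=0.8):
    """Guess the bcbio 'strandedness' value from Salmon library-format counts.

    counts: dict loaded from lib_format_counts.json.
    threshold: hypothetical cutoff (an assumption, not a bcbio or Salmon default).
    Returns (call, reverse_fraction).
    """
    forward = counts.get("ISF", 0) + counts.get("SF", 0)  # read 1 from the forward strand
    reverse = counts.get("ISR", 0) + counts.get("SR", 0)  # read 1 from the reverse strand
    total = forward + reverse
    if total == 0:
        raise ValueError("no stranded-orientation counts found")
    reverse_fraction = reverse / total
    if reverse_fraction >= threshold:
        return "firststrand", reverse_fraction
    if reverse_fraction <= 1 - threshold:
        return "secondstrand", reverse_fraction
    # Roughly balanced orientations suggest an unstranded library.
    return "unstranded", reverse_fraction


def strandedness_from_file(path):
    """Load lib_format_counts.json from 'path' and classify it."""
    with open(path) as handle:
        return strandedness_from_counts(json.load(handle))
```

For the counts shown above, roughly 90% of the oriented fragments fall on the reverse side, so the sketch returns firststrand.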

Another way to check strand bias is the Qualimap report in
bcbio_project/final/sample/qc/qualimap_rnaseq/rnaseq_qc_results.txt.
It contains a line such as SSP estimation (fwd/rev) = 0.04 / 0.96, which indicates a strong bias toward the reverse orientation (ISR, firststrand).
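The SSP line can also be pulled out of the Qualimap report programmatically. This is a rough sketch that assumes the line looks exactly like the example quoted above; the helper names `parse_ssp` and `strandedness_from_ssp` and the 0.8 cutoff are hypothetical and not part of Qualimap or bcbio.

```python
import re

# Matches a line of the form: SSP estimation (fwd/rev) = 0.04 / 0.96
SSP_PATTERN = re.compile(
    r"SSP estimation \(fwd/rev\)\s*=\s*([0-9.]+)\s*/\s*([0-9.]+)"
)


def parse_ssp(text):
    """Return (fwd, rev) as floats, or None if the SSP line is absent."""
    match = SSP_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def strandedness_from_ssp(fwd, rev, threshold=0.8):
    """Map SSP fractions to a bcbio 'strandedness' value.

    threshold is a hypothetical cutoff chosen for illustration.
    """
    if rev >= threshold:
        return "firststrand"
    if fwd >= threshold:
        return "secondstrand"
    return "unstranded"
```

A typical use would be to read rnaseq_qc_results.txt into a string, pass it to `parse_ssp`, and feed the two fractions to `strandedness_from_ssp`.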

Yet another way to confirm strand bias is RSeQC:
http://rseqc.sourceforge.net/#infer-experiment-py.
It uses a small subset of the input BAM file:
infer_experiment.py -r /bcbio/genomes/Mmusculus/mm10/rnaseq/ref-transcripts.bed -i test.bam

This is PairEnd Data
Fraction of reads failed to determine: 0.1461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0177
Fraction of reads explained by "1+-,1-+,2++,2--": 0.8362
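In this output the "1+-,1-+,2++,2--" pattern dominates, which is consistent with the ISR (firststrand) result from Salmon and Qualimap. A minimal sketch for interpreting such output automatically follows. It assumes the text format shown above (plus the "++,--" and "+-,-+" keys that RSeQC prints for single-end data); the function name `classify_infer_experiment` and the 0.8 cutoff are hypothetical.

```python
import re

FRACTION_PATTERN = re.compile(
    r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)'
)

# Orientation keys as printed by infer_experiment.py (paired-end, single-end).
SENSE_KEYS = {"1++,1--,2+-,2-+", "++,--"}      # read 1 on the transcript strand
ANTISENSE_KEYS = {"1+-,1-+,2++,2--", "+-,-+"}  # read 1 on the opposite strand


def classify_infer_experiment(text, threshold=0.8):
    """Return (call, antisense_share) from infer_experiment.py output.

    antisense_share ignores the 'failed to determine' fraction.
    threshold is a hypothetical cutoff chosen for illustration.
    """
    fractions = {
        m.group(1): float(m.group(2)) for m in FRACTION_PATTERN.finditer(text)
    }
    sense = sum(v for k, v in fractions.items() if k in SENSE_KEYS)
    antisense = sum(v for k, v in fractions.items() if k in ANTISENSE_KEYS)
    determined = sense + antisense
    if determined == 0:
        raise ValueError("no orientation fractions found in the output")
    antisense_share = antisense / determined
    if antisense_share >= threshold:
        return "firststrand", antisense_share
    if antisense_share <= 1 - threshold:
        return "secondstrand", antisense_share
    return "unstranded", antisense_share
```

Normalizing by the determined reads keeps the undetermined fraction from pulling a clearly stranded library below the cutoff.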