Merge branch 'master' into p8-egs
madetunj authored Feb 23, 2022
2 parents 3eceaad + 363db47 commit 498289c
Showing 22 changed files with 340 additions and 367 deletions.
Empty file modified LICENSE
100644 → 100755
Empty file.
40 changes: 23 additions & 17 deletions README.md
@@ -211,23 +211,23 @@ The metrics are color flagged for easy visualization of overall performance in H

SEAseq metrics calculated to infer quality are (in alphabetical order):
| Quality Metric | Definition |
|--------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Aligned Percent | Percentage of mapped reads. |
| Base Quality | Per-base sequence quality. |
| Estimated Fragment Width | Average fragment size of the peak distribution. |
| Estimated Tag Length | Sequencing read length. |
| FRiP | The fraction of reads within peak regions. |
| Linear Stitched Peaks (Enhancers) | Total number of clustered enriched regions. |
| Non-Redundant Fraction (NRF) | Fraction of uniquely mapped reads. |
| *Normalized Peaks* | Peaks identified with Input/Control correction *(applicable when Control is provided)*. |
| Normalized Strand-correlation Coefficient (NSC) | Signal-to-noise ratio determined by strand cross-correlation: the maximum cross-correlation value divided by the background cross-correlation. |
| Sequence Diversity | Sequence overrepresentation: whether reads/sequences are overrepresented in the library. |
| PCR Bottleneck Coefficient (PBC) | A measure of library complexity: the ratio of genomic locations with exactly one unique read to those covered by at least one unique read. |
| Peaks | Total number of enriched regions. |
| Raw Reads | Total number of sequencing reads. |
| *Read Length* | Average FASTQ read length *(applicable when multiple FASTQs are provided)*. |
| Relative Strand-correlation Coefficient (RSC) | A strand cross-correlation ratio between the fragment-length cross-correlation and the read-length peak. |
| SE-like enriched regions (Super Enhancers) | Total number of SE-like clustered enriched regions. |
|--------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Aligned Percent | Percentage of mapped reads. |
| Base Quality | Per-base sequence quality. |
| Estimated Fragment Width | Average fragment size of the peak distribution. |
| Estimated Tag Length | Sequencing read length. |
| [FRiP] | The fraction of reads within peak regions. |
| Linear Stitched Peaks (Enhancers) | Total number of clustered enriched regions. |
| Non-Redundant Fraction ([NRF]) | Fraction of uniquely mapped reads. |
| *Normalized Peaks* | Peaks identified with Input/Control correction *(applicable when Control is provided)*. |
| Normalized Strand-correlation Coefficient ([NSC]) | Signal-to-noise ratio determined by strand cross-correlation: the maximum cross-correlation value divided by the background cross-correlation. |
| Sequence Diversity | Sequence overrepresentation: whether reads/sequences are overrepresented in the library. |
| PCR Bottleneck Coefficient ([PBC]) | A measure of library complexity: the ratio of genomic locations with exactly one unique read to those covered by at least one unique read. |
| Peaks | Total number of enriched regions. |
| Raw Reads | Total number of sequencing reads. |
| *Read Length* | Average FASTQ read length *(applicable when multiple FASTQs are provided)*. |
| Relative Strand-correlation Coefficient ([RSC]) | A strand cross-correlation ratio between the fragment-length cross-correlation and the read-length peak. |
| SE-like enriched regions (Super Enhancers) | Total number of SE-like clustered enriched regions. |
| Overall Quality | Cross-metric average score. |

## Frequently asked questions
@@ -265,3 +265,9 @@ None yet!
[AME]: https://meme-suite.org/meme/tools/ame
[MEME-ChIP]: https://meme-suite.org/meme/tools/meme-chip
[seaseq]: https://github.com/stjude/seaseq
[FRiP]: https://github.com/stjude/seaseq/blob/master/docs/definitions.md#fraction-of-reads-in-peaks-frip
[NRF]: https://github.com/stjude/seaseq/blob/master/docs/definitions.md#non-redundant-fraction-nrf
[NSC]: https://github.com/stjude/seaseq/blob/master/docs/definitions.md#normalized-strand-correlation-coefficient-nsc
[RSC]: https://github.com/stjude/seaseq/blob/master/docs/definitions.md#relative-strand-correlation-coefficient-rsc
[PBC]: https://github.com/stjude/seaseq/blob/master/docs/definitions.md#pcr-bottleneck-coefficient-pbc

6 changes: 6 additions & 0 deletions dnanexus/README.md
@@ -25,6 +25,12 @@
-project project-id \
-extras extras.json \
-folder /apps
#using dxCompiler-2.9.0
java -jar dxCompiler-2.9.0.jar compile seaseq.wdl \
-project project-id \
-extras extras.json \
-folder /apps
```
1. Upload test data
12 changes: 11 additions & 1 deletion dnanexus/stjude_seaseq/dxapp.json
@@ -3,7 +3,7 @@
"title": "SEAseq (St. Jude)",
"summary": "Single-End Antibody sequencing workflow",
"dxapi": "1.0.1",
"version": "2.1.0",
"version": "2.2.0",
"openSource" : true,
"inputSpec": [
{
@@ -723,6 +723,11 @@
"class": "file",
"optional": true
},
{
"name": "jpg_h_gene",
"class": "file",
"optional": true
},
{
"name": "pdf_promoters",
"class": "file",
@@ -738,6 +743,11 @@
"class": "file",
"optional": true
},
{
"name": "jpg_h_promoters",
"class": "file",
"optional": true
},
{
"name": "peak_promoters",
"class": "file",
12 changes: 6 additions & 6 deletions dnanexus/stjude_seaseq/src/stjude_seaseq.sh
@@ -70,16 +70,16 @@ main() {
# git clone and configure SEAseq
git clone https://github.com/stjude/seaseq.git seaseq
cd seaseq
git checkout 2.1
git checkout 2.2
cd dnanexus
reorg_id=$(dx build -f /seaseq-reorg | jq -r '.id')
echo "Reorg applet ID: ${reorg_id}"
sed -ibak "s/applet-Fx40j6091FfQp1P99p6b5k2x/${reorg_id}/" extras.json
cd ..
timestamp=$(date +%s)
wget -nv https://github.com/dnanexus/dxWDL/releases/download/v1.50/dxWDL-v1.50.jar
echo '476621564b3b310b17598ee1f02a1865 dxWDL-v1.50.jar' > dxWDL-v1.50.jar.md5
md5sum -c dxWDL-v1.50.jar.md5
wget -nv https://github.com/dnanexus/dxCompiler/releases/download/2.9.0/dxCompiler-2.9.0.jar
echo '434b515609123f1092453eac87984027 dxCompiler-2.9.0.jar' > dxCompiler-2.9.0.jar.md5
md5sum -c dxCompiler-2.9.0.jar.md5

if grep -F "control_fastq" /home/dnanexus/job_input.json; then
SEASEQ="seaseq-control.wdl"
@@ -95,9 +95,9 @@ main() {
sed -i "s/import \"..\/tasks\/bedtools\.wdl/import \"\/home\/dnanexus\/seaseq\/workflows\/tasks\/bedtools\.wdl/" workflows/workflows/motifs.wdl
sed -i "s/import \"..\/tasks\//import \"\/home\/dnanexus\/seaseq\/workflows\/tasks\//" workflows/workflows/mapping.wdl

# compile SEAseq to dxwdl
# compile SEAseq with dxCompiler-2.9.0
dx mkdir -p "${DX_PROJECT_CONTEXT_ID}":/app-$timestamp/
wf_id=$(java -jar dxWDL-v1.50.jar compile $SEASEQ -project "${DX_PROJECT_CONTEXT_ID}" -folder /app-$timestamp -force -extras dnanexus/extras.json)
wf_id=$(java -jar dxCompiler-2.9.0.jar compile $SEASEQ -project "${DX_PROJECT_CONTEXT_ID}" -folder /app-$timestamp -force -extras dnanexus/extras.json)
echo "Workflow ID: ${wf_id}"

# specify output folder and input json
60 changes: 60 additions & 0 deletions docs/definitions.md
@@ -0,0 +1,60 @@
# SEAseq Quality Metrics Expatiated Terms and Definitions

To view the complete list of metrics SEAseq offers, see the
[SEAseq README](https://github.com/stjude/seaseq/#qc-metrics). Most metrics are adopted
from [ENCODE] and [Landt et al, Genome Res.2012].

Additional information on applicable ChIP-seq metrics is provided
[here](https://genome.ucsc.edu/ENCODE/qualityMetrics.html).

[ENCODE]: https://www.encodeproject.org/data-standards/terms
[Landt et al, Genome Res.2012]: https://doi.org/10.1101/gr.136184.111


## Fraction of Reads in Peaks (FRiP)

Fraction of all mapped reads that fall into the called peak
regions, i.e. usable reads in significantly enriched peaks
divided by all usable reads. In general, FRiP scores
correlate positively with the number of regions.
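
As a rough illustration of the ratio described above, the following Python sketch computes FRiP from two read counts. The function name `frip` and the example counts are hypothetical and not part of SEAseq; assigning reads to peaks is assumed to happen upstream.

```python
def frip(reads_in_peaks: int, total_mapped_reads: int) -> float:
    """Fraction of mapped reads that fall inside called peak regions."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if not 0 <= reads_in_peaks <= total_mapped_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_mapped_reads")
    return reads_in_peaks / total_mapped_reads


# Hypothetical counts, for illustration only.
score = frip(reads_in_peaks=250_000, total_mapped_reads=10_000_000)
print(f"FRiP = {score:.3f}")
```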

## Non-Redundant Fraction (NRF)

Number of distinct uniquely mapping reads
(i.e. after removing duplicates) / Total number of reads.
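
The ratio above can be sketched in Python as follows. This is a minimal sketch under one stated assumption: two reads count as duplicates when they share the same chromosome, start position, and strand. The function name `nrf` and the sample reads are hypothetical.

```python
from typing import Hashable, Iterable, Optional


def nrf(read_positions: Iterable[Hashable], total_reads: Optional[int] = None) -> float:
    """Distinct uniquely mapping reads divided by the total number of reads.

    Assumption: each uniquely mapping read is given as a hashable key such as
    (chromosome, start, strand), and reads with equal keys are duplicates.
    If total_reads is omitted, the number of supplied reads is used.
    """
    positions = list(read_positions)
    total = len(positions) if total_reads is None else total_reads
    if total <= 0:
        raise ValueError("total number of reads must be positive")
    distinct = len(set(positions))
    return distinct / total


# Hypothetical reads: the first two are duplicates of each other.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 200, "-"), ("chr2", 50, "+")]
print(nrf(reads))
```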

## Normalized Strand-correlation Coefficient (NSC)

The NSC is the ratio of the maximal cross-correlation value
(which occurs at strand shift equal to fragment length)
divided by the background cross-correlation (minimum
cross-correlation value over all possible strand shifts).
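
A minimal Python sketch of this ratio is shown below. It assumes the cross-correlation profile has already been computed and is supplied as a mapping from strand shift to cross-correlation value; the function name `nsc` and the profile values are hypothetical.

```python
from typing import Mapping


def nsc(cross_correlation: Mapping[int, float]) -> float:
    """Maximum cross-correlation divided by the minimum (background) value.

    Assumption: keys are strand shifts in base pairs and values are the
    cross-correlation at each shift, computed by an upstream tool.
    """
    if not cross_correlation:
        raise ValueError("cross-correlation profile is empty")
    maximum = max(cross_correlation.values())
    background = min(cross_correlation.values())
    if background <= 0:
        raise ValueError("background cross-correlation must be positive")
    return maximum / background


# Hypothetical profile with its peak at a 150 bp strand shift.
profile = {0: 0.25, 50: 0.30, 150: 0.50, 300: 0.28, 500: 0.25}
print(nsc(profile))
```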

## Relative Strand-correlation Coefficient (RSC)

The RSC is the ratio of the fragment-length cross-correlation
value minus the background cross-correlation value, divided by
the phantom-peak cross-correlation value minus the background
cross-correlation value. The minimum possible value is 0
(no signal), highly enriched experiments have values greater
than 1, and values much less than 1 may indicate low quality.
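
The definition above translates into a short calculation, sketched here in Python. The three inputs are assumed to come from an existing cross-correlation analysis; the function name `rsc` and the example values are hypothetical.

```python
def rsc(cc_fragment: float, cc_phantom: float, cc_background: float) -> float:
    """(fragment-length CC - background CC) / (phantom-peak CC - background CC)."""
    denominator = cc_phantom - cc_background
    if denominator <= 0:
        raise ValueError("phantom-peak value must exceed the background value")
    return (cc_fragment - cc_background) / denominator


# Hypothetical values: the fragment-length peak is higher than the phantom
# peak, so the result is greater than 1.
print(rsc(cc_fragment=0.5, cc_phantom=0.375, cc_background=0.25))
```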

## PCR Bottleneck Coefficient (PBC)

A measure of library complexity, i.e.
how skewed the distribution of read
counts per location is towards 1 read per location.

PBC = N1/Nd

(where N1 = the number of genomic locations to which EXACTLY
one uniquely mapping read maps, and Nd = the number of
genomic locations to which AT LEAST one uniquely mapping
read maps, i.e. the number of non-redundant, uniquely mapping reads).
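
The N1/Nd ratio can be sketched in Python by counting reads per genomic location. This is an illustrative sketch only: it assumes each uniquely mapping read is represented by a hashable location key, and the function name `pbc` and the sample locations are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def pbc(read_locations: Iterable[Hashable]) -> float:
    """PBC = N1 / Nd for a collection of uniquely mapping reads.

    N1: locations covered by exactly one read.
    Nd: locations covered by at least one read.
    """
    counts = Counter(read_locations)
    n_distinct = len(counts)
    if n_distinct == 0:
        raise ValueError("no read locations supplied")
    n_one = sum(1 for count in counts.values() if count == 1)
    return n_one / n_distinct


# Hypothetical locations: chr1:100 is covered twice, the other three once.
locations = [("chr1", 100), ("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)]
print(pbc(locations))
```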


### Disclaimer

All definitions were copied from the resources cited below:
1. [ENCODE Quality Metrics](https://genome.ucsc.edu/ENCODE/qualityMetrics.html).
1. [ENCODE Data Standards](https://www.encodeproject.org/data-standards/terms).
1. [Landt et al, Genome Res.2012](https://doi.org/10.1101/gr.136184.111).
10 changes: 5 additions & 5 deletions scripts/seaseq_overall.header
@@ -248,11 +248,11 @@
<div class="abbreviations">
<b>Abbreviations</b> (adopted from <a href="https://doi.org/10.1101/gr.136184.111" target="_blank">Landt et al, Genome Res. 2012</a>)
<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/#:~:text=A%20useful%20complexity%20metric%20is%20the%20fraction%20of%20nonredundant%20mapped%20reads%20in%20a%20data%20set%20(nonredundant%20fraction%20or%20NRF)%2C" target="_blank">NRF</a> : Non-Redundant Fraction.</li>
<li><a href="https://genome.ucsc.edu/ENCODE/qualityMetrics.html#:~:text=PCR%20Bottleneck%20Coefficient%20(PBC)" target="_blank">PBC</a> : PCR Bottleneck Coefficient.</li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/#:~:text=(normalized%20strand%20coefficient%2C%20NSC)" target="_blank">NSC</a> : Normalized Strand Cross-correlation coefficient.</li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/#:~:text=(relative%20strand%20correlation%2C%20RSC)" target="_blank">RSC</a> : Relative Strand Cross-correlation coefficient.</li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/#:~:text=FRiP%20(fraction%20of%20reads%20in%20peaks)." target="_blank">FRiP</a> : Fraction of Reads in Peaks.</li>
<li><a href="https://github.com/stjude/seaseq/blob/master/docs/definitions.md#non-redundant-fraction-nrf" target="_blank">NRF</a> : Non-Redundant Fraction.</li>
<li><a href="https://github.com/stjude/seaseq/blob/master/docs/definitions.md#pcr-bottleneck-coefficient-pbc" target="_blank">PBC</a> : PCR Bottleneck Coefficient.</li>
<li><a href="https://github.com/stjude/seaseq/blob/master/docs/definitions.md#normalized-strand-correlation-coefficient-nsc" target="_blank">NSC</a> : Normalized Strand Cross-correlation coefficient.</li>
<li><a href="https://github.com/stjude/seaseq/blob/master/docs/definitions.md#relative-strand-correlation-coefficient-rsc" target="_blank">RSC</a> : Relative Strand Cross-correlation coefficient.</li>
<li><a href="https://github.com/stjude/seaseq/blob/master/docs/definitions.md#fraction-of-reads-in-peaks-frip" target="_blank">FRiP</a> : Fraction of Reads in Peaks.</li>
</ul>
<br>
Definitions for all metrics can be found on the <a href="https://github.com/stjude/seaseq/#qc-metrics" target="_blank">SEAseq website</a>.