Simple scripts to collate per-sample bioinformatic QC metrics. Supports fgbio, Picard, and CSV metric files.
If you say to yourself, "all I want to do is see some QC metrics for my samples", you've come to the right place.
*** This repository is under active development. Use at your own risk. ***
Python 3.6 or higher is required.
To clone the repository: git clone https://github.com/nh13/bfx-qc-reporter.git
To install locally: python setup.py install
The tool-chain can be run with bfx-qc-reporter.
For a conda recipe, see the conda-recipe branch.
The collation scripts are located in the scripts folder.
The load-metrics command collates per-sample metric files into a single JSON file for consumption either by the user or by the create-report command.
A flattened CSV file is also created.
The command assumes that all sample-specific metric files live in a single directory, and that each sample's file for a given metric type has the same file extension.
For example, <output-dir>/<sample-name>.alignment_summary_metrics.txt could be the metric file for Picard's AlignmentSummaryMetrics.
The file extension and metrics to be collated are user-configurable with the --metric-defs option; run bfx-qc-reporter load-metrics --help for more information.
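The directory layout described above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not the code bfx-qc-reporter itself uses; the function name find_metric_files and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def find_metric_files(output_dir: str, sample_names: List[str], extension: str) -> Dict[str, Path]:
    """Map each sample name to <output_dir>/<sample_name><extension>.

    Hypothetical helper: only samples whose metric file exists are returned.
    """
    found = {}
    for name in sample_names:
        candidate = Path(output_dir) / f"{name}{extension}"
        if candidate.is_file():
            found[name] = candidate
    return found
```

For instance, find_metric_files("metrics", ["sample1", "sample2"], ".alignment_summary_metrics.txt") would return the paths of whichever of the two Picard alignment summary files are present in the metrics directory.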
Specifying the name of each sample individually:
python bfx-qc-reporter load-metrics \
--output-dir <dir-with-metric-files> \
--output-prefix <output-path-prefix> \
--sample-names sample1 sample2 ... sampleN
Specifying the sample names using the output of fgbio's DemuxFastqs:
python bfx-qc-reporter load-metrics \
--output-dir <dir-with-metric-files> \
--output-prefix <output-path-prefix> \
--demux-barcode-metrics <path/to/demux_barcode_metrics.txt>
The create-report command extracts specific metrics from the load-metrics JSON output and writes a JSON file containing only those metrics.
A flattened CSV file is also created.
Run bfx-qc-reporter create-report --help for more information.
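Conceptually, this extraction is a filter over the nested JSON. The sketch below assumes the sample / metric group / category / metric nesting seen in the example JSON at the end of this document. The function select_metrics and the (group, category, metric) tuples are hypothetical and do not reflect the actual report_defs.csv format.

```python
from typing import Any, Dict, List, Tuple


def select_metrics(collated: Dict[str, Any], wanted: List[Tuple[str, str, str]]) -> Dict[str, Any]:
    """Keep only the requested (metric group, category, metric name) entries.

    Assumes collated is nested as sample -> group -> category -> metric.
    Entries that are missing (or null) for a sample are skipped.
    """
    report: Dict[str, Any] = {}
    for sample, groups in collated.items():
        for group, category, name in wanted:
            value = groups.get(group, {}).get(category, {}).get(name)
            if value is None:
                continue
            report.setdefault(sample, {}).setdefault(group, {}).setdefault(category, {})[name] = value
    return report
```

The result keeps the same nesting as the input, so it could be written back out with json.dump and read by the same downstream code.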
Using the default metrics to report:
python bfx-qc-reporter create-report \
--input </path/to/metrics.json> \
--output-prefix <output-path-prefix>
Specifying a custom set of metrics to report in report_defs.csv:
python bfx-qc-reporter create-report \
--input </path/to/metrics.json> \
--report-defs report_defs.csv \
--output-prefix <output-path-prefix>
The src/html/index.html webpage can be used to load the output of load-metrics to allow interactive browsing of metrics across one or more samples.
The page also allows the user to sub-select the metrics to display.
*** This functionality is under active development. ***
The scripts and webpage were written quickly, for my own needs and in my own free time. Please feel free to contribute!
{
  "Sample-1": {
    "Alignment Summary Metrics": {
      "FIRST_OF_PAIR": {
        "total_reads": 10000,
        "pf_reads": 10000,
        "pct_pf_reads": 1,
        "pf_noise_reads": 0,
        "pf_reads_aligned": 9999
      }
    },
    "Duplication Metrics": {
      "None": {
        "library": 1,
        "unpaired_reads_examined": 0,
        "read_pairs_examined": 10000,
        "secondary_or_supplementary_rds": 0,
        "unmapped_reads": 0,
        "unpaired_read_duplicates": 0,
        "read_pair_duplicates": 0,
        "read_pair_optical_duplicates": 0,
        "percent_duplication": 0,
        "estimated_library_size": ""
      }
    }
  },
  "Sample-2": {
    "Alignment Summary Metrics": {
      "FIRST_OF_PAIR": {
        "total_reads": 10000,
        "pf_reads": 10000,
        "pct_pf_reads": 1,
        "pf_noise_reads": 0,
        "pf_reads_aligned": 9999
      }
    },
    "Duplication Metrics": {
      "None": {
        "library": 1,
        "unpaired_reads_examined": 0,
        "read_pairs_examined": 10000,
        "secondary_or_supplementary_rds": 0,
        "unmapped_reads": 0,
        "unpaired_read_duplicates": 0,
        "read_pair_duplicates": 0,
        "read_pair_optical_duplicates": 0,
        "percent_duplication": 0,
        "estimated_library_size": ""
      }
    }
  }
}
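To show how JSON nested like the example above could be flattened into CSV rows, here is a minimal Python sketch. The column names and one-row-per-metric layout are assumptions chosen for illustration; the CSV files written by load-metrics and create-report may be laid out differently.

```python
import csv
import io
from typing import Any, Dict, Iterator, Tuple


def flatten(collated: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str, Any]]:
    """Yield one (sample, group, category, metric, value) row per metric value."""
    for sample, groups in collated.items():
        for group, categories in groups.items():
            for category, metrics in categories.items():
                for name, value in metrics.items():
                    yield sample, group, category, name, value


def to_csv(collated: Dict[str, Any]) -> str:
    """Render the flattened rows as CSV text with a header line (assumed column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "metric_group", "category", "metric", "value"])
    writer.writerows(flatten(collated))
    return buffer.getvalue()
```

Loading the example with json.load and passing the resulting dictionary to to_csv would give one row per metric per sample, which is convenient for spreadsheets or pandas.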