RELEASE NOTES

Release 0.7.4

In this release we correct a Nextflow issue in the GRIDSS_ASSEMBLY step used in the human PTA workflow.

Release 0.7.3

In this release we make updates to the ATAC workflow, and correct issues related to the PTA workflow.

ATAC:

Merging of replicate samples is now supported. Use the --merge_replicates option, along with a CSV input file. See the wiki page for details on CSV setup.
GRCm39 pseudo-references generated with G2Gtools are now supported. Previously, GRCm38 was supported via the --chain option. For GRCm39, VCI files are required as input and are also specified with --chain.

PTA:

The mouse PTA workflow crashed when all somatic CNVs were filtered; this has been corrected.
Numerous adjustments to memory and wall clock limits were made to support high coverage WGS data.

Pipelines Added:

None

Modules Added:

modules/g2gtools/g2gtools_vci_convert.nf

Pipeline Changes:

workflows/atac.nf: Replicate merging added. GRCm39 pseudo-reference support added.
subworkflows/aria_download_parse.nf: Support for replicate merging added.
subworkflows/concatenate_local_files.nf: Support for replicate merging added.

Module Changes:

modules/cosmic/cosmic_add_cancer_resistance_mutations_germline.nf: wallclock and memory request increase.
modules/gridss/gridss_assemble.nf: memory request increase, and java heap adjustment.
modules/gridss/gripss_somatic_filter.nf: memory request increase, and java heap adjustment.
modules/illumina/manta.nf: memory and wallclock requests were made flat rather than scaled to input file size.
modules/picard/picard_mergesamfiles.nf: correct file vs. path nextflow issue.
modules/python/python_somatic_vcf_finalization.nf: wallclock requests increase.
modules/python/python_somatic_vcf_finalization_mouse.nf: wallclock requests increase.
modules/r/plot_delly_cnv.nf: add dynamic plot naming based on sampleID
modules/samtools/samtools_chain_sort_fixmate_bam.nf: alter module to re-sort final filtered BAM prior to possible replicate merge.
modules/samtools/samtools_non_chain_reindex.nf: alter module to re-sort final filtered BAM prior to possible replicate merge.
modules/samtools/samtools_stats_insertsize.nf: wallclock request increase.
modules/svaba/svaba.nf: memory and wallclock requests increase.

Scripts Added:

None

Script Changes:

bin/gbrs/generate_emission_prob_avecs.py: Modify for use with non-DO strain IDs and dynamic number of strains.
bin/pta/annotate-bedpe-with-cnv.r: Capture edge case where all somatic CNV are filtered.
bin/pta/annotate-cnv-delly.r: Capture edge case where all somatic CNV are filtered.
bin/pta/delly_cnv_plot.r: Capture edge case where all somatic CNV are filtered.

NF-Test Modules Added:

None

Release 0.7.2

In this minor release we correct a bug in --workflow atac. In this workflow, the macs2 module was configured to use a user defined parameter tmpdir for scratch space. However, if the specified tmpdir did not exist, macs2 would fail silently, and allow the workflow to continue. This behavior has been fixed.
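
The corrected behavior amounts to failing fast when the scratch directory is missing. The following is a minimal Python sketch of that idea only; it is not the workflow's Nextflow code, and the function name require_tmpdir is hypothetical.

```python
import os


def require_tmpdir(tmpdir: str) -> str:
    """Return tmpdir if it is an existing, writable directory; raise otherwise.

    Hypothetical helper for illustration: raising here stops a run early
    instead of letting a downstream tool fail without a visible error.
    """
    if not os.path.isdir(tmpdir):
        raise FileNotFoundError(f"tmpdir does not exist: {tmpdir}")
    if not os.access(tmpdir, os.W_OK):
        raise PermissionError(f"tmpdir is not writable: {tmpdir}")
    return tmpdir
```

Calling require_tmpdir on a missing path raises FileNotFoundError, so the error surfaces before any peak calling starts.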

Release 0.7.1

In this minor release we change the Xengsort container to include GNU sort rather than BusyBox sort. This change was required to process very large FASTQ files.

In our testing, BusyBox sort requires files to be held in memory during sorting, and does not support the use of temporary files. The use of GNU sort allows for temporary files to be generated and alleviates the need to hold entire files in memory. This change has no impact on output from Xengsort, or any associated workflow.
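
To illustrate why temporary files remove the need to hold an entire file in memory, here is a minimal Python sketch of an external merge sort. It shows the general technique only and is not the implementation used by GNU sort or by the Xengsort container; the function names and the chunk_size parameter are assumptions made for this example.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str], tmpdir: str) -> str:
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(sorted_lines)
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream the lines of one sorted run back from disk."""
    with open(path) as handle:
        yield from handle


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Yield lines in sorted order, buffering at most chunk_size lines at once.

    Assumes every input line ends with a newline character.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        runs: List[str] = []
        chunk: List[str] = []
        for line in lines:
            chunk.append(line)
            if len(chunk) >= chunk_size:
                # Spill the sorted chunk to disk and free the buffer.
                runs.append(_write_run(sorted(chunk), tmpdir))
                chunk = []
        if chunk:
            runs.append(_write_run(sorted(chunk), tmpdir))
        # k-way merge: only one pending line per run is held in memory.
        yield from heapq.merge(*(_read_run(path) for path in runs))
```

Memory use is bounded by chunk_size during the first pass and by the number of runs during the merge, rather than by the size of the input.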

Release 0.7.0

In this release we add a new workflow for calling copy number variation (CNV) from raw Illumina IDAT genotype array files. Currently the Illumina IlluminaCytoSNP v2.1 array is supported, but support for additional arrays is possible.

We make additional minor changes as described below.

Pipelines Added:

CNV calling from Illumina genotype array data (--cnv_array)

Modules Added:

modules/bcftools/bcftools_gtct2vcf.nf
modules/bcftools/bcftools_query_ascat.nf
modules/illumina/iaap_cli.nf
modules/ascat/ascat_run.nf
modules/ascat/ascat_annotation.nf

Pipeline Changes:

None

Module Changes:

Replaced the incorrect ${task.mem} with ${task.memory} in the Nextflow error catch statement in modules related to the SV calling workflows.
utility_modules/gzip.nf: Memory request increase

Scripts Added:

cnv_array/ASCAT_run.R
cnv_array/annotate_ensembl_genes.pl
cnv_array/seg_plot.R
cnv_array/segment_raw_extend.pl

Script Changes:

None

NF-Test Modules Added:

tests/workflows/cnv_array.nf.test

Release 0.6.7

In this release we make the following minor adjustments:

Correct syntax errors in the Xengsort module when running single-end data.
Minor adjustments to EMASE and GBRS help and log information to include the gen_org param.
Bump the version of MultiQC to v1.23.
Increase the memory request for two PTA modules: python_merge_prep.nf and python_reorder_vcf_columns.nf.
Add CHECK_STRANDEDNESS to MultiQC output for PDX RNAseq.
Increased job memory request in example run scripts.

Release 0.6.6

In this release, we add a FASTQ sorting function to the Xengsort module. Due to asynchronous multi-threading in the classification step, Xengsort produces FASTQ output with non-deterministic sort order. BWA produces subtly different mapping results when reads in otherwise identical FASTQ inputs are shuffled (as noted by the BWA developer). The slight mapping differences are not enough to impact overall results, but do prevent fully reproducible results when Xengsort is used and reads are not sorted. The addition of the sorting function allows for fully reproducible results, with no additional user action required.
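
As an illustration of how sorting restores a deterministic read order, the sketch below groups FASTQ lines into records and sorts them by read name. It is a simplified, in-memory Python example and not the sorting code used in the Xengsort module; the function names and the choice of sort key are assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

# header, sequence, separator, quality
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of FASTQ lines into 4-line records."""
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield (buffer[0], buffer[1], buffer[2], buffer[3])
            buffer = []
    if buffer:
        raise ValueError("truncated FASTQ: final record has fewer than 4 lines")


def sort_fastq(records: Iterable[FastqRecord]) -> List[FastqRecord]:
    """Sort records by read name (the header up to the first whitespace).

    Assumes well-formed, non-empty header lines.
    """
    return sorted(records, key=lambda record: record[0].split()[0])
```

Two inputs that contain the same reads in different orders produce the same sorted list, so a downstream aligner sees identical input on every run.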

Release 0.6.5

In this minor release, we fix a subscript out of bounds bug in bin/wes/sequenza_seg_na_window.R.

Release 0.6.4

In this release, we adjust memory and wallclock requirements for a number of modules, update read_group_from_fastq.py from python2 to python3, and incorporate PRs #4 and #5.

PR #4 (contributed by @BrianSanderson) adds an optional gene and transcript count merge across samples in the RNA and PDX RNA workflows (merge accessed via including the --merge_rna_counts flag).
PR #5 (contributed by @alanhoyle) adds a catch for corrupt gzip files in the Bowtie module as used by EMASE/GBRS analyses.
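
For readers who want to check input files before a run, the following Python sketch shows one way to detect a corrupt or truncated gzip file by decompressing it to the end. It is independent of the pipefail catch added in PR #5, and the function name gzip_is_intact is hypothetical.

```python
import gzip
import zlib


def gzip_is_intact(path: str, block_size: int = 1 << 20) -> bool:
    """Return True if the whole gzip stream decompresses without error.

    A missing or unreadable file also returns False, because those
    failures raise OSError as well.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data block by block; errors surface
            # only when the damaged part of the stream is reached.
            while handle.read(block_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Reading in blocks keeps memory use flat even for very large FASTQ files.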

Pipelines Added:

None

Modules Added:

utility_modules/merge_rsem_counts.nf

Pipeline Changes:

workflows/rnaseq.nf module added to merge gene and transcript expression when --merge_rna_counts is used.
workflows/pdx_rnaseq.nf module added to merge gene and transcript expression when --merge_rna_counts is used.

Module Changes:

bowtie/bowtie.nf pipefail catch added for corrupt gzip files, per #5.
fastp/fastp.nf save json report as well as html report.
nygenome/lancet.nf wallclock request increase.
picard/picard_markduplicates.nf memory adjustment, and accounting for MarkDuplicates not fully respecting -Xmx memory limits imposed by Java.
picard/picard_reordersam.nf memory request increase.
picard/picard_sortsam.nf memory request increase.
utility_modules/read_groups.nf container changed to py3.

Script Changes:

bin/shared/read_group_from_fastq.py update from py2 to py3.

Release 0.6.3

In this release we replace the read disambiguation tool Xenome with Xengsort. Extensive benchmarking shows high concordance among results obtained from both tools.

Additionally, we correct an issue with the human PTA workflow when running the combination of the --pdx and --split_fastq options. Data run with this combination of options in versions 0.6.0 through 0.6.2 should be re-run.

Pipelines Added:

None

Modules Added:

xengsort/xengsort_classify.nf
xengsort/xengsort_index.nf

Pipeline Changes:

Xengsort replaces Xenome for all PDX based workflows (RNAseq, RNA fusion, Hs PTA, Somatic WES, Somatic WES PTA)
Correction made for the Human PTA when running the combination of the --pdx and --split_fastq options.

Module Changes:

None

Release 0.6.2

In this minor release we adjust memory and wall clock statements, and modify bin/pta/merge-caller-vcfs.r to correct an edge-case bug.

Release 0.6.1

In this minor release we added support for automatic Zenodo releases via GitHub Actions. There are no changes or additions to workflows.

Release 0.6.0

In this major release we add seven new workflows, and make numerous changes to existing workflows. Specific changes are discussed below.

For Jackson Laboratory users, this release now supports the new Sumner2 cluster. To use workflows on Sumner2, simply specify: -profile sumner2. Note that Sumner has reached end of life, and will no longer be supported going forward. We have updated all example run scripts to use -profile sumner2.

Note that Sumner2 enforces strict Linux cgroups, which hold jobs to the memory and CPU limits requested by each Nextflow module. In our release testing, we increased the memory reservation for many steps; however, additional memory issues are to be expected. If you encounter OOM (out of memory) issues and experience workflow steps failing with killed reported in the error log, please either email us (ngsOps@jax.org) or submit an issue with details on which module failed and the size of the dataset you were running.

Related to memory and time restrictions, we made significant changes to the PTA, WGS, and WES workflows:

For human PTA, WGS and WES analyses GATK BaseRecalibrator is now scattered by chromosome.
For PTA and WGS, options were added to allow users to:
1. Deduplicate reads with Clumpify prior to mapping steps.
2. Split FASTQ files into batched chunks for subsequent mapping. Mapped batches are merged prior to the GATK MarkDuplicates step.
3. Cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling. This can help reduce computational load when calling variants in higher coverage areas of the genome.
For PTA, WGS and WES FASTP is now used for read and adapter trimming.
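
To picture the FASTQ batching option in the list above, the sketch below splits a stream of FASTQ lines into batches of whole records. It is a conceptual Python example only; the tool the workflow uses for splitting is not shown here, and the function name and reads_per_batch parameter are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_batches(lines: Iterable[str], reads_per_batch: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines holding up to reads_per_batch 4-line records."""
    if reads_per_batch < 1:
        raise ValueError("reads_per_batch must be at least 1")
    iterator = iter(lines)
    while True:
        # Each record is 4 lines, so a batch is a multiple of 4 lines
        # (the final batch may hold fewer records).
        batch = list(islice(iterator, 4 * reads_per_batch))
        if not batch:
            return
        yield batch
```

Each batch can be mapped independently, and concatenating the batches in order reproduces the original input, which is what makes a later merge step possible.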

We have included an option to specify the root location of the omics_share reference file set. For Jax users on Sumner2, this option should not be changed and defaults to /projects/omics_share. For external users, or those on Elion, specify the root directory of reference files with --reference_cache </path/to/omics_share>.

Finally, we added testing modules for all workflows to be used with nf-test (https://www.nf-test.com/).

Pipelines Added:

Amplicon Sequencing (supporting human only at this time): General PCR / Targeted Sequencing
Genetic Ancestry (See https://www.biorxiv.org/content/10.1101/2022.10.24.513591v1 for details and methods)
Germline Structural Variant Calling
1. Illumina short-read data
2. Pacific Biosciences (PacBio) long-read data: CCS and CLR modes
3. Oxford Nanopore Technologies long-read data (ONT)
Somatic Whole Exome Sequencing for tumor-only samples (with option for PDX)
Somatic Whole Exome Sequencing for Paired Tumor Analysis (PTA; with option for PDX)

Modules Added:

modules/abra2/abra2.nf
modules/bbmap/bbmap_clumpify.nf
modules/bcftools/bcftools_annotate.nf
modules/bcftools/bcftools_call.nf
modules/bcftools/bcftools_duphold_filter.nf
modules/bcftools/bcftools_filter.nf
modules/bcftools/bcftools_merge_amplicon.nf
modules/bcftools/bcftools_mpileup.nf
modules/bcftools/bcftools_norm.nf
modules/bcftools/bcftools_rehead_sort.nf
modules/bcftools/bcftools_vcf_to_bcf.nf
modules/bedops/bedops_sort.nf
modules/bedops/bedops_window.nf
modules/bedtools/bedtools_sequenza_subtract.nf
modules/bwa/bwa_index.nf
modules/bwa/bwa_mem2.nf
modules/delly/delly_call.nf
modules/delly/delly_call_germline.nf
modules/delly/delly_cnv_germline.nf
modules/duphold/duphold.nf
modules/freebayes/freebayes.nf
modules/gatk/gatk_baserecalibrator_interval.nf
modules/gatk/gatk_calculatecontamination.nf
modules/gatk/gatk_calculatecontamination_tumorOnly.nf
modules/gatk/gatk_filtermutectcalls_wes.nf
modules/gatk/gatk_gatherbqsrreports.nf
modules/gatk/gatk_getpileupsummaries.nf
modules/gatk/gatk_getpileupsummaries_tumorOnly.nf
modules/gatk/gatk_haplotypecaller_amplicon.nf
modules/gatk/gatk_learnreadorientationmodel.nf
modules/gatk/gatk_mutect2_wes_pta.nf
modules/gatk/gatk_printreads.nf
modules/gatk/gatk_variantfiltration_freebayes.nf
modules/illumina/manta_germline.nf
modules/jvarkit/jvarkit_biostar154220.nf
modules/lumpy/lumpy_call_sv.nf
modules/lumpy/lumpy_extract_splits.nf
modules/lumpy/lumpy_prep.nf
modules/minimap/minimap2_index.nf
modules/minimap/minimap2_map_ont.nf
modules/nanofilt/nanofilt.nf
modules/nanoqc/nanoqc.nf
modules/nanostat/nanostat.nf
modules/nanosv/nanosv.nf
modules/pbmm2/pbmm2_call.nf
modules/pbmm2/pbmm2_index.nf
modules/pbsv/pbsv_call.nf
modules/pbsv/pbsv_discover.nf
modules/picard/picard_markduplicates_removedup.nf
modules/picard/picard_sortsam_mmrsvd.nf
modules/porechop/porechop.nf
modules/python/python_add_AF_freebayes.nf
modules/python/python_add_AF_haplotypecaller.nf
modules/python/python_annot_depths.nf
modules/python/python_annot_on_target.nf
modules/python/python_bedpe_to_vcf.nf
modules/python/python_parse_depths.nf
modules/python/python_parse_survivor_ids.nf
modules/r/illumina_sv_merge.nf
modules/r/r_merge_depths.nf
modules/samtools/samtools_cat.nf
modules/samtools/samtools_filter_mmrsvd.nf
modules/samtools/samtools_merge.nf
modules/samtools/samtools_mpileup.nf
modules/samtools/samtools_stats_mmrsvd.nf
modules/scarhrd/scarhrd.nf
modules/sequenza/sequenza_annotate.nf
modules/sequenza/sequenza_na_window.nf
modules/sequenza/sequenza_pileup2seqz.nf
modules/sequenza/sequenza_run.nf
modules/smoove/smoove_call_germline.nf
modules/sniffles/sniffles.nf
modules/snpweights/snpweights_inferanc.nf
modules/snpweights/snpweights_vcf2eigenstrat.nf
modules/survivor/survivor_annotation.nf
modules/survivor/survivor_bed_intersect.nf
modules/survivor/survivor_inexon.nf
modules/survivor/survivor_merge.nf
modules/survivor/survivor_to_bed.nf
modules/survivor/survivor_vcf_to_table.nf
modules/tumor_mutation_burden/tmb_score.nf
modules/utility_modules/filter_trim.nf
modules/vcftools/vcftools_filter.nf

NF-Test Modules Added:

tests/workflows/amplicon_fingerprint.nf.test
tests/workflows/amplicon_generic.nf.test
tests/workflows/ancestry.nf.test
tests/workflows/atac.nf.test
tests/workflows/chipseq.nf.test
tests/workflows/emase.nf.test
tests/workflows/gbrs.nf.test
tests/workflows/generate_pseudoreference.nf.test
tests/workflows/prep_do_gbrs_inputs.nf.test
tests/workflows/prepare_emase.nf.test
tests/workflows/pta.nf.test
tests/workflows/rna_fusion.nf.test
tests/workflows/rnaseq.nf.test
tests/workflows/rrbs.nf.test
tests/workflows/somatic_wes.nf.test
tests/workflows/somatic_wes_pta.nf.test
tests/workflows/wes.nf.test
tests/workflows/wgs.nf.test

Pipeline Changes:

chipseq.nf: Error reporting added for malformed CSV input files
pta.nf: Error reporting added for malformed CSV input files
subworkflows/hs_pta.nf: JAX_TRIMMER replaced with FASTP. GATK Baserecalibration is now scattered by chromosome. Options added to: 1. deduplicate reads with Clumpify prior to mapping steps, 2. Split FASTQ files into batched chunks for subsequent mapping. 3. Cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling. Additionally, short_alignment_marking following mapping was previously disconnected for the workflow. This step has been included.
subworkflows/mm_pta.nf: JAX_TRIMMER replaced with FASTP. Options added to: 1. deduplicate reads with Clumpify prior to mapping steps, 2. Split FASTQ files into batched chunks for subsequent mapping. 3. Cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling.
rnaseq.nf: Check Strandedness log data added to MultiQC report.
wes.nf: JAX_TRIMMER replaced with FASTP. For human analysis, GATK Baserecalibration is now scattered by chromosome.
wgs.nf: JAX_TRIMMER replaced with FASTP. For human analysis, GATK Baserecalibration is now scattered by chromosome. Options added to: 1. deduplicate reads with Clumpify prior to mapping steps, 2. Split FASTQ files into batched chunks for subsequent mapping. 3. Cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling.

Module Changes:

alntools/alntools_bam2emase.nf: Bump docker container version.
bedtools/bedtools_genomecov.nf: Memory request increase.
bicseq2/bicseq2_normalize.nf: Adjustment to read length parsing logic.
bowtie/bowtie.nf: Memory request increase.
bwa/bwa_mem.nf: Input tuple adjustment.
bwa/bwa_mem_hla.nf: Input tuple adjustment.
deeptools/deeptools_filter_remove_multi_sieve.nf
emase/emase_create_hybrid.nf: Bump docker container version.
emase/emase_get_common_alignment.nf: Bump docker container version. Memory request increase.
emase/emase_prepare_emase.nf: Bump docker container version.
emase/emase_run.nf: Bump docker container version.
ensembl/varianteffectpredictor_germline_mouse.nf: Input tuple adjustment. Add BGZIP and indexing to final VCF output.
fastp/fastp.nf: Memory request increase.
gatk/gatk_applybqsr.nf: Added tmp dir to command.
gatk/gatk_baserecalibrator.nf: Added tmp dir to command.
gatk/gatk_chain_extract_badreads.nf: Added tmp dir to command.
gatk/gatk_chain_filter_reads.nf: Added tmp dir to command.
gatk/gatk_cnnscorevariants.nf: Added tmp dir to command.
gatk/gatk_combinegvcfs.nf: Added tmp dir to command.
gatk/gatk_depthofcoverage.nf: Added tmp dir to command.
gatk/gatk_filtermutectcalls.nf: Added tmp dir to command.
gatk/gatk_filtervarianttranches.nf: Added tmp dir to command.
gatk/gatk_genotype_gvcf.nf: Added tmp dir to command.
gatk/gatk_getsamplename.nf: Added tmp dir to command.
gatk/gatk_getsamplename_noMeta.nf: Added tmp dir to command.
gatk/gatk_haplotypecaller.nf: Added tmp dir to command.
gatk/gatk_haplotypecaller_interval.nf: Added tmp dir to command.
gatk/gatk_haplotypecaller_sv_germline.nf: Added tmp dir to command.
gatk/gatk_indexfeaturefile.nf: Added tmp dir to command.
gatk/gatk_mergemutectstats.nf: Added tmp dir to command.
gatk/gatk_mergevcf.nf: Added tmp dir to command.
gatk/gatk_mergevcf_list.nf: Added tmp dir to command.
gatk/gatk_mutect2.nf: Added tmp dir to command.
gatk/gatk_mutect2_tumorOnly.nf: Added tmp dir to command.
gatk/gatk_selectvariants.nf: Added tmp dir to command.
gatk/gatk_sortvcf_germline.nf: Added tmp dir to command.
gatk/gatk_sortvcf_somatic_merge.nf: Added tmp dir to command.
gatk/gatk_sortvcf_somatic_tools.nf: Added tmp dir to command.
gatk/gatk_updatevcfsequencedictionary.nf: Added tmp dir to command.
gatk/gatk_variantfiltration.nf: Added tmp dir to command.
gatk/gatk_variantfiltration_af.nf: Added tmp dir to command.
gatk/gatk_variantfiltration_mutect2.nf: Added tmp dir to command.
gbrs/gbrs_bam2emase.nf: Bump docker container version. Memory request increase.
gbrs/gbrs_compress.nf: Bump docker container version. Memory request increase.
gbrs/gbrs_export.nf: Bump docker container version.
gbrs/gbrs_interpolate.nf: Bump docker container version.
gbrs/gbrs_plot.nf: Bump docker container version.
gbrs/gbrs_quantify.nf: Bump docker container version.
gbrs/gbrs_quantify_genotype.nf: Bump docker container version.
gbrs/gbrs_reconstruct.nf: Bump docker container version.
gridss/gridss_assemble.nf: Memory request increase.
illumina/strelka2.nf: Wallclock request increase.
multiqc/multiqc.nf: Tool version updated to v1.21
nygc-short-alignment-marking/short_alignment_marking.nf: Bug correction in original module script.
picard/picard_cleansam.nf: Output naming adjustment.
picard/picard_collectalignmentsummarymetrics.nf: Added tmp dir to command.
picard/picard_collecttargetpcrmetrics.nf: Added tmp dir to command.
picard/picard_collectwgsmetrics.nf: Added tmp dir to command.
picard/picard_fix_mate_information.nf: Corrected BAM sort order of output to coordinate.
picard/picard_sortsam.nf: Added index creation option.
samtools/samtools_calc_mtdna_filter_chrm.nf: Memory request increase.
samtools/samtools_faidx.nf: Input tuple adjustment, and output reorganization.
snpeff_snpsift/snpeff_snpeff.nf: Memory request increase. Tmp dir adjustment.
snpeff_snpsift/snpsift_extractfields.nf: Added support for amplicon_generic, somatic_wes, and somatic_wes_pta workflows.
squid/squid_call.nf: Memory request increase.
utility_modules/chipseq_make_genome_filter.nf: Input tuple adjustment.
utility_modules/jax_trimmer.nf: File output naming adjusted.

Scripts Added:

ancestry/vcf2eigenstrat.py: Convert VCF to EigenStrat format.
germline_sv/annot_vcf_with_depths.py: Add info fields for depths from individual caller to VCF files.
germline_sv/annot_vcf_with_exon.py: Apply 'InExon' INFO fields to original SV VCF files.
germline_sv/annot_vcf_with_on_target.py: Apply 'OnTarget' INFO fields to original SV VCF files.
germline_sv/bedpetovcf.py: Convert BEDPE format back to SURVIVOR like VCF.
germline_sv/clean_sniffles.sh: Adjust Sniffles calls.
germline_sv/cnvnator2VCF.pl: Convert CNVnator formatted files to VCF.
germline_sv/hydra_to_vcf.py: Convert Hydra BEDPE output into VCF 4.1 format.
germline_sv/merge_depths.R: Merge nanoSV and Sniffles read/support depths.
germline_sv/merge_sv.r: Merge an arbitrary number of VCFs, and annotate with simple event type.
germline_sv/parse_caller_depths.py: Parse SV caller VCFs to extract IDs and depth information.
germline_sv/parse_survivor_ids.py: Parse SURVIVOR merged VCFs to extract IDs.
germline_sv/sed_unquote.sh: Script to remove double-quotes from files, which is used to avoid issues with unescaped quotes in Nextflow script blocks.
germline_sv/summarize_intersections.R: Intersect SV calls by type with known structural variant databases.
germline_sv/surv_annot.sh: Adjust SURVIVOR output to txt.
germline_sv/surv_annot_process.R: Adjust surv_annot output by SV type.
germline_sv/sv_to_table.py: Parse SURVIVOR merged VCF to output a summary table for each variant that lists the position, type, and size.
wes/AF_freebayes.py: Add Estimated Allele Frequency (ALT_AF) to the INFO field of FreeBayes VCF output.
wes/AF_haplotypecaller.py: Add Estimated Allele Frequency (ALT_AF) to the INFO field of HaplotypeCaller VCF output.
wes/TMB_calc.R: Compute tumor mutation burden. See Somatic WES wiki for details.
wes/allele_depth_min_and_AF_from_ADs.py: Recompute the locus depth from the allele-depths, and filter based on a minimum total allele depth.
wes/ensembl_annotation.pl: Annotates Ensembl transcripts and genes with copy number and breakpoints.
wes/scarHRD.R: Compute homologous recombination deficiency (HRD) with scarHRD.
wes/sequenza_run.R: Compute copy number variation with Sequenza.
wes/sequenza_seg_na_window.R: Filter Sequenza CNV segments with NA calls within 1Mb windows.

Script Changes:

gbrs/gene_bp_to_cM_to_transprob.R: Added local BIOMART_CACHE location.
pta/make_main_vcf.py: Adjusted genomic build check logic blocks.
pta/make_txt.py: Adjusted genomic build check logic blocks.
pta/merge-caller-vcfs.r: Added logic to catch edge case where no variants were within a VCF for merging.
shared/extract_csv.nf: Added error reporting for malformed CSV input files.
shared/extract_gbrs_csv.nf: Added error reporting for malformed CSV input files.

Release 0.5.0

In this release we have added the mouse version of PTA, and changed the read trimmer for the RNAseq pipeline to Fastp. Additionally, the latest version of Nextflow is now supported.

Note for Jackson Laboratory users on the Sumner cluster: Fastscratch has reached end of life, and is no longer supported. We have updated all example run scripts to point at /flashscratch rather than /fastscratch. For production analyses all working directories (i.e., -w <PATH>) should use /flashscratch/$USER/....

Pipelines Added:

Mouse PTA

Modules Added:

bcftools/bcftools_bcf_to_vcf.nf
bcftools/bcftools_compress_index.nf
bcftools/bcftools_merge_delly_cnv.nf
bcftools/bcftools_query_delly_cnv.nf
delly/delly_call_somatic.nf
delly/delly_classify.nf
delly/delly_cnv_somatic.nf
delly/delly_filter_somatic.nf
ensembl/varianteffectpredictor_germline_mouse.nf
ensembl/varianteffectpredictor_somatic_mouse.nf
fastp/fastp.nf
gatk/gatk_updatevcfsequencedictionary.nf
python/python_somatic_vcf_finalization_mouse.nf
r/annotate_delly_cnv.nf
r/annotate_genes_sv_mouse.nf
r/annotate_sv_mouse.nf
r/annotate_sv_with_cnv_mouse.nf
r/filter_bedpe_mouse.nf
r/merge_sv_mouse.nf
r/plot_delly_cnv.nf
smoove/smoove_call.nf
svtyper/svtyper.nf
utility_modules/gzip.nf
utility_modules/lumpy_compress_index.nf

Pipeline Changes:

RNAseq: The read trimmer script was replaced with fastp. STAR logs from RSEM now saved and passed to MultiQC for summary.
Human PTA: The read trimmer script was replaced with fastp.

Module Changes:

bwa/bwa_mem.nf: Wallclock and memory request adjustment.
emase/emase_get_common_alignment.nf: Wallclock request adjustment.
gatk/gatk_applybqsr.nf: Wallclock request adjustment.
gatk/gatk_sortvcf_somatic_tools.nf: Added mouse PTA support.
gridss/gridss_assemble.nf: Update container to correct bug in prior container build. Wallclock and memory adjustment.
gridss/gridss_calling.nf: Update container to correct bug in prior container build.
gridss/gridss_preprocess.nf: Update container to correct bug in prior container build.
lumpy_sv/lumpy_sv.nf: Modified previously unused module for use in mouse PTA.
msisensor2/msisensor2.nf: Correct cp error that can occur on nextflow resume.
msisensor2/msisensor2_tumorOnly.nf: Correct cp error that can occur on nextflow resume.
multiqc/multiqc.nf: Added cpu, memory, and wallclock requests.
nygenome/lancet.nf: Memory request adjustment.
nygenome/lancet_confirm.nf: Memory request adjustment.
picard/picard_addorreplacereadgroups.nf: Memory request adjustment. Adjusted PICARD temp directory to Nextflow work directory.
picard/picard_collectalignmentsummarymetrics.nf: Wallclock request adjustment.
picard/picard_collecthsmetrics.nf: Wallclock request adjustment.
picard/picard_reordersam.nf: Memory request adjustment. Adjust PICARD temp directory to Nextflow work directory.
picard/picard_sortsam.nf: Wallclock request adjustment.
python/python_lymphoma_classifier.nf: Typo correction in output name.
python/python_somatic_vcf_finalization.nf: Added explicit genome support to facilitate adding mouse to PTA.
python/python_split_mnv.nf: Memory request adjustment.
r/annotate_sv.nf: Added explicit genome support to facilitate adding mouse to PTA.
r/annotate_sv_with_cnv.nf: Minor output file name adjustment.
rsem/rsem_alignment_expression.nf: Memory request adjustment. Remove dynamic memory request for STAR genome sort to correct memory failure errors. Added support to save STAR alignment logs.
samtools/samtools_filter_unique_reads.nf: Adjust expected file name input.
snpeff_snpsift/snpsift_annotate.nf: Adjusted output file name with respect to PTA.
svaba/svaba.nf: Adjust Nextflow output streams to capture index files.
utility_modules/jax_trimmer.nf: Wallclock request adjustment.
xenome/xenome.nf: Wallclock and memory request adjustment. Adjusted temp directory for fastq-sort to Nextflow work directory.
All modules: ${task.memory} replaced the incorrect ${task.mem} in the Nextflow error catch statement.

Scripts Added:

pta/annotate-bedpe-with-genes-mouse.r: Removed human specific database expectations.
pta/annotate-cnv-delly.r: Adjusted CNV annotation for Delly output.
pta/delly_cnv_plot.r: Added Delly CNV plot.

Script Changes:

pta/annotate-bedpe-with-databases.r: Added genome support. For BED annotations, the existing script checks for ANY overlap between BED intervals. For mouse data, this led to errant overlaps in small InDEL and inversion regions; therefore, mouse PTA requires 80% overlap between target region and query BED.
pta/filter-bedpe.r: For mouse PTA we know the type of SV event annotated from databases; therefore, we filter only calls that match annotation type (i.e., DEL, INS, INV). Adjustment to CNV breakpoint checks for cases when breakpoints are not present for targets being annotated. This can occur in mouse PTA due to the change to Delly CNV calling.
pta/make_main_vcf.py: Added explicit genome support to facilitate adding mouse to PTA.
pta/make_txt.py: Added explicit genome support to facilitate adding mouse to PTA.
pta/merge-caller-vcfs.r: Added support for Delly. For Manta the 'infer missing breakpoint' was added as the caller does not insert the reciprocal call in the VCF as the other callers do.
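
The difference between the ANY-overlap check and the 80% overlap requirement described for pta/annotate-bedpe-with-databases.r can be sketched as follows. This Python example is illustrative and is not the R script's code; in particular, measuring the fraction relative to the target interval, the half-open coordinates, and the function names are all assumptions.

```python
def overlap_fraction(target_start: int, target_end: int,
                     query_start: int, query_end: int) -> float:
    """Fraction of the target interval [start, end) covered by the query."""
    target_length = target_end - target_start
    if target_length <= 0:
        raise ValueError("target interval must have positive length")
    overlap = min(target_end, query_end) - max(target_start, query_start)
    # A negative value means the intervals do not touch at all.
    return max(0, overlap) / target_length


def passes_overlap(target_start: int, target_end: int,
                   query_start: int, query_end: int,
                   min_fraction: float = 0.8) -> bool:
    """Apply a minimum-overlap rule instead of accepting any overlap."""
    fraction = overlap_fraction(target_start, target_end, query_start, query_end)
    return fraction >= min_fraction
```

With a target of 100 to 200, a query of 190 to 400 overlaps by only 10 bases: an ANY-overlap check would accept it, while the 80% rule rejects it.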

Release 0.4.5

In this minor release we have updated GBRS and EMASE containers to include a correction made on an index position bug in GBRS genotype printing. GBRS was failing to print the final gene genotype on each chromosome to the *.genotype.tsv file.

Pipelines Added:

None

Modules Added:

None

Pipeline Changes:

None

Module Changes:

All EMASE and GBRS modules updated to the latest version of the EMASE/GBRS container.

Release 0.4.4

In this minor release we have corrected a syntax error in the parsing of single end CSV input to EMASE and GBRS. The syntax error prevented the workflow from running single end data when CSV input files were used.

Pipelines Added:

None

Modules Added:

None

Pipeline Changes:

EMASE: Correct csv single end parsing syntax.
GBRS: Correct csv single end parsing syntax.

Module Changes:

None

Release 0.4.3

In this minor release we have patched PTA to correct a potential script error related to annotating CNVs and SVs on chromosome Y.

Pipelines Added:

None

Modules Added:

None

Pipeline Changes:

PTA: Adjusted when chromosome Y is included vs. excluded in caller merge and annotation steps.

Module Changes:

None

Release 0.4.2

In this minor release we have made minor adjustments to the amplicon workflow, and added strandedness log output.

Pipelines Added:

None

Modules Added:

None

Pipeline Changes:

Amplicon: Alignment statistics are now taken post BQSR re-alignment.

Module Changes:

Primerclip: memory request increase.
python/python_check_strandedness.nf: added log file output.

Release 0.4.1

In this release we have added one additional pipeline: amplicon sequencing. This pipeline supports the analysis of IDT xGen Amplicon panels, with current file support for the xGen Human Sample ID Amplicon Panel. Additionally, we have added a classifier for EBV-associated PDX lymphomas to the PDX RNA pipeline.

Pipelines Added:

Amplicon

Modules Added:

python/python_generate_fingerprint_report.nf
python/python_lymphoma_classifier.nf

Pipeline Changes:

PDX RNAseq: added a classifier for EBV-associated PDX lymphomas.

Module Changes:

Cutadapt module function renamed from 'FILTER_FASTQ' to 'CUTADAPT'. Module file name adjusted to cutadapt/cutadapt.nf.
python/python_check_strandedness.nf: Added strandedness override parameter for cases when check_strandedness fails to determine strand directionality. Corrected logic bug associated with parsing output from the tool.
rsem/rsem_alignment_expression.nf: Resource request adjustment.

Release 0.4.0

In this release we have added five additional pipelines as part of the genetic diversity analysis suite. These pipelines support the analysis of genetically diverse samples (e.g., DO and CC mice) with EMASE and GBRS, and the generation of reference files required for running these tools.

Pipelines Added:

EMASE
GBRS
Generate Pseudoreference
Prepare EMASE Reference/Inputs
Prepare DO GBRS Inputs

Modules Added:

alntools/alntools_bam2emase.nf
bowtie/bowtie.nf
bowtie/bowtie_build.nf
emase/emase_create_hybrid.nf
emase/emase_get_common_alignment.nf
emase/emase_prepare_emase.nf
emase/emase_run.nf
g2gtools/g2gtools_convert.nf
g2gtools/g2gtools_extract.nf
g2gtools/g2gtools_gtf2db.nf
g2gtools/g2gtools_patch.nf
g2gtools/g2gtools_transform.nf
g2gtools/g2gtools_vcf2vci.nf
gbrs/gbrs_bam2emase.nf
gbrs/gbrs_compress.nf
gbrs/gbrs_export.nf
gbrs/gbrs_interpolate.nf
gbrs/gbrs_plot.nf
gbrs/gbrs_quantify.nf
gbrs/gbrs_quantify_genotype.nf
gbrs/gbrs_reconstruct.nf
python/append_dropped_chroms.nf
python/clean_prepEmase_transcriptList.nf
python/parse_gene_positions.nf
python/parse_transprobs.nf
r/do_transition_probablities.nf
r/generate_grid_file.nf
samtools/samtools_faidx_g2gtool.nf
utility_modules/filter_gtf_biotypes.nf
utility_modules/snorlax.nf

Pipeline Changes:

None

Module Changes:

None

Release 0.3.1

In this minor release we have modified the behavior of Xenome to output compressed FASTQ files, and to delete the intermediate FASTQ files that are generated. We are implementing this change because the previous behavior of Xenome resulted in a large amount of redundant data in work directories.

We also added PDX test data for RNA-fusion.

Pipelines Added:

None

Modules Added:

None

Pipeline Changes:

Changes to PDX RNA-seq, PDX WES, PDX RNA Fusion, and PDX PTA to reflect modifications to Xenome

Module Changes:

xenome/xenome.nf modified to combine xenome classify and fastq-sort into the XENOME_CLASSIFY module. For non-fusion applications, human and mouse reads are now emitted as compressed .fastq.gz files
Removed fastq-tools/fastq-sort.nf as its functionality is now in xenome/xenome.nf
Modified input type specification for kallisto/kallisto_insert_size.nf to address issue with flash storage mounting in Singularity.
Added text file to the publishDir statement in Picard collectRNAseqMetrics

Release 0.3.0

In this major release we have added two additional pipelines, flexibility for specifying inputs via sample sheets, support for downloading remote input data, support for GRCm39, support for PDX data, and many more changes detailed below. Additionally, we have added the concept of "subworkflows" for tasks that are more complex than a module and/or involve multiple containers, yet can potentially be reused in multiple pipelines.

Pipelines Added:

ChIP-seq - human, mouse
Paired Tumor Analysis (somatic/germline WGS) - human, PDX

Subworkflows Added:

Aria download for remote input data
Concatenate paired tumor/normal FASTQ files
RNA-seq for PDX input data

Modules Added:

arriba/arriba.nf
bamtools/bamtools_filter.nf
bcftools/bcftools_germline_filter.nf
bcftools/bcftools_intersect_lancet_candidates.nf
bcftools/bcftools_merge_callers.nf
bcftools/bcftools_remove_spanning.nf
bcftools/bcftools_split_multiallelic_regions.nf
bcftools/bcftools_split_multiallelic.nf
bedtools/bedtools_amplicon_metrics.nf
bedtools/bedtools_genomecov.nf
bedtools/bedtools_start_candidates.nf
biqseq2/bicseq2_normalize.nf
biqseq2/bicseq2_seg_unpaired.nf
biqseq2/bicseq2_seg.nf
conpair/conpair_pileup.nf
conpair/conpair.nf
cosmic/cosmic_add_cancer_resistance_mutations_germline.nf
cosmic/cosmic_add_cancer_resistance_mutations_somatic.nf
cosmic/cosmic_annotation_somatic.nf
cosmic/cosmic_annotation.nf
deeptools/deeptools_computematrix.nf
deeptools/deeptools_plotfingerprint.nf
deeptools/deeptools_plotheatmap.nf
deeptools/deeptools_plotprofile.nf
ensembl/varianteffectpredictor_germline.nf
ensembl/varianteffectpredictor_somatic.nf
fastq-tools/fastq-pair.nf
fastq-tools/fastq-sort.nf
fusion_report/fusion_report.nf
fusioncatcher/fusioncatcher.nf
gatk/gatk_cnnscorevariants.nf
gatk/gatk_combinegvcfs.nf
gatk/gatk_filtermutectcalls_tumorOnly.nf
gatk/gatk_filtermutectcalls.nf
gatk/gatk_filtervarianttranches.nf
gatk/gatk_genotype_gvcf.nf
gatk/gatk_getsamplename_noMeta.nf
gatk/gatk_getsamplename.nf
gatk/gatk_haplotypecaller_sv_germline.nf
gatk/gatk_mergemutectstats.nf
gatk/gatk_mutect2_tumorOnly.nf
gatk/gatk_mutect2.nf
gatk/gatk_sortvcf_germline.nf
gatk/gatk_sortvcf_somatic_merge.nf
gatk/gatk_sortvcf_somatic_tools.nf
gatk/gatk_variantfiltration_af.nf
gatk/gatk_variantfiltration_mutect2.nf
gatk/gatk3_applyrecalibration.nf
gatk/gatk3_genotypegvcf.nf
gatk/gatk3_haplotypecaller.nf
gatk/gatk3_indelrealigner.nf
gatk/gatk3_realignertargetcreator.nf
gatk/gatk3_variantannotator.nf
gatk/gatk3_variantrecalibrator.nf
gridss/gridss_assemble.nf
gridss/gridss_calling.nf
gridss/gridss_chrom_filter.nf
gridss/gridss_preprocess.nf
gridss/gripss_somatic_filter.nf
homer/annotate_boolean_peaks.nf
homer/homer_annotatepeaks.nf
homer/plot_homer_annotatepeaks.nf
illumina/manta.nf
illumina/strelka2.nf
jaffa/jaffa.nf
kallisto/kallisto_insert_size.nf
kallisto/kallisto_quant.nf
lumpy_sv/lumpy_sv.nf
macs2/macs2_consensus.nf
macs2/macs2_peak_calling_chipseq.nf
macs2/plot_macs2_qc.nf
msisensor2/msisensor2_tumorOnly.nf
msisensor2/msisensor2.nf
multiqc/multiqc_custom_phantompeakqualtools.nf
novocraft/novosort.nf
nygc-short-alignment-marking/short_alignment_marking.nf
nygenome/lancet_confirm.nf
nygenome/lancet.nf
phantompeakqualtools/phantompeakqualtools.nf
picard/picard_cleansam.nf
picard/picard_collectmultiplemetrics.nf
picard/picard_collecttargetpcrmetrics.nf
picard/picard_fix_mate_information.nf
picard/picard_mergesamfiles.nf
pizzly/pizzly.nf
preseq/preseq.nf
primerclip/primerclip.nf
python/python_add_final_allele_counts.nf
python/python_add_nygc_allele_counts.nf
python/python_check_strandedness.nf
python/python_filter_pon.nf
python/python_filter_vcf.nf
python/python_germline_vcf_finalization.nf
python/python_get_candidates.nf
python/python_merge_columns.nf
python/python_merge_prep.nf
python/python_remove_contig.nf
python/python_rename_metadata.nf
python/python_rename_vcf.nf
python/python_reorder_vcf_columns.nf
python/python_snv_to_mnv_final_filter.nf
python/python_somatic_vcf_finalization.nf
python/python_split_mnv.nf
python/python_vcf_to_bed.nf
r/annotate_bicseq2_cnv.nf
r/annotate_genes_sv.nf
r/annotate_sv_with_cnv.nf
r/annotate_sv.nf
r/filter_bedpe.nf
r/frag_len_plot.nf
r/merge_sv.nf
samtools/samtools_faidx.nf
samtools/samtools_filter_unique_reads.nf
samtools/samtools_filter.nf
samtools/samtools_mergebam_filter.nf
samtools/samtools_stats_insertsize.nf
samtools/samtools_stats.nf
samtools/samtools_view.nf
squid/squid_annotate.nf
squid/squid_call.nf
star/star_align.nf
star-fusion/star-fusion.nf
subread/subread_feature_counts_chipseq.nf
svaba/svaba.nf
tabix/compress_merged_vcf.nf
tabix/compress_vcf_region.nf
tabix/compress_vcf.nf
ucsc/ucsc_bedgraphtobigwig.nf
utility_modules/aria_download.nf
utility_modules/chipseq_bampe_rm_orphan.nf
utility_modules/chipseq_check_design.nf
utility_modules/chipseq_make_genome_filter.nf
utility_modules/concatenate_reads_sampleSheet.nf
utility_modules/deseq2_qc.nf
utility_modules/frip_score.nf
utility_modules/get_read_length.nf
utility_modules/gunzip.nf
utility_modules/jax_trimmer.nf
utility_modules/parse_extracted_sv_table.nf
xenome/xenome.nf

Pipeline Changes:

WES, RNA-seq, and RNA-fusion added support for PDX data
WES, RNA-seq, WGS, ATAC, RRBS, ChIP added support for GRCm39
Support for input specification using sample sheets for ATAC, RNA-seq, RRBS, WES, WGS
Support for downloading input data for ATAC, RNA-seq, RRBS, WES, WGS
Added MULTIQC to ATAC, RNA-seq, RRBS, WES, WGS
Added assessment of strandedness using python/python_check_strandedness.nf rather than requiring specification via parameters
Added assessment of read length for RNAseq for STAR index selection rather than requiring specification via parameters
Modified variant annotations in WES and WGS
Added GVCF support for WES and WGS

Module Changes:

errorStrategy modified for all modules to catch and report instances where tasks fail due to walltime or memory constraints. Diagnosing these failures previously required a deep reading of the subtask SLURM logs; they are now reported in the top-level SLURM log, which is more user-friendly
Removed log.info statements from modules to avoid noisy disruption of log files
ChIP-seq support for bwa/bwa_mem.nf, fastqc/fastqc.nf, picard/picard_markduplicates.nf, trim_galore/trim_galore.nf
Corrected emit statements for g2gtools/g2gtools_chain_convert_peak.nf
Corrected emit statements for gatk/gatk_chain_filter_reads.nf
Modified gatk/gatk_haplotypecaller_interval.nf and gatk/gatk_haplotypecaller.nf for optional GVCF support
Generalized multiqc/multiqc.nf via parameter for multiqc config
Removed --METRIC_ACCUMULATION_LEVEL ALL_READS and --VALIDATION_STRINGENCY LENIENT parameters from picard/picard_collectalignmentsummarymetrics.nf
Modified strand specification logic for picard/picard_collectrnaseqmetrics.nf
Updated rsem/rsem_alignment_expression.nf to reflect changes in strandedness detection, reorganize outputs, and capture log files for multiqc
Changes to output text for mt DNA content in samtools/samtools_calc_mtdna_filter_chrm.nf
Changes to output text from samtools/samtools_final_calc_frip.nf
Changes to output formatting for samtools/samtools_quality_checks.nf
Updated snpEff container to v5.1d to support GRCm39
Changes to output fields for mouse and human from snpeff_snpsift/snpsift_extractfields.nf
Added missing container to utility_modules/concatenate_reads_PE.nf and utility_modules/concatenate_reads_SE.nf

Release 0.2.2

Changed WES and WGS COSMIC annotation to use SNPSift.
Added explicit dbSNP annotation.

Pipelines Added:

NONE

Modules Added:

SNPSIFT_ANNOTATE

Pipeline Changes:

WES and WGS now use SNPSift to annotate COSMIC and dbSNP IDs onto variants.

Module Changes:

COSMIC_ANNOTATION and associated perl scripts removed.

Release 0.2.1

Added STAR support to RNA-seq pipeline.

Pipelines Added:

NONE

Modules Added:

NONE

Pipeline Changes:

RNA-seq pipeline now supports STAR and bowtie2 (default) through the RSEM module.

Module Changes:

RSEM: --rsem_aligner accepts "bowtie2" or "star". The default STAR indices for mouse and human are 100 bp, with alternates suggested in the RNA-seq config file.

Release 0.2.0

NOTE: This release contains a patch for multi-sample processing. We strongly recommend that any multi-sample processing done prior to this release be re-run with v0.2.0+

Pipelines Added:

RRBS - Mouse & Human
ATAC - Mouse & Human

Modules Added:

FastQC
Trim-Galore
Bismark Alignment
Bismark Deduplicator
Bismark Methylation Extractor
MultiQC
Bedtools functions for ATAC QC summary
Bowtie2
Cutadapt
Deeptools bamcoverage and alignmentSieve
g2gTools chain convert
Macs2 ATAC peak calling and ATAC peak coverage
Subread feature counts

Pipeline Changes:

Multiple pipeline changes related to multi-sample patch.
Modified module load statements to invoke "${projectDir}" instead of relative "../" path.
Removed CTP and Probe coverage calculations from human RNA-seq

Module Changes:

Multiple module changes related to multi-sample patch.
Trimmomatic Trim stub module removed.
RSEM - forward stranded option added.
Picard Collect RNAseqMetrics - forward strand option added.

Release 0.1.2

Updated run scripts to load the CS-supported Nextflow module.

Release 0.1.1

Pipelines Added:

NONE

Modules Added:

concatenate_reads_PE.nf
concatenate_reads_SE.nf
Modules refactored to individual files (e.g., gatk_haplotypecaller.nf).

Pipeline Changes:

Added the ability to concatenate, by sample, FASTQ files that are split across sequencing lanes into single R1/R2 files (PE) or a single R1 file (SE).
Adjusted pipelines for refactored module files.
Fixed CTP/PROBE typo in human RNA coverage calculation.
Added HPC --profile options and settings for Sumner and Elion.
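
The per-sample lane concatenation listed above can be sketched as a grouping step followed by a merge. This is a minimal illustration, not the code in concatenate_reads_PE.nf or concatenate_reads_SE.nf: the function name `group_fastqs_by_sample` and the `<sample>_L<lane>_<read>.fastq.gz` naming pattern are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed Illumina-style naming: <sample>_L<lane>_<R1|R2>.fastq.gz
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L\d{3}_(?P<read>R[12])\.fastq\.gz$")


def group_fastqs_by_sample(filenames):
    """Group lane-split FASTQ names by (sample, read), in lane order."""
    groups = defaultdict(list)
    # Sorting keeps lanes in a stable order so R1 and R2 stay in sync.
    for name in sorted(filenames):
        match = LANE_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[(match["sample"], match["read"])].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "s1_L002_R1.fastq.gz",
        "s1_L001_R1.fastq.gz",
        "s1_L001_R2.fastq.gz",
        "s1_L002_R2.fastq.gz",
    ]
    for key, lanes in group_fastqs_by_sample(files).items():
        print(key, lanes)
```

Each resulting group can then be written out as a single file. Because gzip allows multiple compressed members in one file, the lane files in a group can be joined byte-for-byte without decompressing them first.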

Module Changes:

Adjusted WGS wall clock settings.
Refactored modules to individual files (e.g., gatk_haplotypecaller.nf).
Set pipeline script parameter to hard coded paths.
Cleaned all Nextflow files from the bin directory.
Removed Sumner specific HPC settings from each module.

Release 0.1.0 -- 03.28.2022

Pipelines Added:

Whole Genome Sequencing - Mouse & Human
Whole Exome Sequencing - Mouse & Human
RNA Sequencing - Mouse & Human

Modules Added:

bamtools.nf
bcftools.nf
bwa.nf
cosmic.nf
gatk.nf
picard.nf
quality_stats.nf
read_groups.nf
rsem.nf
samtools.nf
snpeff.nf
snpsift.nf
summary_stats.nf
trimmomatic.nf

Pipeline Changes:

NONE

Module Changes:

NONE
