Releases: vanheeringen-lab/seq2science
Release v0.7.1
Automated preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows.
Fixed
- issue with broad peaks and upsetplots
Release v0.7.0
The biggest change is that we revert to snakemake 5.18, since higher versions of snakemake cause instability.
Added
- upset plot as QC for peak calling. It should give a first impression of how peaks are distributed between samples/conditions.
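To illustrate what an upset plot summarizes, here is a minimal sketch (not seq2science's own code) that counts how many peaks fall into each exact combination of samples. The peak coordinates, sample names, and the `upset_counts` helper are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical one-hot style peak table: peak -> samples in which it was called.
peaks = {
    "chr1:100-300": {"sampleA", "sampleB"},
    "chr1:900-1100": {"sampleA"},
    "chr2:50-250": {"sampleA", "sampleB", "sampleC"},
    "chr3:400-600": {"sampleA", "sampleB"},
}


def upset_counts(peak_to_samples):
    """Count how many peaks belong to each exact combination of samples."""
    return Counter(frozenset(samples) for samples in peak_to_samples.values())


counts = upset_counts(peaks)

# Print combinations from most to least frequent, as the bars of an upset plot would be ordered.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n)
```

Each printed line corresponds to one bar in an upset plot: a set of samples and the number of peaks shared by exactly that set.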
Changed
- downgraded the snakemake backend as snakemake 6+ is unstable for us.
Fixed
- corrupt environment creation with libreadline for edgeR normalization.
- a crash during subsampling caused by bad syntax.
Release v0.6.1
Fixed
- corrupt environment creation with libcrypto in combination with strandedness rule
Release v0.6.0
Release 0.6.0 is a mix of bug fixes, small changes, and bigger stuff. Most importantly:
- added a deseq2science command to do differential expression analysis on user-supplied tables with seq2science settings
- for single-cell RNA-seq ADT-quantification is possible
- snakemake library updated, giving seq2science a new-ish look :)
The full changes are listed below:
Added
- added generic stats to the MultiQC report about the assembly, which might help pinpoint problems with the assembly used.
- added the slop parameter to the config.yaml of atac-seq and chip-seq workflows, just so they are more visible.
- added support for seurat object export and merging for kb workflow.
- added support for CITE-seq-count for ADT quantification
- added the option to downsample to a specific number of reads.
- new deseq2science command
Changed
- Seq2science now makes a separate blacklist file per blacklist option (encode & mitochondria), so that e.g. RNA-seq and ATAC-seq workflows can be run in parallel and don't conflict on the blacklist.
- error messages don't show the full traceback anymore, making it (hopefully) more clear what is going wrong.
- The effective genome size is no longer calculated per sample, but per read length. When dealing with multiple samples of similar read length, this reduces the computational burden considerably.
- samtools environment updated to version 1.14
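The idea behind computing the effective genome size per read length instead of per sample can be sketched as a simple cache keyed on read length. This is only an illustration under stated assumptions: the toy `GENOME` string, the sample names, and the distinct k-mer definition used in `effective_genome_size` are hypothetical and do not represent how seq2science computes the value.

```python
from functools import lru_cache

# Toy genome for illustration only; a real assembly would be read from a FASTA file.
GENOME = "ACGTACGTTTGACCA"


@lru_cache(maxsize=None)
def effective_genome_size(read_length: int) -> int:
    """Toy definition: number of distinct k-mers of length `read_length` in GENOME."""
    kmers = {GENOME[i:i + read_length] for i in range(len(GENOME) - read_length + 1)}
    return len(kmers)


# Hypothetical samples and their read lengths; two samples share a read length.
sample_read_lengths = {"sample1": 4, "sample2": 4, "sample3": 5}

# The expensive computation runs once per distinct read length, not once per sample.
sizes = {sample: effective_genome_size(n) for sample, n in sample_read_lengths.items()}
```

With three samples but only two distinct read lengths, the function body is executed twice and the third lookup is served from the cache.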
Fixed
- config option slop is now passed along to each rule using it
- edge-case where local samples are in the cache, but not present in the fastq_dir
- bug with differential peak/gene expression across multiple assemblies
- bug with kb ref not creating index for non-velocity analysis
- bug with count import in read_kb_counts.R for technical replicates and meta-data handling
- deseq2 ordering in multiqc report
- issue with slop not being used for the final count table
- bug with onehot peaks not reporting the sample names as columns
Release v0.5.6
Automated preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and RNA-seq workflows.
Added
- MA plot, volcano plot, and PCA plots added to QC report for deseq2 related workflows
Changed
- updated salmon & tximeta versions
- colors for DESeq2 distance plots "fixed"
- updated bwa-mem2 version and reduced the expected memory usage of bwa-mem2 to 40GB
- seq2science now uses snakemake-minimal
Fixed
- stranded bigwigs are no longer inverted (forward containing reverse reads and vice-versa).
- fix in rename_sample preventing the inversion of R1 and R2 FASTQs.
- bug with parsing cli for explanations
- show/hide buttons for treps are actually made for multiqc report
- fixes in deseq2/utils.R
- the samples.tsv will now work with only 2 columns
- the samples.tsv column names will be stripped of excess whitespace, similar to the config.
- ATAC-seq pipeline removing the final bams, keeping the unsorted one
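The samples.tsv fixes above (a two-column sheet working, and column names being stripped of excess whitespace) could be sketched roughly as follows. This is an assumption-laden illustration, not the seq2science parser: the `read_samples` helper and the sample rows are hypothetical.

```python
import csv
import io


def read_samples(handle):
    """Read a tab-separated sample sheet, stripping whitespace from column names."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    # Skip empty lines; pair every remaining row with the cleaned header.
    return [dict(zip(header, row)) for row in reader if row]


# Hypothetical two-column sheet with stray spaces around the header names.
sheet = io.StringIO("sample \t assembly\nGSM123\tGRCh38\nGSM456\tGRCh38\n")
rows = read_samples(sheet)
```

After parsing, the rows can be looked up with the clean keys "sample" and "assembly", regardless of the padding in the file's header line.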
Release v0.5.5
Changed
- duplicate read marking happens before sieving and no reads get removed. Removal of duplicate reads is now controlled with the flag remove_dups in the config.
- changed option heatmap_deeptools_options to deeptools_heatmap_options
- Updated sra tools and parallel fastq-dump versions
- Updated genomepy version
- Gene annotations are no longer gzipped and ungzipped. This should reduce rerunning.
Fixed
- rerunning being triggered too easily by input order
- issue with qc plots and broad peaks
- magic with prefetch not having the same output location on all machines
- issue with explain having duplicate lines
Release v0.5.4
Added
- added support for kb-python kite workflow
Changed
- kb count output validation
- optional barcodefile argument for scRNA-seq workflow
- MultiQC updated to newest version
- updated kb-python version
Release v0.5.3
Added
- DESeq2 blind sample distance & correlation cluster heatmaps for RNA-, ATAC-, and ChIP-seq counts
  - find them annotated in the MultiQC report when running more than one sample
Changed
- "biological_replicate" and "technical_replicate" renamed to "biological_replicates" and "technical_replicates" (now matching between samples.tsv & config.yaml)
- fixed bug with seq2science making a {output.allsizes} file
- Changed explain to use 'passive style'
- Genrich peak calling defaults:
  - Doesn't remove PCR duplicates anymore (best to do with markduplicates)
  - Changed extsize to 200 to be similar to macs settings
  - Turned off tn5 shift, since that is done by seq2science
Fixed
- depend less on local genomes (only when data is unavailable online)
- trackhub explanation was missing, added
- bug with broad peaks and qc that could not be made
Release v0.5.2
Added
- added rule for scRNA post-processing R Markdown for plate/droplet based scRNA protocols (experimental)
- added explanation for kb_seurat_pp rule
- heatmap of N random peaks to the multiqc report in the end
Fixed
- removed a warning about genome.fa.sizes already existing because it had been downloaded beforehand (it's removed in between)
- genomepy's provider status checking not being used.
Release v0.5.1
Added
- added CLI functionality to the deseq2.R script (try it with Rscript /path/to/deseq2.R --help!)
- --force flag to seq2science init to automatically overwrite existing samples.tsv and config.yaml
- local fastqs with Illumina's '_100' are now recognized
- added the workflow explanation to the multiqc report
Changed
- config checks: all keys converted to lower case & duplicate keys throw an exception
- MultiQC updated to v1.10
- Link to seq2science log instead of snakemake log in final message
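The config check described above (all keys converted to lower case, duplicate keys raising an exception) might look roughly like the sketch below. It is a hedged illustration rather than the actual seq2science implementation: `normalize_config` is a hypothetical helper, and the config is represented as a plain list of key/value pairs instead of a parsed YAML file.

```python
def normalize_config(pairs):
    """Lower-case every key and raise if two keys collide after lower-casing."""
    config = {}
    for key, value in pairs:
        lowered = key.lower()
        if lowered in config:
            raise ValueError(f"duplicate config key: {lowered!r}")
        config[lowered] = value
    return config


# Hypothetical config entries written with inconsistent capitalization.
config = normalize_config([("Aligner", "bwa-mem2"), ("REMOVE_DUPS", True)])
```

Passing both "slop" and "Slop" to this helper would raise a ValueError, because the two keys become identical once lower-cased.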
Fixed
- Issue when filtering a combination of single-end and paired-end reads on template length
- explain functionality testing
- scATAC can properly use SE fastqs
- scRNA can use fqexts other than R1/R2
- fastq renaming works again
- added missing schemas to extended docs
- Bug with edgeR.upperquartile normalization. It now sets everything to NaN, so the pipeline finishes successfully.