
Releases: ENCODE-DCC/chip-seq-pipeline2

v2.0.0

26 Oct 02:04
43c10a6

Upgrade Caper to the latest version (>=2.0.0). Old versions of Caper will not work correctly on HPCs.

$ pip install caper --upgrade
$ caper -v # check if >=2.0.0

Conda users must re-install the pipeline's Conda environments. YOU DO NOT NEED TO ACTIVATE A CONDA ENVIRONMENT BEFORE RUNNING A PIPELINE. The new Caper internally runs each task inside an installed Conda environment.

$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh

HPC USERS MUST SPECIFY AN ENVIRONMENT TO RUN A PIPELINE ON. Choices are --conda, --singularity and --docker. This pipeline defaults to running with --docker, so it will not work on HPCs without caper run ... --conda or caper run ... --singularity. Singularity is recommended if your cluster supports it.

Please read the new Caper (>=2.0.0) README carefully. It contains very important updates on Caper's side for better HPC (Conda/Singularity/SLURM/...) support.

v1.9.0

11 May 16:26
6b70b8d

Conda users must update the pipeline's environment.

$ bash scripts/update_conda_env.sh

Added a new parameter to fix the random seed for pseudoreplication.

  • This parameter controls the random seed used for shuffling reads in a TAG-ALIGN file during pseudoreplication.
    • GNU shuf --random-source=some_hash_function(seed).
  • chip.pseudoreplication_random_seed: any positive integer is allowed.
  • If 0 (the default), the input TAG-ALIGN's file size (in bytes) is used as the random seed.

v1.8.1

20 Apr 08:11
6921fd6
  • Fixed issues on DNAnexus
    • Upgraded dxWDL to 1.50 (for pipelines >= 1.8.1)
    • Fixed broken genome TSV files on DNAnexus.
  • Fixed GNU sort memory issue.

v1.8.0

26 Mar 21:34
b4ffdfb

Conda users must update the pipeline's environment.

$ bash scripts/update_conda_env.sh

Added input parameters:

  • chip.bowtie2_use_local_mode
    • If this flag is on, the pipeline adds --local to the bowtie2 command line, overriding bowtie2's default --end-to-end mode.
    • See the bowtie2 manual for details.
  • chip.bwa_mem_read_len_limit
    • This parameter is only valid if chip.use_bwa_mem_for_pe is on and FASTQs are paired-end.
    • It defaults to 70 (as mentioned in bwa's manual).
    • It controls the read-length threshold for using bwa mem with paired-end datasets. The pipeline automatically determines the sample's read length from a (merged) FASTQ R1 file. If that read length is shorter than this threshold, the pipeline automatically switches back to bwa aln instead of bwa mem. If your FASTQ's read length is < 70 and you still want to use bwa mem, try reducing this parameter.
    • See the bwa manual for details.
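The fallback rule above can be sketched as follows (a hypothetical helper for illustration only; the actual logic lives in the pipeline's WDL and Python code):

```python
def choose_bwa_algorithm(read_len, read_len_limit=70,
                         use_bwa_mem_for_pe=True, paired_end=True):
    """bwa mem is used only for paired-end data whose detected read
    length meets the limit; otherwise fall back to bwa aln."""
    if use_bwa_mem_for_pe and paired_end and read_len >= read_len_limit:
        return "bwa mem"
    return "bwa aln"
```

For example, a paired-end sample with 50 bp reads would fall back to bwa aln under the default limit of 70, but would use bwa mem if the limit were lowered to 50.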

Conda environment

  • Added tbb to the environment with a pinned version, which should fix the conflicting library issue between bowtie2 and mamba.

v1.7.1

23 Feb 20:15
2d51d34

Conda users must re-install the Conda environment.

$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh mamba

mamba support for Conda environment installation

  • Add mamba to the installer command line to resolve conflicts between Conda packages much faster.
  • If installation with mamba fails, try again without it.

Increased resource factors

  • Increased factors for some heavy tasks (spr, filter, subsample_ctl and macs2_signal_track).
  • Increased fixed disk size for several tasks (gc_bias).

Others

  • Added version to meta.

v1.7.0

09 Feb 05:23
72360b4

Conda users must update their environment.

$ bash scripts/update_conda_env.sh

Added chip.redact_nodup_bam

  • This will redact filtered/nodup BAMs by replacing indels with reference sequences to protect the donor's private information.

Added chip.trimmomatic_phred_score_format
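A minimal input JSON fragment enabling both new options might look like this (the phred format value shown is an assumed example; check the input documentation for the accepted values):

```json
{
    "chip.redact_nodup_bam": true,
    "chip.trimmomatic_phred_score_format": "phred33"
}
```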

Removed caper and croo from pipeline's Conda environment.

  • There have been conflicts between conda-forge and bioconda packages. These two tools will be added back to the environment after all conflicts are fixed.

v1.6.1

02 Nov 19:56
9461e6b

Conda users should re-install the pipeline's environment.

$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh

Bug fixes

  • Dependencies
    • py2 Conda environment
      • Pinned biopython at 1.76, the last version that supports py2.
    • py3 Conda environment
      • Added Caper's Python dependency scikit-learn.
  • Fixed malformed memory specification in the samtools sort command line.

starch support

  • Generate starch output (.starch) for blacklist-filtered peaks.
  • New Croo output definition JSON (v5) for starches.

v1.6.0

15 Sep 05:10
f3ab828

Conda users should update the pipeline's environment; however, re-installing is always recommended since GNU utils were added to the installer.

# To update env
$ bash scripts/update_conda_env.sh

# To re-install env
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh

New factor-based resource parameters

  • New parameters are factor-based: each factor is multiplied by the task's input file size to determine the resources (memory/disk) required to run the task (on a cloud instance or as an HPC job).
  • e.g., for each replicate, the total size of all R1/R2 FASTQs is used to determine resources for task align, and the BAM size is used for task filter.
  • e.g., if you have 20 GB total (R1 + R2) of PE FASTQs, the default chip.align_mem_factor is 0.15, and base memory is fixed at 4-6 GB for most tasks (5 GB for task align), then the instance memory for task align will be 20 * 0.15 + 5 = 8 GB.
  • Also optimized memory/disk requirements for each task; all tasks should use less memory/disk than in previous versions.
  • Use SSD for all tasks on Google Cloud. This costs 4x more than HDD, but it is still negligible (100 GB of SSD costs $0.5 per hour).
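The sizing arithmetic above can be sketched as (a toy illustration; the function name is an assumption, and the factor and base values come from the example, not from pipeline code):

```python
def factor_based_mem_gb(input_size_gb, mem_factor, base_mem_gb):
    """Factor-based resource sizing: input size * factor + fixed base."""
    return input_size_gb * mem_factor + base_mem_gb
```

For the example above, factor_based_mem_gb(20, 0.15, 5) gives 8 GB for task align.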

Change of default for resource parameters

  • chip.align_cpu: 2 -> 6
  • chip.filter_cpu: 2 -> 4
  • chip.call_peak_cpu: 1 -> 2 (the peak caller MACS2 is single-threaded; no more than 2 are required)
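These are only defaults and can still be overridden in the input JSON. For example, to restore the previous, smaller values (mirroring the old defaults listed above):

```json
{
    "chip.align_cpu": 2,
    "chip.filter_cpu": 2,
    "chip.call_peak_cpu": 1
}
```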

Added resource parameters

  • chip.spr_disk_factor
  • chip.preseq_disk_factor
  • chip.call_peak_cpu

Change of resource parameters

  • chip.align_mem_mb -> chip.align_bowtie2_mem_factor and chip.align_bwa_mem_factor
    • The factor is chosen according to the aligner chip.aligner (bowtie2 or bwa). A custom aligner uses chip.align_bwa_mem_factor.
  • chip.align_disks -> chip.align_bowtie2_disk_factor and chip.align_bwa_disk_factor
    • The factor is chosen according to the aligner chip.aligner (bowtie2 or bwa). A custom aligner uses chip.align_bwa_disk_factor.
  • chip.filter_mem_mb -> chip.filter_mem_factor
  • chip.filter_disks -> chip.filter_disk_factor
  • chip.bam2ta_mem_mb -> chip.bam2ta_mem_factor
  • chip.bam2ta_disks -> chip.bam2ta_disk_factor
  • chip.xcor_mem_mb -> chip.xcor_mem_factor
  • chip.xcor_disks -> chip.xcor_disk_factor
  • chip.spr_mem_mb -> chip.spr_mem_factor
  • chip.spr_disks -> chip.spr_disk_factor
  • chip.jsd_mem_mb -> chip.jsd_mem_factor
  • chip.jsd_disks -> chip.jsd_disk_factor
  • chip.call_peak_mem_mb -> chip.call_peak_spp_mem_factor and chip.call_peak_macs2_mem_factor
    • The factor is chosen according to the peak caller chip.peak_caller (defaulting to spp for TF ChIP and macs2 for histone ChIP).
  • chip.call_peak_disks -> chip.call_peak_spp_disk_factor and chip.call_peak_macs2_disk_factor
    • The factor is chosen according to the peak caller chip.peak_caller (defaulting to spp for TF ChIP and macs2 for histone ChIP).
  • chip.macs2_signal_track_mem_mb -> chip.macs2_signal_track_mem_factor
  • chip.macs2_signal_track_disks -> chip.macs2_signal_track_disk_factor

Resources for task align

  • A custom aligner Python script must be updated to accept --mem-gb.
    • Task align will use BWA's resources (chip.align_bwa_mem_factor and chip.align_bwa_disk_factor).
    • --mem-gb should be added to your Python script chip.custom_align_py.
    • See input documentation for details.

Resources for task call_peak

  • Different factor-based parameters are used depending on the peak caller chip.peak_caller (defaulting to spp for TF ChIP and macs2 for histone ChIP).
  • If chip.peak_caller is not defined, TF ChIP-seq ("chip.pipeline_type": "tf") defaults to the spp peak caller, hence chip.call_peak_spp_mem_factor and chip.call_peak_spp_disk_factor.
  • If chip.peak_caller is not defined, histone ChIP-seq ("chip.pipeline_type": "histone") defaults to the macs2 peak caller, hence chip.call_peak_macs2_mem_factor and chip.call_peak_macs2_disk_factor.

Misc.

  • Better multi-threading samtools view/index/sort.
  • Added GNU utils to Conda environment.

Zenodo integration for citation purposes

10 Aug 19:41
b33c8c6

Integration with Zenodo to generate a DOI and citation that will update automatically with each subsequent release.

v1.5.1

22 Jul 18:51
b33c8c6

New resource parameter for control subsampling.

  • Control subsampling has been separated from the two peak-calling-related tasks (call_peak and macs2_signal_track) to avoid allocating high resources for subsampling that are not fully utilized during peak-calling.
  • There is a new task for control subsampling, whose max. memory is controlled by chip.subsample_ctl_mem_mb.
    • It defaults to 16000.
    • Use a higher number (e.g. 32000 or 64000) for very large controls.
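For a very large control, the subsampling memory can be raised in the input JSON, e.g. (the value follows the suggestion above):

```json
{
    "chip.subsample_ctl_mem_mb": 32000
}
```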

Bug fixes

  • Fixed a typo in the documentation for the parameter chip.mapq_thresh.
  • Fixed a syntax error in the WDL meta section, which is not caught by Womtool but is caught by miniwdl.