tools.md

File metadata and controls

145 lines (95 loc) · 8.17 KB
title: Tools related to ChIP-seq and related analysis
description: This page shows tools that can be applied to ChIP-seq analysis and other chromatin biology related techniques.
category: research
subcategory: chipseq
tags: tools, literature

Tools_tested

Alignment

The most commonly used aligner for ChIP-seq is Bowtie2. However, we recently tried running bwa, which resulted in higher mapping rates (~2%), with a similar increase in the number of duplicate mappings identified. After filtering, this translates to a significantly higher number of mapped reads and a much larger number of peaks being called (30% increase). When we compare the peak calls generated from the two aligners, the bwa peak calls are a superset of those called from the Bowtie2 alignments. Whether these additional peaks are true positives is yet to be determined.

  • bwa
  • Rory Kirchner, tested in early 2018
    • version# ?
    • information about parameters used?
    • TODO: find a good benchmarking dataset to compare bwa and
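
One way to check the superset observation is to ask what fraction of the Bowtie2-derived peaks overlap at least one bwa-derived peak, and vice versa. The sketch below is a minimal pure-Python illustration, not the procedure used in the original test; the function names and toy coordinates are hypothetical, and in practice the peaks would be read from the peak caller's output files.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least 1 bp."""
    return a_start < b_end and b_start < a_end


def fraction_recovered(query_peaks, reference_peaks):
    """Fraction of query peaks overlapping at least one reference peak.

    Peaks are (chrom, start, end) tuples with BED-style half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference_peaks:
        by_chrom[chrom].append((start, end))

    if not query_peaks:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in query_peaks
        if any(overlaps(start, end, s, e) for s, e in by_chrom[chrom])
    )
    return hits / len(query_peaks)


# Toy coordinates, invented for illustration only.
bowtie2_peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
bwa_peaks = [
    ("chr1", 90, 210),
    ("chr1", 480, 640),
    ("chr1", 900, 1000),  # extra peak seen only with bwa
    ("chr2", 60, 130),
]

# A value of 1.0 means every Bowtie2 peak is also found with bwa.
print(fraction_recovered(bowtie2_peaks, bwa_peaks))
# A value below 1.0 means bwa calls peaks that Bowtie2 does not.
print(fraction_recovered(bwa_peaks, bowtie2_peaks))
```

If the first fraction is 1.0 and the second is below 1.0, the bwa calls behave as a superset under this 1 bp overlap definition.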

Peak calling

  • MACS2

    • This is what we currently use in bcbio. It has both narrow and broad peak functionality and is also used for ATAC-seq peak calling.
  • SPP

    • Meeta Mistry, tested/used last in 2017
    • R 3.2.1
    • not sure if it works well for broad peaks
    • had trouble getting it to work with more recent R versions

QC

  • ChIPQC

    • currently being used in most consults
    • the report is easy to generate (two lines of code)
    • Open question: run locally with an O2 mount, or run on O2? Running locally can be computationally intensive since it processes BAM files
      • getting all dependency packages installed on the cluster is a bit of an issue
      • Solution: Can we have it run as part of bcbio?
  • phantompeakqualtools

    • Meeta Mistry, last used in 2018
    • not as relevant for broad peak calling
    • no longer used (or presented in teaching materials) since some of these metrics are covered in ChIPQC

Handling replicates

  • bedtools

    • standard, quick way of checking overlapping peaks between replicates
    • min 1bp overlap
    • no statistical significance
  • IDR (Irreproducible Discovery Rate)

    • a rank-based method of evaluating concordance between peak calls. Takes the list of overlapping peaks (1bp overlap) and statistically evaluates when the concordance drops off. IDR values are assigned to each peak and can be interpreted similarly to an FDR
    • part of the ENCODE guidelines but not sure which version of the tool is best to use
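
The overlap step that both approaches start from can be sketched as follows. This minimal example pairs peaks from two replicates that overlap by at least 1 bp and carries their scores along; paired scores of this kind are what a rank-based method such as IDR evaluates. It does not implement IDR itself and is not a replacement for bedtools. The function name, tuple layout, and toy values are hypothetical.

```python
def paired_overlaps(rep1, rep2, min_overlap=1):
    """Pair peaks from two replicates that overlap by at least min_overlap bp.

    Each peak is (chrom, start, end, score) with BED-style half-open coordinates.
    Returns a list of (rep1_peak, rep2_peak) tuples.
    """
    pairs = []
    for p1 in rep1:
        for p2 in rep2:
            if p1[0] != p2[0]:
                continue  # different chromosomes never overlap
            shared = min(p1[2], p2[2]) - max(p1[1], p2[1])
            if shared >= min_overlap:
                pairs.append((p1, p2))
    return pairs


# Toy peaks, invented for illustration only.
rep1 = [("chr1", 100, 200, 50.0), ("chr1", 400, 500, 12.0), ("chr2", 10, 80, 30.0)]
rep2 = [("chr1", 150, 260, 45.0), ("chr1", 500, 600, 9.0), ("chr2", 70, 140, 8.0)]

pairs = paired_overlaps(rep1, rep2)
for p1, p2 in pairs:
    print(p1[0], p1[1], p1[2], "score rep1:", p1[3], "score rep2:", p2[3])
```

Note that the peaks at chr1:400-500 and chr1:500-600 only touch end to end, so they share 0 bp and are not paired. The nested loop is quadratic and only suitable for small examples; for genome-scale peak sets, bedtools is the practical choice.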

Visualization

  • deepTools
    • A suite of Python tools developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.
    • This tool is used to generate high-resolution, publication-quality figures. The command-line tool is available on O2 (although not the latest version) and is particularly helpful for making custom profile plots and heatmaps.

Differential Enrichment

  • DiffBind
    • an R Bioconductor package used for differential enrichment analysis. It takes similar input to ChIPQC, as it processes BAM files to obtain read density values for each peak in each sample.
    • can output count matrix and input to DESeq2 for more complex linear models
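
To make "read density values for each peak in each sample" concrete, the sketch below counts the reads that fall inside each peak for each sample and arranges the result as a peaks-by-samples matrix, which is the general shape of table a tool like DESeq2 expects. This is not DiffBind's implementation: reads are represented as simple (chrom, position) tuples rather than BAM records, and all names and values are hypothetical.

```python
def count_matrix(peaks, reads_by_sample):
    """Return (sample_names, rows) where rows[i][j] is the number of reads
    from sample j that fall inside peak i.

    peaks: list of (chrom, start, end) with half-open coordinates.
    reads_by_sample: dict mapping sample name -> list of (chrom, position).
    """
    samples = sorted(reads_by_sample)
    rows = []
    for chrom, start, end in peaks:
        row = [
            sum(
                1
                for c, pos in reads_by_sample[s]
                if c == chrom and start <= pos < end
            )
            for s in samples
        ]
        rows.append(row)
    return samples, rows


# Toy data, invented for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads_by_sample = {
    "control": [("chr1", 120), ("chr1", 550), ("chr2", 150)],
    "treated": [("chr1", 110), ("chr1", 150), ("chr1", 199), ("chr1", 600)],
}

samples, rows = count_matrix(peaks, reads_by_sample)
print("peak", *samples, sep="\t")
for (chrom, start, end), row in zip(peaks, rows):
    print(f"{chrom}:{start}-{end}", *row, sep="\t")
```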

Functional analysis and Annotation

  • ChIPseeker

    • R Bioconductor package for peak annotation and visualization. In use for most consults from 2018 onwards.
    • Nearest gene annotation, uses the TxDb databases.
    • Visualization is based on peaks, not read density - so not very accurate.
    • Target gene lists can be used directly as input to clusterProfiler for functional analysis
  • HOMER

    • Meeta Mistry, used this for two big ChIP-seq consults (Harwell and Flanagan). Last used in 2017.
    • Good for peak annotation and can also do some level of visualization (it generates the underlying data, which can be loaded into R for plotting).
    • Internally uses RefSeq by default, but can provide a GTF file for custom annotation.
    • No functional analysis, but is useful for motif analysis.
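
Nearest-gene annotation, as mentioned for ChIPseeker above, assigns each peak to the gene whose transcription start site (TSS) is closest. The sketch below illustrates the idea with a binary search over sorted TSS positions. It is not ChIPseeker's or HOMER's implementation, it measures distance from the peak midpoint as a simplifying assumption, and the gene names and coordinates are hypothetical.

```python
import bisect


def nearest_gene(peaks, tss_by_chrom):
    """Assign each peak to the gene with the closest TSS on the same chromosome.

    peaks: list of (chrom, start, end).
    tss_by_chrom: dict chrom -> list of (tss_position, gene_name), sorted by position.
    Returns a list of (peak, gene_name, distance), with distance measured from
    the peak midpoint; gene_name and distance are None if the chromosome has no genes.
    """
    results = []
    for chrom, start, end in peaks:
        tss_list = tss_by_chrom.get(chrom, [])
        if not tss_list:
            results.append(((chrom, start, end), None, None))
            continue
        mid = (start + end) // 2
        positions = [pos for pos, _ in tss_list]
        i = bisect.bisect_left(positions, mid)
        # Only the TSSs immediately left and right of the midpoint can be nearest.
        candidates = tss_list[max(i - 1, 0):i + 1]
        pos, gene = min(candidates, key=lambda t: abs(t[0] - mid))
        results.append(((chrom, start, end), gene, abs(pos - mid)))
    return results


# Toy annotation and peaks, invented for illustration only.
tss_by_chrom = {"chr1": [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]}
peaks = [("chr1", 1100, 1300), ("chr1", 6800, 7000), ("chr2", 10, 50)]

for peak, gene, distance in nearest_gene(peaks, tss_by_chrom):
    print(peak, gene, distance)
```

The resulting gene list is the kind of target gene list that can then be passed on for functional analysis.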

Tools_novel

  • csaw

    • Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
  • SICER's reimplementation: epic2

    • Link to the paper
    • It might be worth taking some time to investigate peak callers designed specifically for broad marks. We default to MACS2 --broad but, depending on the histone mark, have had trouble finding peaks.
  • SUPERmerge

    • broad peak caller; especially useful for low sample sizes
  • haystack_bio

    • An analysis pipeline from the Pinello lab. It can be used with histone modification and chromatin accessibility data generated by ChIP-seq, DNase-seq, and ATAC-seq assays across multiple cell types. It is also possible to integrate gene expression data, for example from RNA-seq.
  • TF and Histone ChIP-seq processing pipeline

    • from the Kundaje lab

Links to Read.