AMLproject

We are working on the AML pedDep project with DFCI, the Broad Institute, and BCH, looking at core regulatory circuitries (CRCs) and their relation to genomic dependencies in the context of MLL/AML.

- link to the first paper on the role of the CRC in childhood cancer:
- link to the second paper on :
- link to the third paper on :

Data

blacklist from https://github.com/Boyle-Lab/Blacklist/blob/master/lists/hg38-blacklist.v2.bed.gz

DBs

http://cistrome.org/db/#/

- EOL1: DPF2 H3K4me1 H3K27ac BRD9 CTCF SMARCC1 H3K4me3 SMARCA4 BRD7 BRD9
- MOLM-13: SMARCA4 BRD7 CTCF SMARCA4 DPF2 BRD9
- SKNO-1: H3K27me3 H3K4me1 H3K4me3 ASXL2 H3K27ac
- THP-1: H3K4me3 MLLT3 RUNX1 H3K79me2 H3K27ac
- MV411: Pol2, BRD4 and H3K79me2 ChIPs before/after inhibition of BRD4 and DOT1L with SGC0946, IBET and SGC0946+IBET: https://www.ncbi.nlm.nih.gov/sra?term=SRP111133

RNAseq data

- slamseq iBet.. max data:
- slamseq RNAseq max data:
- slamseq iBet:
- muharpaper:

RNPv1

Date

target | vendor | date
--- | --- | ---
IRF2BP2 | IDT | 12/18/19
IRF8 | IDT | 12/18/19
MEF2D | IDT | 12/18/19
MYC | IDT | 12/18/19
RUNX1 | IDT | 12/18/19
RUNX2 | IDT | 12/18/19
SPI1 | IDT | 12/18/19
ZMYND8 | IDT | 12/18/19
CDK6 | IDT | 12/18/19
BRD4 | IDT | 12/18/19
Non-Target | IDT | 12/18/19
LMO2 | Synthego | 4/9/19
LYL1 | Synthego | 4/9/19
MAX | Synthego | 4/9/19
ZEB2 | Synthego | 4/9/19
MEF2C | Synthego | 4/9/19
Non-Target | Synthego | 4/9/19
MEIS1 | Synthego | 6/7/19
FLI1 | Synthego | 6/17/19
ELF2 | Synthego | 6/17/19
GFI1 | Synthego | 6/17/19
IKZF1 | Synthego | 6/17/19
CEBPa | Synthego | 6/17/19
MYB | Synthego | 7/16/19

batch effect

MAX CEBPa ZEB2

Bio

MYC BRD4 MEF2D CDK6 IRF2BP2 RUNX1 RUNX2 ZMYND8 SPI1 GFI1 FLI1 MYB IKZF1 ELF2 CEBPa MEIS1 ZEB2 LMO2 MAX LYL1 MEF2C

diff expression

BRD4 CDK6 IRF2BP2 IRF8 MEF2D MYC RUNX1 RUNX2 SPI1 ZEB2 ZMYND8 CEBPa ZEB2 MYB MAX MEF2C LYL1 LMO2 IKZF1 FLI1 ELF2 GFI1 MEIS1

RNPv2:

others

- Promoters: selected from https://epd.epfl.ch/get_promoters.php
- slamseq paper:
- ATACseq: https://www.ncbi.nlm.nih.gov/sra?term=SRX5608489
- depmap:
- known enhancer: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467550/
- adjacency:
- ABC assignments:
- superenhancer matrix:
- Fish SuperRes: gs://amlproject/super_res/

Motifs:

name | types

For data access:

Map for the amlproject bucket (gs://amlproject)

_To get access to this folder, ask Jeremie Kalfon (jkobject@gmail.com)._

You can find how to access it easily here (you need to be comfortable with the command line). Otherwise, copy and paste this URL, https://console.cloud.google.com/storage/browser/, and add the name of the current bucket (amlproject) at the end.
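As a minimal sketch of the two access routes above, the snippet below builds the console URL for a bucket and the matching `gsutil ls` command. The helper names `console_url` and `gsutil_ls_command` are hypothetical and not part of this repository; the command is only constructed here, not executed.

```python
from urllib.parse import quote

# Base URL given in the paragraph above.
CONSOLE_BASE = "https://console.cloud.google.com/storage/browser/"


def console_url(bucket, prefix=""):
    """Return the Cloud Console browser URL for a bucket (and optional folder)."""
    path = bucket if not prefix else f"{bucket}/{prefix.strip('/')}"
    # quote() keeps "/" untouched and escapes characters such as spaces.
    return CONSOLE_BASE + quote(path)


def gsutil_ls_command(bucket, prefix=""):
    """Return the argument list for listing a bucket (or folder) with gsutil."""
    target = f"gs://{bucket}/{prefix.strip('/')}" if prefix else f"gs://{bucket}"
    return ["gsutil", "ls", target]


url = console_url("amlproject")
cmd = gsutil_ls_command("amlproject", "super_res/")
print(url)
print(" ".join(cmd))
```

Once access has been granted, the command list can be passed to `subprocess.run(cmd, check=True)`, assuming `gsutil` is installed and authenticated.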

RNA
- IRF2BP2: fastqs for the project
- RNP: fastqs for the project
Chip
- IRF2BP2_degraded_rep1
  - bwa/MergedLibrary library-level, coordinate sorted *.bam files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be *.mLb.clN.*.
    - samtools_stats/ SAMtools *.flagstat, *.idxstats and *.stats files generated from the alignment files.
    - picard_metrics/ Alignment QC files from picard CollectMultipleMetrics (*_metrics) and the metrics file from MarkDuplicates (*.metrics.txt).
      - pdf/ Alignment QC plot files in *.pdf format from picard CollectMultipleMetrics.
    - preseq/ Preseq expected future yield file (*.curve.txt).
    - bigwigs Normalised *.bigWig files scaled to 1 million mapped reads.
    - macs/<PEAK_TYPE>/ MACS2 output files: *.xls, *.broadPeak or *.narrowPeak, *.gappedPeak and *summits.bed. The files generated will depend on whether MACS2 has been run in narrowPeak or broadPeak mode. HOMER peak-to-gene annotation file: *.annotatePeaks.txt.
      - qc/
        - macs_peak.plots.pdf: QC plots for MACS2 peaks.
        - macs_annotatePeaks.plots.pdf: QC plots for peak-to-gene feature annotation.
        - *.FRiP_mqc.tsv, *.count_mqc.tsv and macs_annotatePeaks.summary_mqc.tsv: MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios, respectively.
      - consensus/
        - *.bed: consensus peak-set across all samples in BED format.
        - *.saf: consensus peak-set across all samples in SAF format. Required by featureCounts for read quantification.
        - *.annotatePeaks.txt: HOMER peak-to-gene annotation file for consensus peaks.
        - *.boolean.annotatePeaks.txt: spreadsheet representation of the consensus peak-set across samples with gene annotation columns. The columns from individual peak files are included in this file, along with the ability to filter peaks based on their presence or absence in multiple replicates/conditions.
        - *.boolean.txt: same as the file above but without annotation columns.
        - *.boolean.intersect.plot.pdf and *.boolean.intersect.txt: UpSetR files to illustrate peak intersection.
        - /deseq2/
          - *.featureCounts.txt: read counts across all samples relative to the consensus peak-set.
          - *.results.txt: differential binding spreadsheet containing results across all consensus peaks and all comparisons.
          - *.plots.pdf: PCA and hierarchical clustering plots.
          - *.log: number of differentially bound intervals at different FDR and fold-change thresholds for each comparison.
          - *.dds.rld.RData: R dds and rld objects generated by DESeq2.
          - R_sessionInfo.log: information about R, the OS and attached or loaded packages.
        - //
          - *.results.txt: spreadsheet containing comparison-specific DESeq2 output for differential binding results across all peaks.
          - *FDR0.01.results.txt and *FDR0.05.results.txt: subsets of the above file for peaks that pass FDR <= 0.01 and FDR <= 0.05.
          - *FDR0.01.results.bed and *FDR0.05.results.bed: BED files for peaks that pass FDR <= 0.01 and FDR <= 0.05.
          - *deseq2.plots.pdf: MA, Volcano, clustering and scatterplots at FDR <= 0.01 and FDR <= 0.05.
        - /sizeFactors/
          - *.txt and *.RData: files containing DESeq2 sizeFactors per sample.
    - phantompeakqualtools Output files: *.spp.out, *.spp.pdf. MultiQC custom content files: *_spp_correlation_mqc.tsv, *_spp_nsc_mqc.tsv, *_spp_rsc_mqc.tsv.
    - deepTools
      - plotFingerprint Output files: *.plotFingerprint.pdf, *.plotFingerprint.qcmetrics.txt, *.plotFingerprint.raw.txt.
      - plotProfile Output files: *.computeMatrix.mat.gz, *.computeMatrix.vals.mat.gz, *.plotProfile.pdf, *.plotProfile.tab.
  - droso_aligned reads mapped to the reference Drosophila genome
  - recalib_bigwig spike-in control bigwigs
  - fastqs initial fastq files as received (renamed)
  - fastqc/ FastQC *.html files for read 1 (and read2 if paired-end) before adapter trimming.
    - zips/ FastQC *.zip files for read 1 (and read2 if paired-end) before adapter trimming.
  - igv/<PEAK_TYPE>/ igv_session.xml file. igv_files.txt file containing a listing of the files used to create the IGV session, and their allocated colours.
  - multiqc/<PEAK_TYPE>/ multiqc_report.html - a standalone HTML file that can be viewed in your web browser.
    - multiqc_data/ - directory containing parsed statistics from the different tools used in the pipeline.
    - multiqc_plots/ - directory containing static images from the report in various formats.
  - pipeline_info
  - reference_genome A number of genome-specific files are generated by the pipeline in order to aid in the filtering of the data, and because they are required by standard tools such as BEDTools. These can be found in this directory along with the genome fasta file which is required by IGV.
  - trim_galore/ FastQ files after adapter trimming will be placed in this directory.
    - logs/ *.log files generated by Trim Galore!.
    - fastqc/ FastQC *.html files for read 1 (and read2 if paired-end) after adapter trimming.
      - /zips/ FastQC *.zip files for read 1 (and read2 if paired-end) after adapter trimming.
  - pipeline_info/ Reports generated by the pipeline - pipeline_report.html, pipeline_report.txt and software_versions.csv. Reports generated by Nextflow - execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.svg. Reformatted design files used as input to the pipeline - design_reads.csv and design_controls.csv.
  - Documentation/ Documentation for interpretation of results in HTML format - results_description.html.
- IRF2BP2_degraded_rep2
  - **
- IRF2BP2_degraded_hist
  - **
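
The file patterns listed in the bucket map above can be used to sort a file listing by output type. The following is a rough sketch using only the standard library; the `PATTERNS` labels, the function name `group_by_pattern`, and the example file names are hypothetical and chosen for illustration, while the glob patterns themselves come from the map.

```python
from fnmatch import fnmatchcase

# Glob patterns taken from the bucket map; the labels are illustrative only.
PATTERNS = {
    "filtered_alignment": "*.mLb.clN.*",
    "bigwig": "*.bigWig",
    "narrow_peaks": "*.narrowPeak",
    "broad_peaks": "*.broadPeak",
    "summits": "*summits.bed",
    "frip_score": "*.FRiP_mqc.tsv",
}


def group_by_pattern(names):
    """Map each label to the file names matching its glob pattern.

    Matching is case-sensitive; a name is listed under every pattern it matches.
    """
    groups = {label: [] for label in PATTERNS}
    for name in names:
        for label, pattern in PATTERNS.items():
            if fnmatchcase(name, pattern):
                groups[label].append(name)
    return groups


# Hypothetical file names, for illustration only.
listing = [
    "sample_R1.mLb.clN.sorted.bam",
    "sample_R1.bigWig",
    "sample_R1_peaks.narrowPeak",
    "sample_R1_summits.bed",
    "sample_R1.FRiP_mqc.tsv",
    "multiqc_report.html",
]
groups = group_by_pattern(listing)
for label, files in groups.items():
    print(label, files)
```

Names that match no pattern (here `multiqc_report.html`) are simply left out of every group.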

Code

- terra's RNAseq pipeline:
- nextflow's ChIPseq:
- nextflow's ATACseq:

Code additionals:

JKBio:

JKBio's epigenetics/ChIP_helper.py:

and ccle_processing:

AMLproject code:

- notebooks/
  - [PROJECT].ipynb
- html_notebooks/
  - [PROJECT]_v[X].html
- data/
  - [PROJECT]/
    - subfolders/
      - data
    - data
- results/ <- saved into google drive (link: )
  - [PROJECT]/
    - plots/
      - [PLOT_NAME]_[changetype]_v[X].png|pdf|html <- each pdf/HTML plot needs to have a corresponding png|pdf
    - data/
      - [Name]_[changetype]_v[X].[...]
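
The naming convention above for result plots can be expressed as a small helper. This is a sketch only: the function names `plot_path` and `missing_static_copies` and the example project and plot names are hypothetical, and the check reads the rule as "an HTML plot needs a png or pdf with the same name next to it".

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {"png", "pdf", "html"}


def plot_path(project, plot_name, changetype, version, ext="png"):
    """Build results/[PROJECT]/plots/[PLOT_NAME]_[changetype]_v[X].[ext]."""
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported plot extension: {ext}")
    filename = f"{plot_name}_{changetype}_v{version}.{ext}"
    return PurePosixPath("results") / project / "plots" / filename


def missing_static_copies(paths):
    """Return the HTML plots that have no png or pdf with the same stem."""
    paths = [PurePosixPath(p) for p in paths]
    static = {(p.parent, p.stem) for p in paths if p.suffix in {".png", ".pdf"}}
    return [
        str(p)
        for p in paths
        if p.suffix == ".html" and (p.parent, p.stem) not in static
    ]


# Hypothetical project and plot names, for illustration only.
png = plot_path("RNA_RNP_analysis", "volcano", "scaled", 2)
html = plot_path("RNA_RNP_analysis", "volcano", "scaled", 2, ext="html")
orphan = plot_path("RNA_RNP_analysis", "heatmap", "raw", 1, ext="html")
print(png)
print(missing_static_copies([png, html, orphan]))
```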

subproject

Cell lines WGS analysis
ChIP_analysis:
- v1:
- v2:
creating_intervals_CRISPRtargets
DeepBind_Analysis
Fish_SuperRes
IRF2BP2 ChIPs Analysis
JQ1_RNA_analysis:
- v1
- v2
JQ1_slamRNA_analysis
- v1
processing ChipSeq
Processing_IRF2BP2_degraded_MV411
RNA_RNP_analysis
- v1
- v2
- v2_scaled
RNA-AMLproject
running OPTICS
slamseq IRF2BP2 degraded
- v1
- v2
slamseq_MYCpaper
- v2_K562
- v2_MOLM13

important data

important results

source codes

ROSE:
ROSEv2:
ChromHMM:
Ken's code:
- code for the ABC mapping
- code for the inference of CRC members from RNAseq
Moe's code
- code for the plots
- code for the cell line/tumors relationship
Code to create the super enhancer matrix
Andrew's code
- Andrew's presentation
Neekesh's code to create the first CRC list

last version of the cobinding matrix preprocessing notebook

For this version I wanted to change the initial notebook (see the commits from mid-2021 for the previous versions).

The main goal of this new version is to change how the ChIPseq datasets are merged. Replicate merging uses the same function as before, but with a different set of bad-quality samples. For merging everything into the cobinding matrix, however, instead of simply merging when more than N peaks overlap, I wanted to do something more complex.

Here I mainly used the scATACseq data to define open regions, postulating that any peak outside of an ATAC peak is a false positive unless multiple TFs have peaks defined there.

Moreover, because ATACseq is more precise than ChIPseq, the ATAC peak locations are used as the base and are not extended when a ChIP peak falls outside of them.

This does not seem to have dramatically changed the results, but I did not have time to run all the analyses and thus do not know for sure (see cobinding_v4).
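
To make the merging rule above concrete, here is a minimal sketch of an ATAC-anchored cobinding matrix. It is not the notebook's actual implementation. It assumes peaks are `(chrom, start, end)` tuples with half-open coordinates; the function name `cobinding_matrix`, the `min_tfs_outside` threshold, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def cobinding_matrix(
    atac_peaks: List[Interval],
    chip_peaks: Dict[str, List[Interval]],
    min_tfs_outside: int = 2,  # assumed threshold for "multiple TFs"
):
    tfs = sorted(chip_peaks)
    rows: Dict[Interval, Dict[str, int]] = {}

    # 1. ATAC peaks are the base regions; their coordinates are never extended.
    for region in atac_peaks:
        rows[region] = {
            tf: int(any(overlaps(region, p) for p in chip_peaks[tf])) for tf in tfs
        }

    # 2. ChIP peaks outside every ATAC peak are candidate false positives.
    outside = [
        (tf, p)
        for tf in tfs
        for p in chip_peaks[tf]
        if not any(overlaps(p, r) for r in atac_peaks)
    ]
    outside.sort(key=lambda item: (item[1][0], item[1][1]))

    # 3. Cluster overlapping outside peaks; keep clusters backed by enough TFs.
    clusters = []
    for tf, (chrom, start, end) in outside:
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start < last["end"]:
            last["end"] = max(last["end"], end)
            last["tfs"].add(tf)
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "tfs": {tf}})
    for c in clusters:
        if len(c["tfs"]) >= min_tfs_outside:
            key = (c["chrom"], c["start"], c["end"])
            rows[key] = {tf: int(tf in c["tfs"]) for tf in tfs}
    return tfs, rows


# Toy data, for illustration only.
atac = [("chr1", 100, 200)]
chip = {
    "MYC": [("chr1", 150, 400), ("chr1", 1000, 1100)],
    "MYB": [("chr1", 1050, 1200)],
    "SPI1": [("chr1", 5000, 5100)],
}
tfs, rows = cobinding_matrix(atac, chip)
for region, row in rows.items():
    print(region, row)
```

In this toy case the MYC peak reaching past the ATAC peak does not widen the region, the MYC and MYB peaks outside ATAC are kept because two TFs support them, and the lone SPI1 peak is dropped. The pairwise overlap checks are quadratic, so real peak sets would need an interval index.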

TODO:

packages for ccle_processing
packages for JKBio
packages for AMLproject
add Ken's code to AMLproject
