scChromHMM

scChromHMM provides a suite of tools for rapid processing of single-cell histone modification data to perform chromatin state analysis of the genome within each single cell. It is an extension of the bulk ChromHMM framework: it consumes the HMM model learned by ChromHMM and performs chromatin state analysis by running the forward-backward algorithm for each single cell.
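To make the forward-backward step concrete, here is a minimal pure-Python sketch of posterior decoding for one cell. It is not the scChromHMM implementation: it assumes ChromHMM-style observations (each genomic bin is a tuple of 0/1 flags, one per histone mark, with independent Bernoulli emissions), and the function name, parameter layout, and toy numbers are all hypothetical.

```python
def forward_backward(obs, init, trans, emit):
    """Posterior state probabilities for every bin of one cell.

    obs   : list of tuples of 0/1 flags, one tuple per genomic bin
    init  : init[k] = P(state k at the first bin)
    trans : trans[j][k] = P(state k at bin t+1 | state j at bin t)
    emit  : emit[k][m] = P(mark m present | state k), strictly between 0 and 1
    """
    n_states = len(init)
    n_bins = len(obs)

    def emission(k, o):
        # Independent Bernoulli emission for each mark (assumed model).
        p = 1.0
        for m, present in enumerate(o):
            p *= emit[k][m] if present else 1.0 - emit[k][m]
        return p

    # Forward pass, rescaled at every bin to avoid numerical underflow.
    alpha = []
    for t in range(n_bins):
        row = []
        for k in range(n_states):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
            row.append(prior * emission(k, obs[t]))
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass, also rescaled at every bin.
    beta = [[1.0] * n_states for _ in range(n_bins)]
    for t in range(n_bins - 2, -1, -1):
        row = [
            sum(trans[j][k] * emission(k, obs[t + 1]) * beta[t + 1][k]
                for k in range(n_states))
            for j in range(n_states)
        ]
        total = sum(row)
        beta[t] = [v / total for v in row]

    # The posterior is proportional to alpha * beta; the per-bin scaling
    # factors cancel when each row is normalized.
    posterior = []
    for t in range(n_bins):
        row = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        total = sum(row)
        posterior.append([v / total for v in row])
    return posterior


# Toy model: 2 states, 1 mark, 3 bins (all numbers are made up).
toy_posterior = forward_backward(
    obs=[(1,), (1,), (0,)],
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[[0.9], [0.1]],
)
```

Each row of toy_posterior sums to 1 and gives the probability of every state at that bin, which is the kind of per-bin distribution the hmm subcommand reports for each cell.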

Input Data

scChromHMM primarily requires four kinds of files, defined as follows:

fragment files: Fragment files contain the mapping locations of the sequencing read fragments on the genome. The format is similar to the fragment file format described by 10x: essentially a BED file with an additional column holding the cellular barcode of each mapped fragment. toy example: example/h3k27ac_fragments.tsv.gz
- NOTE: the tabix index of each fragment file is also needed. For a block-gzipped (bgzip) fragment file, it can be generated with the command tabix -f -p bed <fragment_file.gz>. toy example: example/h3k27ac_fragments.tsv.gz.tbi
hmm_model: A TSV file containing the HMM model parameters. The default schema of this file is similar to the one generated by ChromHMM. toy example: example/model_2.txt
anchors: A TSV file listing the anchors from the query data onto the reference data, along with their anchoring scores. toy example: example/k27ac.txt
reference_cells: A list of all the cellular barcodes (one per line) present in the reference dataset. toy example: example/cells.txt
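As an illustration of the fragment and reference-cell inputs described above, the following sketch parses 10x-style fragment lines (chromosome, start, end, barcode, read support) and counts, per barcode, how many fragments overlap each 200bp bin. The helper names are hypothetical, and counting every overlapped bin is an assumption made for illustration; the README does not say how scChromHMM itself bins fragments.

```python
from collections import defaultdict

BIN_SIZE = 200  # matches the 200bp regions of the scChromHMM output


def parse_cell_list(lines):
    """Collect barcodes from a one-barcode-per-line listing."""
    return {line.strip() for line in lines if line.strip()}


def bin_fragments(lines, bin_size=BIN_SIZE):
    """Count fragments overlapping each bin, keyed by (barcode, chrom, bin)."""
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, barcode = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # BED intervals are half-open [start, end), so the last base is end - 1.
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            counts[(barcode, chrom, b)] += 1
    return dict(counts)


# Made-up fragment lines and barcodes, not taken from the example folder.
toy_lines = [
    "chr1\t150\t420\tAAAC-1\t1\n",
    "chr1\t10\t60\tAAAC-1\t2\n",
    "chr1\t900\t950\tTTTG-1\t1\n",
]
toy_counts = bin_fragments(toy_lines)
toy_cells = parse_cell_list(["AAAC-1\n", "TTTG-1\n", "\n"])
```

For a real bgzip-compressed fragment file, the same function can be fed from gzip.open(path, "rt"), since bgzip output is readable as ordinary gzip.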

Compilation of the program

scChromHMM has been tested with stable release 1.52.1 of Rust, and the program can be compiled by using the command:

$ cargo build --release

Running scChromHMM

Once compiled, scChromHMM can be run to generate the posterior probability distribution across the hidden states using the command:

$ target/release/schrom hmm -f <fragment_files> -m <hmm_model> -a <anchor_files> -c <reference_cells> -t <number_of_threads> -o <output_folder>

Note: The fragment files must be given in the same order as the anchor files. A toy example can be run on the data in the example folder using the following command (an extra flag --onlyone is added to run the toy example on a subsequence of chromosome 1):

$ RUST_BACKTRACE=full RUST_LOG="trace" /usr/bin/time target/release/schrom hmm -f example/h3k27ac_fragments.tsv.gz example/h3k27me3_fragments.tsv.gz example/h3k4me1_fragments.tsv.gz -m example/model_2.txt -a example/k27ac.txt example/k27me3.txt example/k4me1.txt -c example/cells.txt -t 10 -o output --onlyone

State-wise "short" representation

The hmm subcommand of scChromHMM generates cell-wise posterior probabilities for every reference cell across the genome. For each cell, the probabilities are stored in a binary format as a matrix of 200bp regions by states, with integer values in the range [0-100]. toy example: output/chr1/L1_CCTCTAGTCGCTAAAC.bin. Depending on the number of reference cells, the posterior probability output can grow significantly in size, and some downstream analyses run faster on a region-by-cell matrix (one per state) than on region-by-state matrices (one per cell). Hence, the transform subcommand of scChromHMM can be used to convert the data into the "short" region-by-cell representation. The command to do that is as follows:

$ target/release/schrom transform -c <reference_cells> -i <input_folder> -o <output_folder>

The toy example can be run using the following commands. NOTE: An extra flag --onlyone is added to run the toy example on a subsequence of chromosome 1.

$ mkdir short_output
$ RUST_BACKTRACE=full RUST_LOG="trace" /usr/bin/time target/release/schrom transform -c example/cells.txt -i output -o short_output --onlyone
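The reshaping that transform performs can be pictured with a small in-memory sketch: per-cell region-by-state matrices are regrouped into one region-by-cell matrix per state. This only illustrates the idea. It does not read or write the actual .bin files, whose byte layout is not described here, and the function names, the rounding rule in quantize, and the toy values are all assumptions.

```python
def quantize(probability):
    """Map a probability in [0, 1] to an integer in [0, 100] (assumed rounding)."""
    return int(round(probability * 100))


def to_state_major(per_cell, cells):
    """Regroup {cell: region x state} into a list of region x cell matrices.

    per_cell : dict mapping barcode -> list of rows, one row per region,
               each row holding one value per state
    cells    : barcodes in the column order wanted in the output
    """
    n_regions = len(per_cell[cells[0]])
    n_states = len(per_cell[cells[0]][0])
    per_state = []
    for s in range(n_states):
        # One matrix per state: rows are regions, columns follow `cells`.
        matrix = [[per_cell[c][r][s] for c in cells] for r in range(n_regions)]
        per_state.append(matrix)
    return per_state


# Two hypothetical cells, three regions, two states (values in [0, 100]).
toy_cells = ["cellA", "cellB"]
toy_per_cell = {
    "cellA": [[90, 10], [20, 80], [50, 50]],
    "cellB": [[70, 30], [40, 60], [0, 100]],
}
toy_per_state = to_state_major(toy_per_cell, toy_cells)
```

With this layout, an analysis that only needs one chromatin state loads a single matrix covering all cells instead of opening one file per cell.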

Importing the posterior probabilities into R

The state-wise, region-by-cell posterior probabilities of the toy example can be imported into the R environment using the following script:

library(Rcpp)
sourceCpp("src-R/parse.cpp")
mat <- get_state("short_output/chr1/1.bin", "chr1", "short_output/chr1/cells.txt")
dim(mat)
# [1] 5001 7201
