Modeling Chromatin Insulator Loops with Long-Range Transformers

Abstract

Chromatin insulator loops play a critical role in regulating gene expression by physically isolating genes from nearby promoter regions. Mutations that disrupt loop formation can therefore increase the risk of disease. Previous approaches to modeling insulator loops have relied on convolutional and recurrent neural network architectures. However, attention-based models such as transformers have demonstrated state-of-the-art performance on many sequence modeling tasks, including in genomics and structural biology. We apply two transformer-based models, DNABERT and Enformer, to identify the anchor regions of chromatin loops. We find that these models are more effective than the convolutional and recurrent baselines at distinguishing true anchors from similar non-anchor regions.
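
As a rough illustration of how a DNA sequence can be prepared for a DNABERT-style model, the sketch below splits a sequence into overlapping k-mers with stride 1, which is the tokenization scheme DNABERT is known for. The function name `kmer_tokenize` and the default `k=6` are assumptions made for this example; the preprocessing actually used in this repository may differ.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration; not taken from this repository.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    tokens = kmer_tokenize("ACGTACGTAC", k=6)
    # Joining with spaces gives the whitespace-separated "sentence"
    # that a k-mer vocabulary tokenizer would then map to token IDs.
    print(" ".join(tokens))
```

A sequence of length n produces n - k + 1 tokens, so token count grows roughly linearly with sequence length.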

Code Overview:

boundary_models.py: contains CNN and LSTM models/modules
boundary_transformers.py: contains transformer models
train_boundary.py: training loop for models
train_boundary_[MODEL].py: driver scripts for training different model types
evaluate.py: runs model on all test sets
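
To make the "runs model on all test sets" step concrete, here is a minimal sketch of scoring several named test sets with one model and reporting a per-set AUROC. Everything in it is an assumption for illustration: the names `auroc`, `evaluate_all`, and `gc_fraction`, the test-set layout, and the choice of AUROC as the metric are not taken from `evaluate.py`, and the toy GC-content scorer merely stands in for a trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of
    examples, which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate_all(
    score_fn: Callable[[str], float],
    test_sets: Dict[str, Tuple[List[str], List[int]]],
) -> Dict[str, float]:
    """Score every sequence in every named test set and return AUROC per set."""
    results = {}
    for name, (sequences, labels) in test_sets.items():
        scores = [score_fn(seq) for seq in sequences]
        results[name] = auroc(scores, labels)
    return results


def gc_fraction(seq: str) -> float:
    """Toy stand-in for a trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq.upper()) / max(len(seq), 1)


if __name__ == "__main__":
    # Hypothetical toy data: label 1 = anchor, label 0 = non-anchor.
    toy_sets = {
        "toy_test": (["GGCC", "GCGC", "ATAT", "AATT"], [1, 1, 0, 0]),
    }
    print(evaluate_all(gc_fraction, toy_sets))
```

Keeping the scorer as a plain callable means the same loop can wrap any of the model types without changing the evaluation code.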


96koushikroy/ml4fg-project
