Chromatin insulator loops play a critical role in regulating gene expression by physically isolating genes from nearby promoter regions. Mutations that disrupt loop formation can therefore increase the risk of disease. Previous approaches to modeling insulator loops have relied on convolutional and recurrent neural network architectures. However, attention-based models such as transformers have demonstrated state-of-the-art performance on many sequence modeling tasks, including in genomics and structural biology. We apply two transformer-based models, DNABERT and Enformer, to identify the anchor regions of chromatin loops. We find that these models are more effective than the convolutional and recurrent approaches at distinguishing true anchors from similar non-anchor regions.
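As background on how a raw DNA sequence becomes input for a DNABERT-style model, the sketch below splits a sequence into overlapping k-mer tokens. It is illustrative only: the function name `kmer_tokenize` and the default `k=6` are assumptions made for this example, not code or settings taken from this repository.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper for illustration; not part of this repository.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    tokens = kmer_tokenize("ATGCGTAC", k=6)
    # Tokens are joined with spaces, the form a k-mer vocabulary tokenizer would consume.
    print(" ".join(tokens))  # prints "ATGCGT TGCGTA GCGTAC"
```

A sequence of length n produces n - k + 1 tokens, so the token count stays close to the sequence length, which matters when choosing how long a candidate anchor window can be.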
boundary_models.py: contains CNN and LSTM models/modules
boundary_transformers.py: contains transformer models
train_boundary.py: training loop for models
train_boundary_[MODEL].py: driver scripts for training different model types
evaluate.py: runs model on all test sets