Using Machine Learning (ML) for cell lineage reconstruction

Renata Retkute (rr614@cam.ac.uk), Alidivinas Prusokas and Augustinas Prusokas

This is team AMbeRland submission to Allen Institute Cell Lineage Reconstruction DREAM Challenge https://www.synapse.org/#!Synapse:syn20692755/wiki/

The results have been published Gong et al. 2021, Cell Systems "Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees"

ML set up

We used Generalized Boosted Regression (GBR) model to predict the probabilities that two cells are sisters, and custom clustering algorithm, to partition cells into lineages based on these probabilities.

As a predictors, we have chosen the following features based on pairwise comparison of each recording array position:

number of recorded units which have not mutated;
number of recorded units which have the same mutation;
number of recorded units which have single mutation;
number of recorded units which have different mutations.

The status was set to 1 if two cells were sister cells and to 0 if observed two cells were not sister cells. We included only those pairs with status 0, that have not already been included with status 1. We assumed only binary trees.

Figure 1. AMbeRland approach for learning features from the DREAM Challenge training data.

All calculations were performed in R using package gbm https://cran.r-project.org/web/packages/gbm/index.html. Following options were used to train Generalized Boosted Regression model:

distribution = "bernoulli",
n.trees = 100,
interaction.depth = 1,
n.minobsinnode = 3,
cv.folds = 5.

Clustering

The clustering starts at the lowest tree level, where all possible pairs of cells are ranked according to the predicted probability that they are sisters. At each consecutive level, pairwise comparison is performed between each lower level cluster by calculating the maximum probability between any elements of two clusters. Pairs of clusters are ordered again according to this probability and are assumed to have the same parent-node if it’s value is above the estimated threshold for this level. This process was repeated until one or two clusters were left.

Figure 2. AMbeRland approach for clustetering cells based on predicted probabilities that cells are sister cells.

During the tree construction, the ground truth was used to evaluate a set of decreasing thresholds corresponding to how any two individual clusters of cells were related at different levels of the lineage tree.

Usage

To reproduce AMbeRland's submission for Subchallenge 1:

Rscript run.R

By default, the code will use fitted model. To fit the model based on training data, change:

  use.trained.model<-FALSE

To write results into spreadsheet, change:

  write.results.to.file<-TRUE

To explore influence of threshold values, run:

    Rscript thresholds.R

This will calculate distribution and mean of the Robinson-Foulds distance and the triplets distance for training data for a given threshold.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
test_data		test_data
train_data		train_data
AMbeRland_Clustering.png		AMbeRland_Clustering.png
AMbeRland_Method.png		AMbeRland_Method.png
README.md		README.md
fitted_model_for_DREAM_challenge.rds		fitted_model_for_DREAM_challenge.rds
infer.R		infer.R
run.R		run.R
thresholds.R		thresholds.R
thresholds_func.R		thresholds_func.R
train.R		train.R
train_setDREAM2019.txt		train_setDREAM2019.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Using Machine Learning (ML) for cell lineage reconstruction

ML set up

Clustering

Usage

About

Releases

Packages

Languages

rretkute/ml_cell_lineage_reconstruction

Folders and files

Latest commit

History

Repository files navigation

Using Machine Learning (ML) for cell lineage reconstruction

ML set up

Clustering

Usage

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages