The cell state transition is a common phenomenon observed during embryogenesis, wound healing, cancer metastasis. The state of a cell can be defined based on either molecular features like gene expression, protein expression, or functional features like change in morphology, migration potential of cells. In this project, we have developed a mathematical model to study cell state transition dynamics.
The dynamics of cell state transition can be estimated through continuous monitoring of cells using live-cell imaging. This experimental setup is costly and time-consuming. On the other hand, collecting cell population data at discrete time-points is much easier and cheaper. Using computational tools, we can decipher the dynamics of state transition from these data. Our model estimates the fraction of cells moving from one state to another state from discrete-time population aggregate data.
The model performs piece-wise data fitting (i.e.) the unknown parameters are estimated for each time interval (t, t+Δt). Our model considers the transition of cells from one state to another state, cell division, and cell death.
The complete model is developed in MATLAB and is easy to implement. This model is implemented in the study, Morphological State Transition Dynamics in EGF-Induced Epithelial to Mesenchymal Transition. The detailed information about the model and the mathematical equations are available in the supplementary material of the article.
Devaraj V., Bose B. Morphological State Transition Dynamics in EGF-induced Epithelial to Mesenchymal Transition. Journal of clinical medicine. doi: 10.3390/jcm8070911.
The estimation of model parameters involves two steps. First, we estimate the fraction of dividing cells in each state, and then we estimate the fraction of cells moving from one state to another state.
The fraction of diving cells in each state is estimated for each time interval. The unknown parameters are estimated by linear least square method.
The fractional flow of cells from one state to another state is estimated for each time interval. To avoid overfitting of data, we used two objective functions in the parameter estimation. Objective function 1, minimizes the sum of square error between the observed data and the estimated data. Objective function 2, minimizes the difference between L1-norm of fractional state transition parameters of two consecutive time intervals. Using these two objective functions, we estimate the unknown parameters simultaneously for all time intervals. We implemented this optimization strategy using multiobjective genetic algorithm.
The optimization steps are repeated multiple times to avoid local minima. Each run of the optimization is independent of the other. Therefore, this part of the model is executed in parallel using parallel processing in MATLAB.
If there are n different time points and m different cell states and p replicates of the experiment, then the input data should follow the below structure,
conditions | time_1 | time_1 | time_1 | ... | time_n | time_n | time_n |
---|---|---|---|---|---|---|---|
replicate_1 | cell_fract_1 | ... | cell_fract_m-1 | ... | cell_fract_1 | ... | cell_fract_m-1 |
... | ... | ... | ... | ... | ... | ... | ... |
replicate_p | ... | ... | ... | ... | ... | ... | ... |
The data in each column are the fractions of different cell states. There should not be any headers in the actual input excel sheet. The data should follow the same structure, as shown in the above table. Here, m-1 denotes the number of cell states excluding dead cell state.
Detailed information about the model structure and the underlying assumptions are available in the supplementary material of the article.
Fold change in the total cell population (live+dead) is calculated for every consecutive time interval. If there are n different observed time points, then the number of time intervals will be n-1. The input data should follow the below structure,
conditions | time_1 | ... | time_n-1 |
---|---|---|---|
replicate_1 | fold_change | ... | fold_change |
... | ... | ... | ... |
replicate_p | ... | ... | ... |
The data in each column are the fold change in the total cell population. There should not be any headers in the actual input excel sheet. The data should follow the same structure, as shown in the above table.
The model estimates fractional cell division from the above two input data. The estimated fractional cell division values should be converted to an excel sheet without any headers. This data serves as the input for the second part of the model, where fractional state transition parameters are estimated.
The MATLAB code to estimate the fraction of cell division for each time interval is available here. The input parameters of the model are defined in the main.m
file. This module requires the following input details:
popFraction
reads the fraction of cell population at the observed time points from the excel sheetFractionCellType.xlsx
. The excel sheet should not contain any row or column headers.foldChange
reads the fold change in cell population from the excel sheetFoldChange.xlsx
. The excel sheet should not contain any row or column headers.numOfUnk
reads the number of unknown parameters to be estimated. It is the product of the number of cell states (excluding dead cell state) and the number of observed time intervals. For example, If there are three cell states observed at five discrete time intervals,numOfUnk
=15.numCellState
reads the number of different cell states (excluding dead cell state).
Download the code and place the input excel sheets in the same location of the downloaded code. Open the main.m
file in MATLAB and enter the input details. Now, run the main.m
file. Once the optimization is completed, the following results are exported from the model:
- The estimated fractional cell division parameters are exported to a tab-delimited text file,
fractionalCellDivision.txt
. Each row represents the fractional cell division of each cell state, and the columns represent the fractional cell division in each time interval.
The MATLAB code to estimate the fraction of cell state transition for each time interval is available here. The folder contains multiple MATLAB files. A short description of each MATLAB function is provided in the header section of the corresponding MATLAB file. The input parameters of the model are defined in the main.m
file. This module requires the following input details:
popFract
reads the fraction of cell population at the observed time points from the excel sheetFractionCellType.xlsx
. The excel sheet should not contain any row or column headers.foldChange
reads the fold change in cell population from the excel sheetFoldChange.xlsx
. The excel sheet should not contain any row or column headers.cellDiv
reads the fractional cell division from the excel sheetCellDivisionFraction.xlsx
. The excel sheet should not contain any row or column headers.numOfUnknown
reads the number of unknown parameters to be estimated. It is the product of the square of the number of cell states (excluding dead cell state) and the number of observed time intervals. For example, If there are three cell states observed at five discrete time intervals,numOfUnknown
=45.numCellState
reads the number of different cell states (excluding dead cell state).numOptRun
reads the number of independent optimization runs.
Download all the .m files. Place the input excel sheets in the same location of the downloaded MATLAB files. Open the main.m
file in MATLAB and enter the input details. Enter the location of all downloaded MATLAB files in the main.m
file using addpath
. For example, If the MATLAB files are in the location, C:\xxx\yyy\
then include addpath('C:\xxx\yyy\')
in the beginning of the main.m
file.
Configurations of the parameters of the genetic algorithm are defined in initialise.m
. If required, the user can edit the parameters in initialise.m
. The description of each parameter is available in mathworks.
Now, run the main.m
file. The optimization runs in parallel, depending on the number of independent optimizations defined by the user. A Pareto front will be estimated for each independent optimization, and the best solution for each Pareto front will be computed using knee point. A subfolder will be created for each independent optimizations in the working directory, and the results are exported to the corresponding subfolders.
The following are exported to subfolders of each optimization runs:
- Pareto front of the multiobjective optimization is exported to a tab-delimited text file,
paretoFront.txt
. The first column represents the objective function 1 and the second column represents the objective function 2. - Pareto front of the multiobjective optimization is exported to a jpg file,
paretoFront.jpg
. bestObjFun.txt
is the best of all solutions in the Pareto front. The first column represents objective function 1, and the second column represents objective function 2.- State transition parameters of the best solution in the Pareto front is exported to a tab-delimited text file,
bestConvergedFract.txt
. The data are exported without row/column headers in a single 2D matrix with the following structure,
time_1 | cell_state_1 | ... | cell_state_m-1 |
---|---|---|---|
cell_state_1 | ... | ... | ... |
... | ... | ... | ... |
cell_state_m-1 | ... | ... | ... |
time_n-1 | cell_state_1 | ... | cell_state_m-1 |
---|---|---|---|
cell_state_1 | ... | ... | ... |
... | ... | ... | ... |
cell_state_m-1 | ... | ... | ... |
Once all the independent optimizations are completed, the best solution from each optimization run will be estimated and exported to a tab-delimited text file, bestOfEachRun.txt
. The solution with minimum objective function 1 is considered the optimal solution. The details about the optimal solution are exported to a tab-delimited text file, summary.txt
.
We have given a sample test data from the study, Morphological State Transition Dynamics in EGF-Induced Epithelial to Mesenchymal Transition. In this case, there are three live-cell states and a dead cell state. The data were collected at six different time points in triplicates.
The sample data to estimate fractional cell division are available here and the sample output files are available here.
The sample data to estimate fractional state transition parameters are available here and the sample output files are available here. The input parameters used to generate the outputs were, numOptRun=10
, st.optim.config.PopulationSize=250
and st.optim.config.Generations=250
. Whenever the model is executed with the given test data and settings, the numerical results might not be the same as given in the sample output. Rather, the trend of the data would be similar.
- Vimalathithan Devaraj
- Biplab Bose
Usage and redistribution of the source codes of this MATLAB package with or without modification are permitted.
The MATLAB function knee_pt.m
used in this model was developed by Kaplan, Dmitry (2012). Knee Point, MATLAB Central File Exchange. Copyright (c) 2012, Dmitry Kaplan. All rights reserved.