- R 3.4.0 or above
- Python 3.6.3 or above
- Python TensorFlow package
- Python numpy package
- Python pandas package
Tip: once you have installed the Python TensorFlow package, you can install all other required Python packages with
pip install -r requirements.txt
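To confirm that the prerequisites listed above are in place, a quick check along the following lines may help. This is a minimal sketch and not part of the MOMA repository: the helper names `python_ok` and `missing_packages` are hypothetical, and the check only tests that the packages can be located, not that their versions are compatible.

```python
import sys
from importlib.util import find_spec

# Minimum Python version and packages taken from the requirements list above.
MIN_PYTHON = (3, 6, 3)
REQUIRED_PACKAGES = ["tensorflow", "numpy", "pandas"]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```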
Then download the Ecomics transcriptome data from here and place it in the Dataset folder.
This step preprocesses the original dataset into a format that MOMA can train on. To do so, run
Rscript preprocess_dataset.R Dataset/Ecomics.transcriptome.no_avg.v8.txt Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt
Note that the script reads information from Dataset/Meta.txt, Dataset/Meta.Medium.txt, and Dataset/Meta.Strain.txt. The preprocessed dataset is saved to Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt.
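Before running the preprocessing script, it can help to confirm that the files it reads are in place. The following is a minimal sketch, not part of the MOMA repository: the helper name `missing_inputs` is hypothetical, and the file list is simply the set of input paths mentioned above, assumed to be relative to the repository root.

```python
from pathlib import Path

# Input files named in this README, relative to the repository root.
REQUIRED_INPUTS = [
    "Dataset/Ecomics.transcriptome.no_avg.v8.txt",
    "Dataset/Meta.txt",
    "Dataset/Meta.Medium.txt",
    "Dataset/Meta.Strain.txt",
]


def missing_inputs(root=".", required=REQUIRED_INPUTS):
    """Return the required input paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:")
        for path in missing:
            print("  " + path)
    else:
        print("All preprocessing inputs found.")
```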
Then you can run MOMA (prediction of the transcriptomic response from the characteristics of an experimental condition) with
python3 run_moma.py Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt Dataset/GRN.txt OPTIMIZATION_METHOD CONDITION_INDEX_TO_TEST
- Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt is the dataset used for leave-one-condition-out cross-validation.
- Dataset/GRN.txt is the list of gene-regulatory relations (the gene-regulatory network). This information is used to constrain the recurrent weight matrix: the weight matrix may not have nonzero weights on connections between genes that are not in the gene-regulatory network.
- OPTIMIZATION_METHOD can be SGD or RMSProp. RMSProp is recommended because it speeds up model training.
- CONDITION_INDEX_TO_TEST is the index of the condition to test (a row index from 0 to 492, since Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt contains 493 conditions, one per row). The prediction for this condition comes from a model built on the remaining conditions (leave-one-condition-out cross-validation; refer to Kim et al., Nature Communications, 2016 for more information).
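The gene-regulatory constraint on the weight matrix can be illustrated with a binary mask. This is a minimal NumPy sketch, not the code in run_moma.py: the toy gene names, the edge list, and the helper names `build_grn_mask` and `apply_grn_mask` are all hypothetical, and nothing is assumed here about the actual format of Dataset/GRN.txt or the direction convention MOMA uses.

```python
import numpy as np


def build_grn_mask(genes, edges):
    """Return a binary (n_genes x n_genes) mask with 1 where a regulator -> target edge exists."""
    index = {gene: i for i, gene in enumerate(genes)}
    mask = np.zeros((len(genes), len(genes)), dtype=np.float32)
    for regulator, target in edges:
        mask[index[regulator], index[target]] = 1.0
    return mask


def apply_grn_mask(weights, mask):
    """Zero out every weight whose connection is absent from the network."""
    return weights * mask


# Toy network (hypothetical): geneA regulates geneB, geneB regulates geneC.
genes = ["geneA", "geneB", "geneC"]
edges = [("geneA", "geneB"), ("geneB", "geneC")]
mask = build_grn_mask(genes, edges)

# Random weights stand in for a trained recurrent weight matrix.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 3)).astype(np.float32)
constrained = apply_grn_mask(weights, mask)
```

One common way to keep such a constraint during training is to multiply the weights by the mask in the forward pass or to re-apply the mask after every update; whether MOMA does either is not stated here.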
Please note the following before running the model:
- Some test conditions will not produce prediction results because they are not cross-validatable (for example, the strain of the test condition is JM109 but this strain is not in the training data).
- The prediction results are printed to the console as a PCC (Pearson correlation coefficient) between the predicted and known expression levels for the test condition, alongside the wild-type baseline (the PCC between the mean expression levels of the wild-type profiles and the known expression levels for the test condition).
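
The two reported numbers can be computed along the following lines. This is a sketch with made-up expression values, assuming NumPy; the function name `pcc` and all arrays are hypothetical, and the actual script may compute and report these values differently.

```python
import numpy as np


def pcc(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Made-up expression levels for four genes.
known = np.array([1.0, 2.0, 3.0, 4.0])       # measured profile of the test condition
predicted = np.array([1.1, 1.9, 3.2, 3.8])   # model prediction for the test condition

# Wild-type training profiles (rows = conditions, columns = genes).
wildtype_profiles = np.array([
    [2.0, 2.1, 2.5, 2.4],
    [2.2, 1.9, 2.3, 2.8],
])
baseline = wildtype_profiles.mean(axis=0)    # mean wild-type expression per gene

print("model PCC:   ", pcc(predicted, known))
print("baseline PCC:", pcc(baseline, known))
```

A model PCC higher than the baseline PCC indicates that the prediction tracks the test condition better than the average wild-type profile does.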