User Guide:

Installation
Walkthrough of using Codon2Vec

Background:

Codon2Vec runs on the command line and is compatible with both Windows and Unix operating systems.

A. First time setup instructions

Download and install Python 3 (version 3.7 or higher) from https://www.python.org/downloads/. Ensure that Python is added to your operating system's PATH.
Download the Codon2Vec repository here: https://github.com/rhondene/Codon2Vec/tree/main/Codon2Vec
Unzip the Codon2Vec folder, and open a terminal window in the uncompressed Codon2Vec folder.
Here you will do a one-time installation of the dependencies Codon2Vec needs to run. In the terminal window, type the following command:

       python setup.py install

Exit the Codon2Vec folder. Installation is now complete.

B. Walkthrough of using Codon2Vec

1. Example input files

Codon2Vec takes a fasta file of coding sequences and an expression table that is either comma-separated or tab-separated. Below are guidelines for how the input files should be formatted:

a. fasta format:

b. Expression table:

Important guidelines

Ensure that the sequence IDs in the fasta file are identical to the sequence IDs in the expression table. However, the fasta file and the expression table do not have to be in the same order or contain the same number of genes.

For the fasta file, the program expects the sequence ID to immediately follow the '>' character at the start of each header line.

For the expression table, ensure that the first two columns contain the sequence ID and expression values.
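Before training, it can help to confirm that the IDs in the two input files actually overlap. The following is a minimal sketch of such a check, not part of Codon2Vec itself. The function names are hypothetical, and it assumes that the sequence ID is the first whitespace-delimited token after '>' and that the expression table has a header row (set `has_header=False` if it does not).

```python
import csv


def read_fasta_ids(path):
    """Collect sequence IDs: the first token right after '>' on each header line."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.add(header.split()[0])
    return ids


def read_table_ids(path, has_header=True):
    """Collect IDs from the first column of a comma- or tab-separated table."""
    with open(path, newline="") as handle:
        first_line = handle.readline()
        handle.seek(0)
        # Guess the delimiter from the first line: tab if present, else comma.
        delimiter = "\t" if "\t" in first_line else ","
        rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    if has_header:
        # Assumption: the first row holds column names, not data.
        rows = rows[1:]
    return {row[0].strip() for row in rows}


def compare_ids(fasta_path, table_path, has_header=True):
    """Report which IDs are shared and which appear in only one file."""
    fasta_ids = read_fasta_ids(fasta_path)
    table_ids = read_table_ids(table_path, has_header=has_header)
    return {
        "shared": fasta_ids & table_ids,
        "fasta_only": fasta_ids - table_ids,
        "table_only": table_ids - fasta_ids,
    }
```

Calling `compare_ids("some_input.fasta", "some_exprs.csv")` returns three sets. IDs that appear in only one file are not an error, since the files may contain different numbers of genes, but an empty `shared` set usually means the IDs are formatted differently in the two files.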

2. Running Codon2Vec on the command-line

Open a terminal in the working folder that contains the input files and the Codon2Vec package folder.

To see all the available options that modify model training, type:

       python ./Codon2Vec/ --help

To run Codon2Vec with default options, type this command on the terminal:

       python ./Codon2Vec -CDS some_input.fasta -exprs some_exprs.csv -outfolder results

Recommendation: Model training is an iterative process, and the model may converge on a local optimum that is not necessarily the best one (see How Neural Networks Learn). Train the model multiple times and choose the run with the best parameterization.

Setting Seed for Reproducibility: Neural networks are stochastic algorithms by design, so training the same model on the same data can yield different results. To improve the stability of results, use the -seed_num command-line option.
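Repeated training with different seeds can be scripted. The sketch below is one possible way to do it, not an official Codon2Vec utility. It reuses the flags shown in this guide, and it assumes that -seed_num takes an integer value (check `--help` to confirm). The `results_seed<N>` folder names and the function names are hypothetical.

```python
import subprocess
import sys


def build_command(seed, cds="some_input.fasta", exprs="some_exprs.csv"):
    """Build one training command; each seed gets its own output folder."""
    return [
        sys.executable, "./Codon2Vec",
        "-CDS", cds,
        "-exprs", exprs,
        "-outfolder", f"results_seed{seed}",  # hypothetical folder naming
        "-seed_num", str(seed),               # assumed to accept an integer
    ]


def train_with_seeds(seeds, dry_run=True):
    """Print (dry run) or execute one training command per seed."""
    commands = [build_command(seed) for seed in seeds]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop if a training run fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    train_with_seeds([1, 2, 3])
```

With `dry_run=True` the script only prints the commands, so you can inspect them before starting the actual training runs. Afterwards, compare the evaluation metrics written to each output folder and keep the best model.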

3. Output:

A successful run writes the model's evaluation metrics on the hold-out test set to standard output and to a text file.

Please see the Methods section of the original manuscript, which explains each evaluation metric.

Model Performance Figures: The program also outputs summary figures of model evaluation, such as a confusion matrix and a learning curve that compares model accuracy during training and validation. Learning curves and confusion matrices are widely used in machine learning to diagnose overfitting or underfitting (see this informative blog post).

4. Predictions on new sequences

You have trained your model and are pleased with the model's predictive performance. Now you would like to use the saved model to make predictions on new sequences. To do so, type the following command in your terminal:

       python ./Codon2Vec/predict.py -model your_trained_model -fasta new_seqs.fasta -out name_of_output

Because the predict() function is slightly stochastic, I advise running the predictions multiple times (at least 10 times) and taking the mean or median of the prediction probabilities.
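Averaging across repeated prediction runs can be sketched as follows. This guide does not describe the format of the prediction output, so the example assumes you have already loaded each run into a dictionary that maps a sequence ID to its predicted probability. The function name and the sample values are hypothetical.

```python
from statistics import mean, median


def summarize_runs(runs):
    """Combine repeated prediction runs.

    runs: list of dicts, each mapping sequence ID -> predicted probability.
    Returns a dict mapping sequence ID -> {"mean": ..., "median": ...},
    restricted to IDs that are present in every run.
    """
    if not runs:
        return {}
    shared_ids = set(runs[0]).intersection(*runs[1:])
    summary = {}
    for seq_id in sorted(shared_ids):
        values = [run[seq_id] for run in runs]
        summary[seq_id] = {"mean": mean(values), "median": median(values)}
    return summary


if __name__ == "__main__":
    # Made-up probabilities for three runs, for illustration only.
    example_runs = [
        {"gene_a": 0.25, "gene_b": 0.90},
        {"gene_a": 0.50, "gene_b": 0.70},
        {"gene_a": 0.75, "gene_b": 0.80},
    ]
    for seq_id, stats in summarize_runs(example_runs).items():
        print(seq_id, stats["mean"], stats["median"])
```

The median is less sensitive than the mean to a single outlying run, which may matter when only a few repetitions are available.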
