Implementations of the GOR method and an SVM-based method for protein secondary structure prediction.
A detailed description of the methods and datasets can be found in the project report.
The implementations require the numpy and scikit-learn packages.
pip install numpy
pip install scikit-learn
To run the notebooks, install Jupyter Notebook as well.
pip install notebook
git clone https://github.com/katarinaelez/protein-ss-pred
A pretrained GOR model (models/model.npz) is available.
Predictions from the GOR method can be obtained with:
python gor-predict.py [-h] (--pssm PSSM | --fasta FASTA) filename_model
For example:
python src/gor-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.npz
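
To illustrate what a `--pssm` input may look like once loaded, here is a minimal sketch of a parser. It assumes the file is a PSI-BLAST ASCII matrix in which each residue line holds a position, a residue letter, 20 log-odds scores, 20 percentages, and two trailing values. The function name `parse_pssm` is hypothetical and the repository's own parser may differ.

```python
def parse_pssm(lines):
    """Parse PSI-BLAST ASCII PSSM lines into (sequence, profile).

    Assumption: residue lines have 44 whitespace-separated fields
    (position, residue, 20 scores, 20 percentages, 2 trailing values).
    The profile holds the percentage columns rescaled to the 0-1 range.
    """
    sequence = []
    profile = []
    for line in lines:
        fields = line.split()
        # Residue lines start with an integer position; skip headers and footers.
        if len(fields) < 42 or not fields[0].isdigit():
            continue
        sequence.append(fields[1])
        profile.append([int(value) / 100.0 for value in fields[22:42]])
    return "".join(sequence), profile


# Toy input with made-up values: one header-like line and one residue line.
toy_lines = [
    "Last position-specific scoring matrix computed",
    "    1 M  " + " ".join(["-1"] * 20) + " " + " ".join(["0"] * 19 + ["100"]) + " 0.55 1.20",
]
sequence, profile = parse_pssm(toy_lines)
print(sequence, len(profile[0]))  # M 20
```

With a real file, the lines could be read with `open(path).readlines()` and passed to the function unchanged.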
A pretrained SVM model (models/model.sav.tar.gz) is available.
It must be extracted before use:
tar -xzvf models/model.sav.tar.gz -C models/
Predictions from the SVM-based method can be obtained with:
python svm-predict.py [-h] (--pssm PSSM | --fasta FASTA) [--probs] filename_model
For example:
python src/svm-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.sav
A GOR model can be trained using:
python gor-train.py [-h] [--filename_model FILENAME_MODEL]
[--window_size WINDOW_SIZE]
filename_id_list dir_pssm dir_dssp
For example:
python src/gor-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
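
As a rough illustration of what training with `--window_size` involves, the sketch below implements a simplified profile-based GOR-style scheme: it accumulates profile values per secondary structure class and window position, converts them to log-odds information values, and predicts the class with the highest summed score. This is an assumption about the general technique, not a description of `gor-train.py`. The names `train_gor`, `predict_gor`, and the `pseudocount` parameter are hypothetical.

```python
import math

CLASSES = "HEC"


def train_gor(profiles, structures, window_size, pseudocount=1e-6):
    """Return info[s][k][r]: log-odds of class s given column r at window offset k."""
    half = window_size // 2
    n_cols = len(profiles[0][0])
    counts = {s: [[pseudocount] * n_cols for _ in range(window_size)] for s in CLASSES}
    class_counts = {s: 0 for s in CLASSES}
    for profile, structure in zip(profiles, structures):
        for i, s in enumerate(structure):
            class_counts[s] += 1
            for k in range(window_size):
                pos = i + k - half
                if 0 <= pos < len(profile):
                    for r in range(n_cols):
                        counts[s][k][r] += profile[pos][r]
    total = sum(class_counts.values())
    seen = [s for s in CLASSES if class_counts[s]]  # ignore classes with no examples
    info = {s: [[0.0] * n_cols for _ in range(window_size)] for s in seen}
    for k in range(window_size):
        for r in range(n_cols):
            column_sum = sum(counts[s][k][r] for s in CLASSES)
            for s in seen:
                joint = counts[s][k][r] / total
                marginal = (column_sum / total) * (class_counts[s] / total)
                info[s][k][r] = math.log(joint / marginal)
    return info


def predict_gor(profile, info):
    """Predict one class per residue by summing information over the window."""
    window_size = len(next(iter(info.values())))
    half = window_size // 2
    prediction = []
    for i in range(len(profile)):
        scores = {}
        for s, table in info.items():
            score = 0.0
            for k in range(window_size):
                pos = i + k - half
                if 0 <= pos < len(profile):
                    score += sum(p * w for p, w in zip(profile[pos], table[k]))
            scores[s] = score
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


# Toy data with 3 profile columns instead of 20 (made-up values).
h, e, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
info = train_gor([[h, h, e, e, c, c]], ["HHEECC"], window_size=1)
print(predict_gor([e, h, c], info))  # EHC
```

The toy call uses `window_size=1` only to keep the example easy to follow; real training would use a wider window over 20-column profiles.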
An SVM model can be trained using:
python svm-train.py [-h] [--filename_model FILENAME_MODEL]
[--window_size WINDOW_SIZE]
filename_id_list dir_pssm dir_dssp
For example:
python src/svm-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
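
The `--window_size` option suggests that each residue is described by the profile rows in a window around it. A minimal sketch of such a feature construction follows, assuming zero padding at the sequence ends and an odd window size. The function name `window_features` is hypothetical, and the actual encoding in `svm-train.py` may differ.

```python
def window_features(profile, window_size):
    """Build one flattened window of profile rows per residue.

    profile: list of rows, one per residue, each a list of numbers.
    Window positions that fall outside the sequence are filled with zeros.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    n_cols = len(profile[0]) if profile else 0
    zero_row = [0.0] * n_cols
    features = []
    for center in range(len(profile)):
        window = []
        for pos in range(center - half, center + half + 1):
            row = profile[pos] if 0 <= pos < len(profile) else zero_row
            window.extend(row)
        features.append(window)
    return features


# Toy profile: 4 residues with 20 columns each (made-up values).
toy_profile = [[float(i)] * 20 for i in range(1, 5)]
X = window_features(toy_profile, window_size=3)
print(len(X), len(X[0]))  # 4 60
```

Each row of `X` could then be paired with the residue's DSSP-derived class label and passed to a scikit-learn classifier such as `sklearn.svm.SVC`.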
Metric | GOR (CV) | GOR (blind test) | SVM (CV) | SVM (blind test) |
---|---|---|---|---|
SEN_H | 0.86±0.01 | 0.83 | 0.80±0.01 | 0.72 |
SEN_E | 0.62±0.01 | 0.60 | 0.58±0.01 | 0.62 |
SEN_C | 0.42±0.01 | 0.42 | 0.82±0.00 | 0.85 |
PPV_H | 0.58±0.01 | 0.60 | 0.82±0.01 | 0.85 |
PPV_E | 0.54±0.01 | 0.58 | 0.75±0.01 | 0.80 |
PPV_C | 0.80±0.01 | 0.73 | 0.72±0.00 | 0.65 |
MCC_H | 0.50±0.01 | 0.46 | 0.71±0.01 | 0.67 |
MCC_E | 0.45±0.01 | 0.46 | 0.58±0.01 | 0.63 |
MCC_C | 0.40±0.01 | 0.39 | 0.58±0.01 | 0.56 |
SOV_H | 65.48±0.99 | 62.70 | 76.39±0.97 | 68.64 |
SOV_E | 58.64±1.35 | 63.18 | 59.20±2.18 | 67.19 |
SOV_C | 43.09±0.69 | 45.57 | 70.18±0.93 | 70.93 |
ACC | 0.62±0.00 | 0.62 | 0.76±0.00 | 0.75 |
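
For readers unfamiliar with the metric names, the sketch below computes per-class sensitivity (SEN), positive predictive value (PPV), and Matthews correlation coefficient (MCC), plus overall accuracy (ACC), using the standard one-vs-rest definitions. It is an illustration only: the function name `per_class_scores` is hypothetical, SOV is not covered, and the project report remains the reference for how the values in the table were obtained.

```python
import math


def per_class_scores(observed, predicted, classes="HEC"):
    """Return ({class: {"SEN", "PPV", "MCC"}}, accuracy) for two label strings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(observed, predicted))
    scores = {}
    for c in classes:
        # One-vs-rest confusion counts for class c.
        tp = sum(o == c and p == c for o, p in pairs)
        fn = sum(o == c and p != c for o, p in pairs)
        fp = sum(o != c and p == c for o, p in pairs)
        tn = len(pairs) - tp - fn - fp
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        scores[c] = {
            "SEN": tp / (tp + fn) if tp + fn else float("nan"),
            "PPV": tp / (tp + fp) if tp + fp else float("nan"),
            "MCC": (tp * tn - fp * fn) / denom if denom else float("nan"),
        }
    accuracy = sum(o == p for o, p in pairs) / len(pairs)
    return scores, accuracy


# Toy example (made-up strings, not taken from the datasets).
observed = "HHHHEEEECC"
predicted = "HHHEEEECCC"
scores, accuracy = per_class_scores(observed, predicted)
print(scores["H"]["SEN"], scores["E"]["PPV"], accuracy)  # 0.75 0.75 0.8
```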
MIT © Katarina Elez