protein-ss-pred

Implementations of the GOR method and an SVM-based method for protein secondary structure prediction.
A detailed description of the methods and datasets can be found in the project report.

Getting started

Requirements

The implementations require the numpy and scikit-learn packages.

pip install numpy
pip install scikit-learn

To run the notebooks, also install Jupyter Notebook.

pip install notebook

Installation

git clone https://github.com/katarinaelez/protein-ss-pred

Usage

GOR

A pretrained model (model.npz) is available.
Predictions from the GOR method can be obtained using:

python gor-predict.py [-h] (--pssm PSSM | --fasta FASTA) filename_model

For example:

python src/gor-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.npz
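The .npz extension is NumPy's archive format, so the pretrained GOR model can presumably be inspected with numpy before running a prediction. The following is a minimal sketch under that assumption; the helper name describe_npz is hypothetical, and the names and shapes of the arrays stored in model.npz are not documented here, so the function only lists whatever it finds.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    # np.load returns an NpzFile for .npz archives; it works as a context manager
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Example call (assumes the pretrained model sits at this path):
# print(describe_npz("models/model.npz"))
```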

SVM

A pretrained model (model.sav.tar.gz) is available.
It must be extracted before use:

tar -xzvf models/model.sav.tar.gz -C models/
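On systems without the tar command, the same extraction can be done from Python with the standard tarfile module. This is a sketch, not part of the repository; the helper name extract_model is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Only extract archives from a source you trust
        tar.extractall(dest)
    return names


# Equivalent of: tar -xzvf models/model.sav.tar.gz -C models/
# extract_model("models/model.sav.tar.gz", "models/")
```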

Prediction from the SVM-based method can be obtained using:

python svm-predict.py [-h] (--pssm PSSM | --fasta FASTA) [--probs] filename_model

For example:

python src/svm-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.sav

Training

GOR

A GOR model can be trained using:

python gor-train.py [-h] [--filename_model FILENAME_MODEL]
                    [--window_size WINDOW_SIZE]
                    filename_id_list dir_pssm dir_dssp

For example:

python src/gor-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
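To give an idea of what GOR training involves, the following is a simplified sketch of a profile-based GOR scheme: it accumulates profile frequencies per secondary structure class and window offset, converts them to information values log(P(R,S) / (P(R)P(S))), and predicts the class with the highest summed information. It is not the code in src/gor-train.py. The function names, the zero-padding at sequence ends, the default window size of 17, and the assumption that DSSP states are already reduced to H, E and C are all illustrative choices.

```python
import numpy as np

CLASSES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def pad_profile(profile, half):
    """Zero-pad a (length, n_residue_types) profile by `half` rows on each end."""
    return np.pad(np.asarray(profile, dtype=float),
                  ((half, half), (0, 0)), mode="constant")


def gor_train(profiles, structures, window_size=17):
    """Return information values of shape (n_classes, window_size, n_residue_types)."""
    half = window_size // 2
    n_res = np.asarray(profiles[0]).shape[1]
    joint = np.zeros((len(CLASSES), window_size, n_res))
    class_counts = np.zeros(len(CLASSES))
    for profile, ss in zip(profiles, structures):
        padded = pad_profile(profile, half)
        for i, state in enumerate(ss):
            c = CLASSES.index(state)
            # Add the window centred on residue i to the counts of its class
            joint[c] += padded[i:i + window_size]
            class_counts[c] += 1
    total = class_counts.sum()
    p_joint = joint / total          # P(R_k, S)
    p_res = p_joint.sum(axis=0)      # P(R_k)
    p_class = class_counts / total   # P(S)
    eps = 1e-12                      # avoids log(0) for unseen combinations
    return np.log((p_joint + eps) /
                  (p_res[None] * p_class[:, None, None] + eps))


def gor_predict(profile, info):
    """Predict one class per residue by maximising the summed information."""
    window_size = info.shape[1]
    padded = pad_profile(profile, window_size // 2)
    predicted = []
    for i in range(len(profile)):
        window = padded[i:i + window_size]
        scores = (info * window[None]).sum(axis=(1, 2))
        predicted.append(CLASSES[int(np.argmax(scores))])
    return "".join(predicted)
```

With a window size of 1 the sketch reduces to a per-residue propensity lookup, which makes it easy to check on toy data before trying larger windows.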

SVM

An SVM model can be trained using:

python svm-train.py [-h] [--filename_model FILENAME_MODEL]
                    [--window_size WINDOW_SIZE]
                    filename_id_list dir_pssm dir_dssp

For example:

python src/svm-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
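Both training scripts accept a --window_size option. The following sketch shows one common way a sliding window over a sequence profile is turned into fixed-length feature vectors for a classifier such as an SVM. The function name window_features, the zero-padding at the sequence ends, and the default of 17 are assumptions for illustration and are not taken from the repository's code.

```python
import numpy as np


def window_features(profile, window_size=17):
    """Flatten a window of profile rows around each residue into one vector.

    profile: array-like of shape (length, n_residue_types)
    returns: array of shape (length, window_size * n_residue_types)
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a centre")
    profile = np.asarray(profile, dtype=float)
    half = window_size // 2
    # Zero rows stand in for positions that fall outside the sequence
    padded = np.pad(profile, ((half, half), (0, 0)), mode="constant")
    return np.stack([padded[i:i + window_size].ravel()
                     for i in range(len(profile))])
```

Each row of the result describes one residue together with its neighbours, so it can be paired with that residue's secondary structure label when fitting a classifier.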

Performance

Metric  GOR CV      GOR blind test  SVM CV      SVM blind test
SEN_H   0.86±0.01   0.83            0.80±0.01   0.72
SEN_E   0.62±0.01   0.60            0.58±0.01   0.62
SEN_C   0.42±0.01   0.42            0.82±0.00   0.85
PPV_H   0.58±0.01   0.60            0.82±0.01   0.85
PPV_E   0.54±0.01   0.58            0.75±0.01   0.80
PPV_C   0.80±0.01   0.73            0.72±0.00   0.65
MCC_H   0.50±0.01   0.46            0.71±0.01   0.67
MCC_E   0.45±0.01   0.46            0.58±0.01   0.63
MCC_C   0.40±0.01   0.39            0.58±0.01   0.56
SOV_H   65.48±0.99  62.70           76.39±0.97  68.64
SOV_E   58.64±1.35  63.18           59.20±2.18  67.19
SOV_C   43.09±0.69  45.57           70.18±0.93  70.93
ACC     0.62±0.00   0.62            0.76±0.00   0.75
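For readers who want to compute comparable per-class scores on their own predictions, the sketch below assumes the standard definitions: sensitivity SEN = TP/(TP+FN), positive predictive value PPV = TP/(TP+FP), the Matthews correlation coefficient MCC, and ACC as the fraction of correctly predicted residues. These definitions and the function names are assumptions; the project report is the authoritative source for how the numbers above were obtained. SOV is a segment-based score and is not covered here.

```python
import math


def class_metrics(true_ss, pred_ss, label):
    """One-vs-rest SEN, PPV and MCC for a single class label (e.g. "H")."""
    tp = fp = fn = tn = 0
    for t, p in zip(true_ss, pred_ss):
        if t == label and p == label:
            tp += 1
        elif t != label and p == label:
            fp += 1
        elif t == label and p != label:
            fn += 1
        else:
            tn += 1
    sen = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "PPV": ppv, "MCC": mcc}


def accuracy(true_ss, pred_ss):
    """Fraction of residues whose predicted class matches the true class."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have the same length")
    return sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)
```

Both functions take the true and predicted secondary structure as equal-length strings over H, E and C, for example class_metrics("HHHEECCC", "HHEEECCH", "H").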

License

MIT @ Katarina Elez