Dear John,
I uploaded 12 files for this project. The files "0309homo70_with_labels.txt" and "0309homology70.fasta" are the raw datasets used in the project; they were generated by homology reduction with a cutoff of 0.7.
The files "Accurancy_with_cross_validation_on_raw_database", "Accurancy_with_cross_validation_with_evolutionary_information", "random_forest" and "decision_tree" are Python scripts for, respectively: an SVM on the raw dataset, an SVM on the PSSM files generated by PSI-BLAST, a random forest on the PSSM files, and a decision tree on the PSSM files. Running any of these four scripts prints the accuracy obtained from 5-fold cross-validation.
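For reference, here is a minimal sketch of how a 5-fold cross-validation accuracy can be computed. It is a generic illustration in plain Python, not the code inside the scripts above; the function names and the majority-class stand-in classifier are hypothetical, and in the actual scripts an SVM, random forest or decision tree would take the place of the stand-in.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k folds whose sizes differ by at most 1."""
    if n < k:
        raise ValueError("need at least k samples so that every fold is non-empty")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X: Sequence, y: Sequence, fit: Callable, predict: Callable,
                       k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, and return the mean fold accuracy."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    # Unweighted mean over the k folds.
    return sum(scores) / k


# Hypothetical stand-in classifier: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model] * len(X_test)
```

Any classifier can be plugged in by supplying a `fit` function that returns a trained model and a `predict` function that maps that model and a list of samples to predicted labels.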
The compressed archives "Model_1_svm.tar.gz", "Model_2_randomforest.tar.gz" and "Model_3_randomforest-INPUT-flies.tar.gz" contain the model scripts and the resources needed to run them. Instructions are included in each archive.
The archive "pssm_files.tar.gz" contains all the PSSM files I generated. The files "Model_3state3line_stride_Xueqing_Wang(random_forest).py" and "Model_3state3line_stride_Xueqing_Wang(svm).py" are the Python scripts for my predictor. These three files are already included in the three model archives, so they can be skipped.
Thank you for your attention!
Xueqing