Riboswitch

In the last decade, RNA sequencing technology and computational methods have given a strong impetus to riboswitch research.

One of the main challenges in riboswitch classification is imbalanced data.

Previously published classifiers are all based on untreated imbalanced data, which leads them to neglect minority classes and favor the majority class, and consequently to report skewed performance.

This repository covers machine learning model selection and performance evaluation (sensitivity, specificity, accuracy, and F-score).
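To make the four metrics concrete, here is a minimal dependency-free sketch that computes them for one family against all others. The function name `one_vs_rest_metrics` and the 95/5 toy labels are assumptions made for illustration; they are not taken from the notebooks.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, accuracy and F-score for one class vs. the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Report 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the chosen family
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical imbalanced test set: 95 sequences of a large family, 5 of a rare one.
y_true = ["major"] * 95 + ["minor"] * 5
# A classifier that ignores the rare family entirely.
y_pred = ["major"] * 100

minor = one_vs_rest_metrics(y_true, y_pred, positive="minor")
major = one_vs_rest_metrics(y_true, y_pred, positive="major")
print(minor)
print(major)
```

In this toy case accuracy is 0.95 for both families, while sensitivity for the rare family is 0. This is why accuracy alone can hide the skew that imbalanced data introduces.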

Workflow

Tutorials

model_selection.ipynb

Read in the cleaned riboswitch k-mer matrix CSV file, which has the following format:

class            kmer1            kmer2    ...    kmer N
Family name 1    k-mer counting
Family name 2                     ...
...                                        ...
Family name M                                     k-mer counting

Generate fixed training and test sets and save them in the project's home directory.
Apply 10-fold cross-validation to six algorithms to find the relatively best parameters for each. The script saves all best models, both balanced and imbalanced, in the Model folder.
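The split and cross-validation steps above can be sketched without any third-party library. This is only an illustration under stated assumptions: the helper names `stratified_split` and `stratified_folds`, the 20% test fraction, the seed, and the family counts are all hypothetical, and the notebook itself may rely on library utilities instead. The point is that sampling class by class keeps every family represented in the test set and in each of the 10 folds.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, indices, n_folds=10, seed=0):
    """Deal the given indices into n_folds folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    next_fold = 0  # running counter keeps fold sizes within one of each other
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:
            folds[next_fold % n_folds].append(i)
            next_fold += 1
    return folds


# Hypothetical class column of the k-mer matrix (counts are made up).
labels = ["TPP"] * 50 + ["SAM"] * 30 + ["Lysine"] * 20

train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(labels, train_idx)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Each fold would serve once as the validation part while the other nine are used for fitting; the parameter search over the six algorithms is not shown here.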

classification_report.ipynb

Load the models generated by model_selection.ipynb.
Load the training and test sets.
Generate the classification reports in the automatically created Classification_report folder.

figures.ipynb

Load the models generated by model_selection.ipynb.
Generate confusion matrices and other figures. All are saved in the Figures folder.
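As a rough illustration of what a confusion matrix contains before it is plotted, the sketch below counts true versus predicted families with the standard library only. The helper names `confusion_matrix` and `format_matrix` and the six toy predictions are assumptions for this example; the notebook's own plotting code is not reproduced here.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true families, columns are predicted families."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def format_matrix(matrix, classes):
    """Render the matrix as aligned plain text."""
    width = max(len(c) for c in classes) + 2
    header = " " * width + "".join(c.rjust(width) for c in classes)
    rows = [
        c.ljust(width) + "".join(str(n).rjust(width) for n in row)
        for c, row in zip(classes, matrix)
    ]
    return "\n".join([header] + rows)


# Hypothetical labels and predictions for six test sequences.
classes = ["Lysine", "SAM", "TPP"]
y_true = ["TPP", "TPP", "TPP", "SAM", "SAM", "Lysine"]
y_pred = ["TPP", "TPP", "SAM", "SAM", "TPP", "TPP"]

matrix = confusion_matrix(y_true, y_pred, classes)
print(format_matrix(matrix, classes))
```

Off-diagonal entries in a minority family's row, such as the single Lysine sequence predicted as TPP here, are the kind of error that a majority-biased model tends to produce.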

The following three ipynb files serve other purposes and are not necessary for the workflow:

ribo_colormap_produce_kmerfamily.ipynb

ribo_colormap_input_txt.ipynb

feature_selection

Dependencies

Python

seaborn==0.9.0

pandas==0.24.2

PDPbox==0.2.0

shap==0.29.1

numpy==1.16.2

imbalanced_learn==0.4.3

matplotlib==3.0.3

ipython==7.5.0

imblearn==0.0

scikit_learn==0.21.2

To install the packages above:

Change to the project's home directory, which contains the file "requirements.txt".
Then run:

pip install -r requirements.txt

