Protein Feature Extraction for Machine Learning

Python code to extract features from protein sequences for machine learning and deep learning.

Protein feature extraction is carried out using the Biopython package.

Format:

Features (27 features):

AA-count (20x features)
aromaticity (1x)
secondary_structure_fraction (3x)
isoelectric_point (1x)
molecular_weight (1x)
instability_index (1x)
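
The feature names above match methods of Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` class (`count_amino_acids`, `aromaticity`, `secondary_structure_fraction`, `isoelectric_point`, `molecular_weight`, `instability_index`). Whether discere calls that class internally is an assumption. As a rough illustration of what the first 21 values represent, the sketch below computes the amino acid counts and aromaticity (the relative frequency of F, W and Y) with the standard library only. The function name `composition_features` and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AROMATIC = "FWY"  # phenylalanine, tryptophan, tyrosine


def composition_features(sequence):
    """Return 21 values: 20 amino acid counts followed by aromaticity.

    Hypothetical helper for illustration; not part of discere.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    aa_counts = [counts[aa] for aa in AMINO_ACIDS]
    # Aromaticity: share of residues that are F, W or Y.
    aromaticity = sum(counts[aa] for aa in AROMATIC) / len(seq)
    return aa_counts + [aromaticity]


# Made-up 18-residue sequence used only to exercise the function.
features = composition_features("MKWVTFISLLFLFSSAYS")
print(len(features))  # 21
```

The remaining six features (secondary structure fractions, isoelectric point, molecular weight, instability index) depend on lookup tables that Biopython ships, so they are not reproduced here.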

Packages required (other than built-in) to run the code:

- pandas
- Biopython

The code also uses pickle and subprocess, which are part of the Python standard library and need no separate installation.

Top N features for identifying insulin protein sequences

Format:

Installation

For Windows

Windows users must give the paths to the FASTA files and the output folder in Linux style, using forward slashes (/) rather than backslashes (\), e.g. C:/folder_name/file_name.fasta. This issue will be fixed in a future update.

pip install discere

For Linux

pip3 install discere

Usage

  import discere.discere as di
  
  di.extract_feature('./Documents/positive_training.fasta', 
                     './Documents/negative_training.fasta', 
                     './Documents')

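General form: di.extract_feature(input_file1, input_file2, output_directory)

In the example above, the two input files are the positive and negative training FASTA files, and the third argument is the output directory.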
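
On Windows, the forward-slash requirement noted under Installation can be met without retyping paths by hand. The sketch below uses `pathlib.PureWindowsPath` from the standard library to convert a backslash path before it is passed on. The helper name `to_forward_slashes` is hypothetical, and the path is the placeholder from the Installation note.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Convert a Windows path string to its forward-slash form.

    Hypothetical helper for illustration; not part of discere.
    """
    return PureWindowsPath(windows_path).as_posix()


fasta_path = to_forward_slashes(r"C:\folder_name\file_name.fasta")
print(fasta_path)  # C:/folder_name/file_name.fasta
```

The converted string can then be given to `di.extract_feature` in place of the original path.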

Output

Outputs are stored in user_specified_path/output in .txt, .arff and .csv formats.
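
The exact column layout of these files is not described here. As a rough idea of how the same feature table looks in .csv and in .arff (the Weka format), the sketch below writes both from a small table using only the standard library. The column names, the values and the relation name are made up for illustration and are not discere output.

```python
import csv
import io

# Made-up table: two numeric features plus a class label (not real results).
columns = ["aromaticity", "molecular_weight", "label"]
rows = [
    [0.12, 5807.6, "positive"],
    [0.05, 6120.3, "negative"],
]


def to_csv(columns, rows):
    """Render a header line followed by one comma-separated line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_arff(relation, columns, rows, class_values):
    """Render the same table as ARFF; the last column is a nominal class."""
    nominal = "{" + ",".join(class_values) + "}"
    lines = ["@relation " + relation, ""]
    for name in columns[:-1]:
        lines.append("@attribute " + name + " numeric")
    lines.append("@attribute " + columns[-1] + " " + nominal)
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


csv_text = to_csv(columns, rows)
arff_text = to_arff("protein_features", columns, rows, ["positive", "negative"])
print(csv_text)
print(arff_text)
```

The only structural difference is the ARFF header, which declares each attribute's type before the `@data` section.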

jithin8mathew/Protein-feature-extraction
