
Pretzelx/Protein-Sequence-Analysis


Protein Sequence Clustering with PCA

This project generates random protein sequences, clusters them using K-Means, and visualises the clusters using PCA.

Features

  • Random Sequence Generation: Generate random protein sequences of customisable length.
  • K-Means Clustering: Cluster sequences based on their encoded representations.
  • PCA Visualisation: Visualise the clusters in 2D space using Principal Component Analysis.

Code Structure

sequence_generation.py: Contains functions to generate random protein sequences.
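
A minimal sketch of what sequence generation could look like, assuming each residue is drawn uniformly at random from the 20 standard amino acids. The function names generate_random_sequence and generate_random_sequences and the seed parameter are hypothetical and not taken from the repository.

```python
import random

# The 20 standard amino acids as one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def generate_random_sequence(length, rng=random):
    """Return one random protein sequence of the given length (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # Each position is sampled independently and uniformly from the alphabet.
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def generate_random_sequences(n, length, seed=None):
    """Return n random sequences; pass a seed for reproducible output."""
    rng = random.Random(seed)
    return [generate_random_sequence(length, rng) for _ in range(n)]


if __name__ == "__main__":
    for seq in generate_random_sequences(n=3, length=12, seed=42):
        print(seq)
```

Using a dedicated random.Random instance keeps runs reproducible without touching the global random state.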

clustering.py: Encodes sequences and performs clustering using K-Means.
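
One way the encode-and-cluster step could work. This sketch assumes each sequence is encoded as its amino-acid composition (the fraction of each of the 20 standard residues), which the README does not specify. It implements plain Lloyd's K-Means with the standard library only; the project itself may use a library such as scikit-learn instead. All names here are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_composition(seq):
    """Encode a sequence as the fraction of each of the 20 amino acids (assumed encoding)."""
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd's K-Means. Returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_sequences(sequences, k, seed=None):
    """Encode sequences, cluster them, and return (features, labels)."""
    features = [encode_composition(s) for s in sequences]
    labels, _ = kmeans(features, k, seed=seed)
    return features, labels
```

Composition vectors have the same length for any sequence, so sequences of different lengths can be clustered together; a per-position one-hot encoding would preserve residue order but needs equal-length (or padded) sequences.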

visualisation.py: Visualises the clustered sequences using PCA.
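
A dependency-light sketch of the PCA step, assuming its input is the list of encoded feature vectors produced by the clustering step. It builds the covariance matrix and finds the top two principal components by power iteration with deflation, a stand-in for the SVD that a library such as scikit-learn would use. matplotlib is imported only inside the hypothetical plot_clusters helper; all function names are illustrative, not the repository's.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _mat_vec(m, v):
    return [_dot(row, v) for row in m]


def _top_eigenvector(m, iters=500, seed=0):
    """Power iteration: approximate the dominant eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]
    norm = _dot(v, v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = _mat_vec(m, v)
        norm = _dot(w, w) ** 0.5
        if norm == 0:
            break  # no variance left in any direction
        v = [x / norm for x in w]
    return v


def pca_2d(points, iters=500):
    """Project points onto their first two principal components."""
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centred = [[x - m for x, m in zip(p, means)] for p in points]
    # Sample covariance matrix (d x d) of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / max(n - 1, 1) for j in range(d)]
           for i in range(d)]
    components = []
    for k in range(2):
        v = _top_eigenvector(cov, iters, seed=k)
        eigenvalue = _dot(v, _mat_vec(cov, v))
        components.append(v)
        # Deflation: subtract this component so the next pass finds the second one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[_dot(r, c) for c in components] for r in centred]


def plot_clusters(points_2d, labels):
    """Scatter the 2D projection, coloured by cluster label (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so pca_2d works without it

    plt.scatter([p[0] for p in points_2d], [p[1] for p in points_2d], c=labels, cmap="viridis")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title("K-Means clusters projected with PCA")
    plt.show()
```

Power iteration converges quickly when the leading eigenvalues are well separated; when two directions carry nearly equal variance, a direct eigendecomposition such as numpy.linalg.eigh is more reliable.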

main.py: Orchestrates the sequence generation, clustering, and visualisation.

Example Output

[Image: PCA fig 1]

Installation

  1. Clone the repository:
    git clone https://github.com/Pretzelx/Protein-Sequence-Analysis.git
  2. Navigate to the project directory (the clone is named after the repository):
    cd Protein-Sequence-Analysis
  3. Run the code:
    python main.py
