This project generates random protein sequences, clusters them using K-Means, and visualises the clusters using PCA.
- Random Sequence Generation: Generate random protein sequences with customisable length.
- K-Means Clustering: Cluster sequences based on their encoded representation.
- PCA Visualisation: Visualise clusters in 2D space using Principal Component Analysis.
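As a rough illustration of the first two features, here is a minimal sketch of random sequence generation and a simple integer encoding. The names `AMINO_ACIDS`, `generate_sequence`, and `encode_sequence` are hypothetical and are not taken from this repository. The integer encoding is only one possible choice; the project's actual encoding may differ.

```python
import random

# The 20 standard amino acids as one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def generate_sequence(length, rng=random):
    """Return a random protein sequence of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def encode_sequence(sequence):
    """Map each residue to its index in AMINO_ACIDS (integer encoding)."""
    return [AMINO_ACIDS.index(residue) for residue in sequence]


# A seeded generator keeps the example reproducible.
rng = random.Random(0)
sequences = [generate_sequence(10, rng) for _ in range(5)]
encoded = [encode_sequence(s) for s in sequences]
```

Sequences of equal length give equal-length numeric vectors, which is what K-Means needs as input.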
sequence_generation.py: Contains functions to generate random protein sequences.
clustering.py: Encodes sequences and performs clustering using K-Means.
visualisation.py: Visualises the clustered sequences using PCA.
main.py: Orchestrates the sequence generation, clustering, and visualisation.
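The way these modules fit together might look roughly like the sketch below. It assumes scikit-learn and NumPy are installed and uses a one-hot encoding; `one_hot_encode` and `cluster_and_project` are hypothetical names, not functions confirmed to exist in the files above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# The 20 standard amino acids as one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence):
    """Flatten a sequence into a one-hot vector of length len(sequence) * 20."""
    matrix = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for position, residue in enumerate(sequence):
        matrix[position, AMINO_ACIDS.index(residue)] = 1.0
    return matrix.ravel()


def cluster_and_project(sequences, n_clusters=3, seed=0):
    """Cluster encoded sequences with K-Means and project them to 2D with PCA."""
    features = np.array([one_hot_encode(s) for s in sequences])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features)
    coords = PCA(n_components=2).fit_transform(features)
    return labels, coords


# Generate 30 random sequences of length 12, then cluster and project them.
rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list(AMINO_ACIDS), size=12)) for _ in range(30)]
labels, coords = cluster_and_project(sequences)
```

The returned `coords` array has one 2D point per sequence, so it can be passed to a scatter plot coloured by `labels`.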
- Clone the repository: git clone https://github.com/Pretzelx/Protein-Sequence-Analysis.git
- Navigate to the project directory: cd Protein-Sequence-Analysis
- Run the code using the following command: python main.py