This repository contains the code and materials for a research project on predicting the toxicity of molecules that influence the function of the CRY1 protein.
In this project, we applied several machine learning techniques to build models that predict the toxicity of molecules. The main techniques and methods are:
- Feature Selection: We applied feature selection methods to identify the molecular features most relevant to toxicity prediction, with the aim of improving model performance and interpretability. The methods we applied are Variance Threshold, Select K Best, and Recursive Feature Elimination (RFE).
- SMOTE (Synthetic Minority Over-sampling Technique): To address class imbalance in the dataset, we used SMOTE to generate synthetic samples of the minority class, so that the models are not trained on data dominated by the majority class.
- Decision Trees: Decision tree models were used to capture non-linear relationships in the data. Their learned splitting rules can also be inspected to see which features drive the toxicity predictions.
- Support Vector Machines (SVM): SVMs were used for their ability to handle complex classification tasks: they find the maximum-margin decision boundary between classes, and kernel functions allow that boundary to be non-linear.
- Ensemble Methods: Ensemble techniques such as Bagging, AdaBoost, and Random Forest were implemented to combine the predictions of multiple models and improve overall predictive performance.
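
The three feature selection methods listed above can be chained as in the following minimal scikit-learn sketch. It is illustrative only: the README does not say whether the methods were combined or compared separately, and the synthetic data, the `threshold`, `k`, and `n_features_to_select` values, and the decision-tree estimator inside RFE are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a molecular descriptor matrix: 200 molecules x 50 descriptors.
X, y = make_classification(n_samples=200, n_features=50, n_informative=8, random_state=0)
X[:, :5] = 0.0  # make five descriptors constant so VarianceThreshold has something to remove

selector = Pipeline([
    # 1) Remove descriptors with zero variance (constant across all molecules).
    ("variance", VarianceThreshold(threshold=0.0)),
    # 2) Keep the 20 descriptors with the highest ANOVA F-score against the label.
    ("kbest", SelectKBest(score_func=f_classif, k=20)),
    # 3) Repeatedly drop the least important descriptor (by tree importance) until 10 remain.
    ("rfe", RFE(estimator=DecisionTreeClassifier(random_state=0),
                n_features_to_select=10, step=1)),
])

X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)
```

Putting the selectors in a `Pipeline` means they are refit on each training fold when the pipeline is cross-validated, so feature choices are not made using validation data.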
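
The SMOTE step described above works by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The following is a from-scratch NumPy sketch of that idea, not the project's code. The README does not say which implementation was used; the imbalanced-learn package, for example, provides a `SMOTE` class with a `fit_resample` method. The `smote` helper, the 90/10 class split, and `k=5` are hypothetical.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=None):
    """Hypothetical helper: return n_synthetic new minority rows via SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)  # cannot use more neighbours than other minority samples
    # Pairwise Euclidean distances between minority samples only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_synthetic)  # random starting samples
    nbr = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one random neighbour each
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    # Each new point lies on the segment between a starting sample and its neighbour.
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Hypothetical imbalanced data: 90 majority (0) vs 10 minority (1) molecules, 4 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)

X_min = X[y == 1]
X_new = smote(X_min, n_synthetic=80, k=5, seed=1)
X_bal = np.vstack([X, X_new])
y_bal = np.concatenate([y, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))  # [90 90]
```

Oversampling is normally applied to the training data only, after the train/test split, so that synthetic points derived from test molecules cannot leak into evaluation.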
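
The interpretability point made for decision trees above can be illustrated by printing a fitted tree's rules and feature importances. This scikit-learn sketch runs on synthetic data; the `descriptor_i` names, `max_depth=3`, and the choice of balanced accuracy as the metric are assumptions, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic, imbalanced stand-in for descriptor data; feature names are hypothetical.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           weights=[0.8, 0.2], random_state=0)
feature_names = [f"descriptor_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# A shallow tree keeps the learned rules short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

rules = export_text(tree, feature_names=feature_names)  # if/else splits as plain text
print(rules)
print("feature importances:", dict(zip(feature_names, tree.feature_importances_.round(3))))

score = balanced_accuracy_score(y_test, tree.predict(X_test))
print(f"balanced accuracy on held-out data: {score:.3f}")
```

Balanced accuracy averages recall over both classes, so on imbalanced data it is less flattering than plain accuracy to a model that mostly predicts the majority class.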
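
For the SVMs mentioned above, a typical setup standardises the features and tunes the regularisation strength `C` and the RBF kernel width `gamma` by cross-validation. The following is a sketch under those assumptions. The kernel, the grid values, the scoring metric, and the synthetic data are illustrative and are not the project's reported configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

# SVMs are sensitive to feature scale, so the scaler sits inside the pipeline and is
# refit on each training fold rather than on the full data set.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Illustrative grid: larger C fits the training data more tightly; gamma sets kernel width.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(model, param_grid, scoring="balanced_accuracy",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```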
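
The three ensemble methods listed above differ in how they combine trees, and they can be compared under the same cross-validation split. This scikit-learn sketch uses synthetic data, 100 estimators each, and balanced accuracy. These choices are assumptions for illustration, not the project's settings or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    # Bagging: full-depth trees, each fit on a bootstrap sample; predictions are voted.
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    # AdaBoost: depth-1 trees (the default) fit sequentially, up-weighting misclassified samples.
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    # Random Forest: bagging plus a random subset of features considered at each split.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Bagging and Random Forest mainly reduce variance by averaging many deep trees, while AdaBoost mainly reduces bias by building weak learners in sequence.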