This project predicts the onset of diabetes from the diagnostic measurements in the dataset. It implements several machine learning algorithms for this binary classification task: logistic regression, k-nearest neighbors, support vector classifier (SVC), Gaussian Naive Bayes, decision tree, and random forest.
The dataset is the Pima Indians Diabetes Database, which contains diagnostic health measurements for women of Pima Indian heritage. It is available at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
Algorithms used:

- Logistic Regression
- K-Nearest Neighbors Classifier
- Support Vector Classifier (SVC)
- Gaussian Naive Bayes
- Decision Tree
- Random Forest

Requirements:

- Python 3
- Jupyter Notebook
- Libraries: pandas, numpy, scikit-learn
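
The six classifiers listed above can be compared with cross-validation along the following lines. This is a minimal sketch, not the notebook's actual code: because the Kaggle CSV is not bundled here, it uses a synthetic stand-in generated with `make_classification`, and the `compare_models` helper, the scaling step, and all hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 768 rows and 8 numeric features,
# used only so the example runs without downloading the real data.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}


def compare_models(X, y, models, cv=5):
    """Return the mean cross-validated accuracy for each model (hypothetical helper)."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


results = compare_models(X, y, models)
for name, accuracy in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

To run this on the real data, load the downloaded CSV with pandas and separate the label column from the features before calling `compare_models`. The label column appears as `Outcome` in the Kaggle file, but verify the name against your copy.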
This project detects fraudulent credit card transactions using a random forest classifier. The model learns patterns and anomalies in transaction data that may indicate fraudulent activity.
The dataset contains credit card transactions made by European cardholders in September 2013. It is highly imbalanced: fraudulent (positive) cases account for only 492 of 284,807 transactions, about 0.17%. For confidentiality, most of the original features have been replaced by principal components obtained with Principal Component Analysis (PCA).
The dataset is available at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
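
With so few positive cases, plain accuracy says little, since a model that never predicts fraud would still score above 99%. The sketch below shows one common way to handle this, assuming a stratified train/test split, `class_weight="balanced"`, and the area under the precision-recall curve as the metric. The data is a synthetic imbalanced stand-in rather than the real transactions, and every parameter value is an illustrative assumption, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 30 numeric features, roughly 1% positives.
X, y = make_classification(
    n_samples=10_000,
    n_features=30,
    n_informative=10,
    weights=[0.99],
    random_state=0,
)

# Stratify so the rare class keeps the same share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0, n_jobs=-1
)
clf.fit(X_train, y_train)

# Score with the predicted probability of the positive (fraud) class.
fraud_scores = clf.predict_proba(X_test)[:, 1]
auprc = average_precision_score(y_test, fraud_scores)

print(f"Positive share in test set: {y_test.mean():.3%}")
print(f"Area under the precision-recall curve: {auprc:.3f}")
print(classification_report(y_test, clf.predict(X_test), digits=3, zero_division=0))
```

The per-class precision and recall in the report are more informative than overall accuracy here, because they show how many fraudulent cases are caught and how many alerts are false.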
Random forest is an ensemble learning method that builds many decision trees during training. For classification, it outputs the class predicted by the most trees (the mode); for regression, it outputs the mean of the trees' predictions. This project uses it as a classifier.
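
The voting idea can be made concrete by collecting each tree's prediction and taking the most common class per sample. This is an illustrative sketch on synthetic data with assumed parameters. Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so the two results usually agree but are not guaranteed to match on every sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic binary problem (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# An odd number of trees avoids tied votes in the binary case.
forest = RandomForestClassifier(n_estimators=25, random_state=1).fit(X, y)

# One row of predictions per tree: shape (n_trees, n_samples).
tree_votes = np.stack([tree.predict(X) for tree in forest.estimators_]).astype(int)

# Mode of the votes for each sample (column).
majority = np.array(
    [np.bincount(votes, minlength=2).argmax() for votes in tree_votes.T]
)

# Fraction of samples where the hard vote matches the forest's own output.
agreement = (majority == forest.predict(X)).mean()
print(f"Trees: {tree_votes.shape[0]}, samples: {tree_votes.shape[1]}")
print(f"Agreement between majority vote and forest.predict: {agreement:.3f}")
```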
Requirements:

- Python 3
- Jupyter Notebook
- Libraries: pandas, numpy, scikit-learn
This project classifies movie reviews as positive or negative using three machine learning algorithms: XGBoost, logistic regression, and random forest. It analyzes the review text to determine its sentiment polarity.
The dataset consists of movie reviews labeled with positive or negative sentiment. Due to licensing restrictions, it is not included in this repository, but it can be downloaded from https://www.kaggle.com/c/word2vec-nlp-tutorial/data
Similar datasets, such as the IMDb movie reviews dataset or other sentiment analysis datasets on Kaggle, can be used instead.
Algorithms used:

- XGBoost Classifier
- Logistic Regression
- Random Forest Classifier

Requirements:

- Python 3
- Jupyter Notebook
- Libraries: pandas, numpy, scikit-learn, xgboost
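
A text classification pipeline for the three models above might look like the sketch below. It assumes TF-IDF features, which this README does not confirm as the project's vectorization method. The handful of reviews are made-up toy examples so the code runs without the Kaggle data, and the hyperparameters are illustrative. XGBoost is imported optionally so the example still runs where the package is not installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy reviews for illustration only (1 = positive, 0 = negative).
reviews = [
    "A wonderful film with a great cast and a moving story",
    "Brilliant acting and a great script, I loved it",
    "An excellent, wonderful movie that I would watch again",
    "Great direction and a wonderful soundtrack",
    "I loved the story and the excellent performances",
    "A great and enjoyable film from start to finish",
    "A terrible film with a boring plot and awful acting",
    "Boring, predictable and a complete waste of time",
    "Awful script and terrible pacing, I hated it",
    "The worst movie I have seen, dull and boring",
    "I hated the awful dialogue and the dull story",
    "A terrible waste of a good cast",
]
labels = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
try:
    from xgboost import XGBClassifier

    classifiers["XGBoost"] = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    print("xgboost is not installed; skipping XGBoost")

new_review = ["a wonderful film with a great story"]
predictions = {}
for name, classifier in classifiers.items():
    # Each pipeline turns raw text into TF-IDF vectors, then classifies them.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), classifier)
    model.fit(reviews, labels)
    predictions[name] = int(model.predict(new_review)[0])
    print(f"{name}: predicted label {predictions[name]}")
```

On the real dataset, the same pipelines would be fit on the review text column and evaluated on a held-out split. A corpus this small is only enough to show the mechanics, not to judge model quality.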