Diabetes Prediction Project

This project aims to predict the onset of diabetes based on certain diagnostic measures included in the dataset. We have implemented several machine learning algorithms for this classification task, including logistic regression, k-nearest neighbors classifier, support vector classifier (SVC), Gaussian Naive Bayes, decision tree, and random forest.

Dataset

The dataset used in this project is the Pima Indians Diabetes Database, which contains various health-related variables for Pima Indian women. The dataset can be found at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Algorithms Implemented

Logistic Regression
K-Nearest Neighbors Classifier
Support Vector Classifier (SVC)
Gaussian Naive Bayes
Decision Tree
Random Forest
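
As a rough illustration of how the six classifiers listed above could be compared, here is a minimal sketch using scikit-learn. It is not taken from the notebook: the data is a synthetic stand-in generated with `make_classification` (768 rows and 8 features, the same shape as the Pima dataset), and the helper name `compare_models`, the scaling step, and the 5-fold cross-validation are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with the real features and labels.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

# Distance- and margin-based models are wrapped in a scaler (an assumed choice).
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The scores printed for synthetic data say nothing about the real dataset; the sketch only shows the comparison loop.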

Usage

Prerequisites

Python 3
Jupyter Notebook
Libraries: pandas, numpy, scikit-learn

Credit Card Fraud Detection System

This project detects fraudulent transactions in credit card data using a random forest classifier. The model learns patterns and anomalies in credit card transactions that may indicate fraudulent activity.

Dataset

The dataset used in this project contains credit card transactions made by European cardholders. It is highly imbalanced, with far fewer positive (fraudulent) cases than negative (non-fraudulent) cases. Due to privacy concerns, the original features have been anonymized using Principal Component Analysis (PCA).

The dataset can be found at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
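
Because the classes are so imbalanced, plain accuracy can be misleading: a model that never predicts fraud would still score very high. The sketch below shows one possible way to handle this, and it is not necessarily what the notebook does. It uses synthetic data with roughly 2% positives as a stand-in, and the stratified split, `class_weight="balanced"`, and the chosen metrics are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): about 2% of samples are "fraud".
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)

# stratify=y keeps the fraud ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" weights the rare class more heavily during training.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
fraud_scores = clf.predict_proba(X_test)[:, 1]  # estimated probability of fraud

# Metrics that focus on the rare positive class rather than overall accuracy.
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
avg_precision = average_precision_score(y_test, fraud_scores)

print(f"precision={precision:.3f} recall={recall:.3f} AP={avg_precision:.3f}")
```

Precision, recall, and average precision (area under the precision-recall curve) all describe performance on the fraudulent class, which is the class of interest here.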

Random Forest Classifier

Random forest is an ensemble learning method that operates by constructing a multitude of decision trees during training and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
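
The voting idea described above can be sketched by collecting each tree's prediction and taking the most common class per sample. The data here is synthetic and the variable names are illustrative. One caveat: scikit-learn's `RandomForestClassifier.predict` averages the trees' predicted class probabilities rather than counting hard votes, so the two results usually agree but are not guaranteed to be identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (assumption), labels are 0 and 1.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One row of predictions per tree: shape (n_trees, n_samples).
# The individual trees return class indices as floats, so cast to int.
tree_votes = np.array([tree.predict(X) for tree in forest.estimators_]).astype(int)

# Hard majority vote: the mode of the tree predictions for each sample.
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, tree_votes)

# Compare the manual vote with the forest's own (probability-averaged) prediction.
agreement = np.mean(majority == forest.predict(X))
print(f"agreement with forest.predict: {agreement:.3f}")
```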

Usage

Prerequisites

Python 3
Jupyter Notebook
Libraries: pandas, numpy, scikit-learn

Movie Review Classifier

This project aims to classify movie reviews as positive or negative using machine learning algorithms: XGBoost, logistic regression, and random forest classifier. The system analyzes textual data from movie reviews to determine their sentiment polarity.

Dataset

The dataset used in this project consists of movie reviews labeled as positive or negative. Due to licensing restrictions, it is not included in this repository. It can be downloaded from https://www.kaggle.com/c/word2vec-nlp-tutorial/data. Similar datasets, such as the IMDb movie reviews dataset or other sentiment analysis datasets on Kaggle, are also available.
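
To show how labeled review text can be turned into a sentiment classifier, here is a minimal sketch. The README does not say how the notebook extracts features, so TF-IDF is an assumption. The eight reviews are invented toy examples, not dataset rows, and only logistic regression is shown so that the example depends on scikit-learn alone.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reviews (assumption); 1 = positive, 0 = negative.
reviews = [
    "a wonderful film with a moving story",
    "great acting and a wonderful script",
    "I loved every minute, truly great",
    "moving, funny, and beautifully shot",
    "a boring plot and terrible acting",
    "terrible pacing, I was bored throughout",
    "a dull, boring waste of time",
    "awful script and terrible ending",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each review into a weighted word-count vector;
# logistic regression then learns which words signal each sentiment.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)

new_reviews = ["a wonderful and moving film", "boring and terrible plot"]
predictions = model.predict(new_reviews)
for text, label in zip(new_reviews, predictions):
    print(f"{'positive' if label == 1 else 'negative'}: {text}")
```

The same pipeline shape would accept a random forest or a gradient-boosted model in place of `LogisticRegression`, since the vectorizer output is a regular feature matrix.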

Repository Files

README.md
credit card fraud.ipynb
diabetes prediction.ipynb
movie reviews classification.ipynb

tiashamaitra/technohacks-project
