tiashamaitra/technohacks-project

Diabetes Prediction Project

This project predicts the onset of diabetes from the diagnostic measurements in the dataset. It treats the problem as a binary classification task and implements six machine learning algorithms: logistic regression, k-nearest neighbors, support vector classifier (SVC), Gaussian Naive Bayes, decision tree, and random forest.

Dataset

The dataset used in this project is the Pima Indians Diabetes Database, which contains diagnostic measurements (such as glucose level, blood pressure, BMI, and age) for Pima Indian women, together with an outcome label indicating diabetes. The dataset is available at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Algorithms Implemented

  1. Logistic Regression
  2. K-Nearest Neighbors Classifier
  3. Support Vector Classifier (SVC)
  4. Gaussian Naive Bayes
  5. Decision Tree
  6. Random Forest
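
As a rough illustration of how the six algorithms listed above could be compared, the sketch below trains each one with scikit-learn and reports 5-fold cross-validated accuracy. It is not the notebook's actual code: the data is a synthetic stand-in generated with make_classification (768 rows and 8 features, the same shape as the Pima data), and the feature scaling, hyperparameters, and accuracy metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): replace with the features and labels
# loaded from the Kaggle CSV when running on the real data.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Distance- and margin-based models are wrapped with a scaler (an assumption);
# tree-based models and Naive Bayes are used on the raw features.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Putting the scaler inside the pipeline means it is refit on the training folds only, which avoids leaking test-fold statistics into training.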

Usage

Prerequisites

  • Python 3
  • Jupyter Notebook
  • Libraries: pandas, numpy, scikit-learn

Credit Card Fraud Detection System

This project detects fraudulent credit card transactions using a random forest classifier. The model learns patterns in transaction data that distinguish fraudulent activity from legitimate activity.

Dataset

The dataset used in this project contains credit card transactions made by European cardholders. It is highly imbalanced: positive (fraudulent) cases are a very small fraction of the total compared to negative (non-fraudulent) cases. Due to privacy concerns, the original features have been transformed with Principal Component Analysis (PCA), so their real-world meaning is not disclosed.

The dataset is available at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
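
Because fraudulent cases are so rare, plain accuracy can look high even for a model that never flags fraud. The sketch below shows one common way to handle this with scikit-learn; it is an assumption, not necessarily what this project's notebook does. It uses a stratified train/test split, class_weight="balanced" on the random forest, and precision/recall-based metrics. The data is a synthetic stand-in with roughly 2% positives, not the Kaggle dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in (assumption): about 2% positive cases.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.98, 0.02], random_state=0
)

# stratify=y keeps the fraud ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Average precision summarizes the precision-recall curve for the rare class.
fraud_scores = model.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, fraud_scores)
print(f"Average precision: {ap:.3f}")
print(classification_report(y_test, model.predict(X_test), digits=3, zero_division=0))
```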

Random Forest Classifier

Random forest is an ensemble learning method that operates by constructing a multitude of decision trees during training and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
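
To make the "mode of the classes" idea concrete, the following sketch collects the prediction of every tree in a fitted forest and takes the majority vote by hand. The data is synthetic and the settings (25 trees, 10 features) are arbitrary assumptions. Note that scikit-learn's RandomForestClassifier actually averages the trees' predicted class probabilities rather than counting hard votes; with fully grown trees the two usually coincide, which the final comparison checks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# An odd number of trees avoids tied votes in the binary case.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One row of 0/1 predictions per tree: shape (n_trees, n_samples).
votes = np.stack([tree.predict(X) for tree in forest.estimators_]).astype(int)

# The mode of the votes: class 1 wins if more than half the trees predict it.
majority = (votes.mean(axis=0) > 0.5).astype(int)

# Fraction of samples where the manual vote matches the forest's own prediction.
agreement = float((majority == forest.predict(X)).mean())
print(f"Agreement with forest.predict: {agreement:.3f}")
```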

Usage

Prerequisites

  • Python 3
  • Jupyter Notebook
  • Libraries: pandas, numpy, scikit-learn

Movie Review Classifier

This project classifies movie reviews as positive or negative using three machine learning algorithms: XGBoost, logistic regression, and random forest. The system analyzes the text of each review to determine its sentiment polarity.

Dataset

The dataset used in this project consists of movie reviews labeled with positive or negative sentiment. Due to licensing restrictions, the dataset is not included in this repository. It can be downloaded from https://www.kaggle.com/c/word2vec-nlp-tutorial/data. Similar datasets, such as the IMDb movie reviews dataset or other sentiment analysis datasets on Kaggle, are also available.

Algorithms Implemented

  1. XGBoost Classifier
  2. Logistic Regression
  3. Random Forest Classifier
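
Before any of the classifiers listed above can be trained, the review text has to be converted into numeric features. The sketch below shows one possible approach, assuming TF-IDF features and logistic regression; the project's actual vectorization step is not described here, and the handful of example reviews are made up for illustration. The other two classifiers could be substituted for the final pipeline step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus (assumption): 1 = positive review, 0 = negative review.
reviews = [
    "A wonderful film with a moving story and great acting",
    "Brilliant direction and a wonderful cast",
    "I loved every minute, a great and moving experience",
    "Great soundtrack and brilliant performances",
    "A boring plot and terrible acting",
    "Terrible script, I hated every minute",
    "Dull, boring, and far too long",
    "Awful dialogue and a dull, predictable story",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each review into a weighted bag-of-words vector;
# logistic regression then learns a linear decision boundary on those vectors.
classifier = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
classifier.fit(reviews, labels)

# Classify unseen reviews.
new_reviews = ["a wonderful and moving story", "boring plot and terrible acting"]
predictions = classifier.predict(new_reviews)
for text, label in zip(new_reviews, predictions):
    print(f"{text!r} -> {'positive' if label == 1 else 'negative'}")
```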

Usage

Prerequisites

  • Python 3
  • Jupyter Notebook
  • Libraries: pandas, numpy, scikit-learn, xgboost
