Machine Learning Recipes 😇 Datasets used here are available at Kaggle
Hey there,
This repo contains machine learning scripts that I've run on datasets available on Kaggle, using a range of algorithms. Details for each dataset are given below.
-
Iris: This is the classic dataset that almost every beginner in ML knows about. Here, the aim is to classify the flower species. There are two scripts, one using K-Nearest Neighbors and one using Naive Bayes. The data required to classify a flower is SepalLength, SepalWidth, PetalLength and PetalWidth, all in cm.
-
Pokemon: This dataset contains details of many Pokemon. Here, the aim is to classify a Pokemon as legendary or not legendary. There are two scripts, one using K-Nearest Neighbors and one using Support Vector Machines. The data required to classify a Pokemon is Hit Points, Attack Points, Defence Points, Special Attack Points, Special Defence Points and Speed Points.
-
Mushrooms: This dataset contains details of many mushrooms. Here, the aim is to classify a mushroom as poisonous or edible. There are two scripts, one using K-Nearest Neighbors and one using Naive Bayes. Details about the required data are given in the program itself.
-
Restaurant Reviews: This dataset contains many reviews of a restaurant. Here, the aim is to classify a review as good or bad. There are two scripts, one using Naive Bayes and one using Random Forest. Both use nltk to clean the text and CountVectorizer to create a bag-of-words model. The only data required to classify is the review text.
-
Churn Model: This dataset contains details about a bank's customers. Here, the aim is to predict whether a customer will leave the bank's service or not. The script uses an artificial neural network built with keras. Details about the required data are given in the program itself. Update: I've added a new script which uses XGBoost to do the same thing. I've also used K-Fold Cross Validation to evaluate the model and estimate its accuracy.
-
Red Wine Quality: This dataset contains details about red wines. Here, the aim is to predict the quality of a wine and to find out which columns (parameters) matter most when judging a red wine. I've used Multiple Linear Regression to predict and Backward Elimination to build the optimal model. Details about the required data are given in the folder itself. Update: I've used Principal Component Analysis to view the variance in the various columns.
-
Cereals: This dataset contains details about cereals. Here, the aim is to predict the rating of a cereal. I've used Multiple Linear Regression to predict.
-
Google Stock Price: This dataset contains details about Google stock prices. Here, the aim is to predict Google stock prices using the test set. I've used a Recurrent Neural Network (RNN) to do the job, trained with the Keras library. More info is given inside the 'GoogleStockPrice' folder.
-
SQL Scavenger Hunt: I participated in the SQL Scavenger Hunt on Kaggle. This folder is not related to Machine Learning; it deals with data using SQL and Python.
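The Iris, Pokemon and Mushrooms entries above all mention K-Nearest Neighbors. As a rough idea of what that algorithm does, here is a minimal standard-library sketch. It is not the code from this repo's scripts; the function name `knn_predict` and the toy rows are made up for illustration and are not taken from the Kaggle files.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Euclidean distance from the query to every training row
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Keep the k nearest neighbours and take a majority vote
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative toy rows (hypothetical values, not from the dataset):
# [SepalLength, SepalWidth, PetalLength, PetalWidth] in cm
train_rows = [
    [5.0, 3.4, 1.5, 0.2],
    [4.8, 3.0, 1.4, 0.2],
    [5.2, 3.5, 1.5, 0.2],
    [6.4, 2.9, 4.5, 1.4],
    [6.0, 2.8, 4.3, 1.3],
    [6.6, 3.0, 4.6, 1.4],
]
train_labels = [
    "setosa", "setosa", "setosa",
    "versicolor", "versicolor", "versicolor",
]

prediction = knn_predict(train_rows, train_labels, [5.1, 3.3, 1.6, 0.2], k=3)
print(prediction)
```

An odd `k` is a common choice for two-class problems such as legendary vs. not legendary, since it avoids tied votes.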
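The Restaurant Reviews entry above mentions a bag-of-words model. The sketch below shows the basic idea using only the standard library: each review becomes a row of word counts over a shared vocabulary. It is an assumption-laden stand-in, not the repo's script; it does not use nltk or CountVectorizer, the cleaning step is reduced to lowercasing and keeping alphabetic tokens, and the two example reviews are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    # Shared, alphabetically sorted vocabulary across all documents
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical reviews, written for this example only
reviews = ["Loved the food, loved the place", "The food was bad"]
vocabulary, matrix = bag_of_words(reviews)
print(vocabulary)
for row in matrix:
    print(row)
```

The resulting count rows are the kind of numeric input a classifier such as Naive Bayes or Random Forest can be trained on.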
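The Churn Model entry above mentions K-Fold Cross Validation. As a sketch of how the splitting works, the code below cuts the sample indices into k folds, uses each fold once as the test set, and averages a per-fold score. The names `k_fold_indices`, `cross_validate` and `score_fold` are hypothetical helpers for illustration; the repo's script may well rely on a library for this instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(n_samples, k, score_fold):
    """Average the score returned by score_fold(train, test) over all folds."""
    scores = [
        score_fold(train, test)
        for train, test in k_fold_indices(n_samples, k)
    ]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice `score_fold` would train the model on the `train` rows and return its accuracy on the `test` rows, so the average reflects performance on data the model has not seen during training.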
Email me at pranavj1001@gmail.com
MIT License