This repository contains the Major Project for the course Pattern Recognition and Machine Learning.
According to statistics from the World Health Organization (WHO), stroke is the second leading cause of death, accounting for about 11% of deaths. Predicting whether a patient will have a stroke is a classification problem, and many well-studied algorithms exist for such problems, which makes it possible to compare several approaches on the same data. That is why we use machine learning here, in the hope that better prediction can help reduce the death rate due to stroke.
We used 7 classification models along with other essential machine learning concepts. The ultimate goal of the project is to suggest the model that best predicts stroke, given various inputs about a patient's health condition.
GAN : The dataset is taken from Stroke_prediction_dataset.
The training dataset contains 5110 rows and 12 columns. Here is a brief description of our dataset:
- Features in our dataset: id, gender, age, hypertension, heart disease, ever married, work type, residence type, BMI, average glucose level and smoking status.
- 1 id column
- 1 stroke column
- 10 latent vector columns
- The dataset has been split into train and test with test size of 0.3.
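
The 70/30 split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's `train_test_split`, and it uses synthetic data with the same shape as the dataset (5110 rows, 10 feature columns) because the file path and loading code are not given here. The stratification, the random seed, and the 5% positive rate are assumptions added for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 5110 rows, 10 feature columns,
# and a rare positive class (the 5% rate is an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(5110, 10))
y = (rng.random(5110) < 0.05).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,    # test size stated in this README
    stratify=y,       # assumption: keep the stroke rate similar in both parts
    random_state=42,  # assumption: fixed seed for reproducibility
)

print("train:", X_train.shape, "test:", X_test.shape)
print("positive rate (train/test):", y_train.mean(), y_test.mean())
```

Stratifying on the target is a common choice when the positive class is rare, since a purely random split can leave the test set with too few stroke cases to evaluate on.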
- Logistic regression
- Random Forest Classifier
- Decision Tree Classifier
- Support vector machine (SVM)
- K-nearest neighbor (KNN)
- Naive Bayes
- XGB
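
Comparing the models listed above could look roughly like the sketch below. It is hedged in several ways: the data is synthetic (generated with `make_classification`), the hyperparameters are library defaults rather than the project's tuned values, the F1 score is an assumed metric, and "XGB" is assumed to mean `XGBClassifier` from the `xgboost` package, which is only included if that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 10 features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One entry per model named in the list; all settings are illustrative defaults.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}
try:
    # Assumption: "XGB" refers to xgboost's scikit-learn wrapper.
    from xgboost import XGBClassifier
    models["XGB"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass  # xgboost not installed; compare the remaining models only

# Fit every model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Print the models from best to worst F1 score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} F1 = {score:.3f}")
```

Keeping the models in a dictionary makes it easy to evaluate all of them with the same split and metric, so the ranking reflects the models rather than differences in setup.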
- Data Preprocessing
- Feature Selection
- Exploratory data analysis (EDA)
- Model Training
- Hyperparameter tuning
- Performance measure
- Sequential floating forward selection (SFFS)
- Aasrish Vinay Perumalla
- Vedasamhitha Challapalli
- R.Amshu Naik