In this notebook, we take a dataset of breast cancer biopsies and apply six machine learning models to predict whether a patient's tumor is malignant or benign. The best-performing model achieved an F1 score of 0.977. The notebook is hosted on Kaggle: https://www.kaggle.com/code/jarredpriester/machine-learning-ensemble-breast-cancer-prediction
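For readers unfamiliar with the metric, here is a minimal base R sketch of how an F1 score is computed from predicted and actual labels. The label vectors are made up for illustration and are not the notebook's results; the helper name `f1_score` and the choice of "M" (malignant) as the positive class are assumptions.

```r
# Hypothetical labels for illustration only; these are NOT the notebook's data.
actual    <- factor(c("M", "M", "M", "B", "B", "B", "B", "M"), levels = c("M", "B"))
predicted <- factor(c("M", "M", "B", "B", "B", "B", "M", "M"), levels = c("M", "B"))

# f1_score is a hypothetical helper; "M" as the positive class is an assumption.
f1_score <- function(actual, predicted, positive = "M") {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  # F1 is the harmonic mean of precision and recall
  2 * precision * recall / (precision + recall)
}

f1 <- f1_score(actual, predicted)
print(f1)
```

F1 balances precision and recall, which makes it a more informative summary than plain accuracy when the two classes are not equally common.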
First, I wanted to practice working with machine learning models. Second, I was curious to see how data science can be used in healthcare.
I learned that machine learning can be very effective in the healthcare industry. I also gained more experience with the caret library, especially with fine-tuning the random forest and k-nearest neighbors models.
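As a rough illustration of the kind of tuning mentioned above, the sketch below uses caret's `train()` to pick `k` for a k-nearest neighbors model with 5-fold cross-validation. It is not the notebook's actual code: the data frame `toy`, its columns, the grid of `k` values, and the fold count are all assumptions chosen so the example runs on its own.

```r
library(caret)

set.seed(42)

# Synthetic two-feature data standing in for the real dataset (assumption).
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$diagnosis <- factor(
  ifelse(toy$x1 + toy$x2 + rnorm(n, sd = 0.5) > 0, "M", "B"),
  levels = c("M", "B")
)

# 5-fold cross-validation; the number of folds is an illustrative choice.
ctrl <- trainControl(method = "cv", number = 5)

# Try odd values of k from 3 to 15 and keep the one with the best CV accuracy.
knn_fit <- train(
  diagnosis ~ .,
  data       = toy,
  method     = "knn",
  preProcess = c("center", "scale"),  # KNN is distance-based, so scale features
  tuneGrid   = data.frame(k = seq(3, 15, by = 2)),
  trControl  = ctrl
)

print(knn_fit$bestTune)
```

A random forest can be tuned the same way with `method = "rf"` and a `tuneGrid` over `mtry`, which requires the randomForest package to be installed.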
The Breast Cancer Wisconsin (Diagnostic) Dataset is a popular dataset from the University of California, Irvine Machine Learning Repository. The dataset consists of 569 rows and 32 columns. Each row represents a tumor sample; the columns hold a sample ID, the diagnosis (malignant or benign), and 30 numeric features.
Breast_Cancer_Kaggle.R - R script
Breast_Cancer_Kaggle.Rmd - R Markdown
Breast_Cancer_Kaggle.pdf - PDF