Each Jupyter notebook in this repository explores a specific machine-learning algorithm or a tool useful to data scientists. In my notes, which are based on the homework assignments from the Machine Learning Zoomcamp course, I try to balance practical implementations with the underlying mathematical foundations of these models. I also use the book Machine Learning Bookcamp as a guide, while the mathematical insights are drawn from the book Data Mining and Machine Learning.
While this repository reflects my personal journey of learning and understanding machine learning, I hope it can also be a valuable resource for others on a similar path.
How to create a car-price prediction project with a linear regression model.
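As a rough illustration of the idea, here is a minimal sketch of a one-feature least-squares fit in plain Python. The data (car age in years against price in thousands) and the function name `fit_line` are made up for this example and are not taken from the notebook, which works with a real dataset and more features.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: car age (years) and price (thousands).
ages = [1, 2, 3, 4]
prices = [18, 16, 14, 12]

intercept, slope = fit_line(ages, prices)
predicted_price = intercept + slope * 5  # prediction for a 5-year-old car
```

With several features, the same least-squares problem is usually solved in matrix form rather than with these scalar sums.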
Predicting customers who will churn with logistic regression for classification.
A study of the confusion matrix, deriving metrics such as precision and recall from it, and using ROC and AUC to further understand the performance of a binary classifier. We also focus on tuning the model's parameters and on using cross-validation to verify the model's behavior.
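To make the relation between the confusion matrix and these metrics concrete, here is a small sketch in plain Python. The labels and predictions are invented for illustration, and the helper names `confusion_counts` and `precision_recall` are hypothetical, not functions from the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up ground truth and predictions for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
```

Precision answers "of the examples predicted positive, how many were right?", while recall answers "of the truly positive examples, how many were found?".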
An introduction to saving models with Pickle and serving them with Flask, managing dependencies with Pipenv, and using Docker.
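The Pickle step can be sketched as a save-and-load round trip. This is only an illustration: the "model" here is a hypothetical dictionary of parameters standing in for a trained estimator, and the file name `model.bin` is an assumption rather than the name used in the notebook.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: a few learned parameters.
model = {"weights": [0.4, -1.2], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.bin"  # assumed file name

    # Save: serialize the object to a binary file.
    with open(model_path, "wb") as f_out:
        pickle.dump(model, f_out)

    # Load: read it back, e.g. inside a web service at startup.
    # Only unpickle files you trust, since loading can execute arbitrary code.
    with open(model_path, "rb") as f_in:
        restored = pickle.load(f_in)


def score(params, features):
    """Linear score computed from the restored parameters."""
    return params["bias"] + sum(w * x for w, x in zip(params["weights"], features))


example_score = score(restored, [1.0, 0.5])
```

A web service would typically load the file once at startup and reuse the restored object for every request.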
Project for predicting the risk of default with tree-based models (decision trees, random forest, and XGBoost).
This project uses the Diabetes Health Indicators dataset, available on Kaggle. The project seeks to answer which risk factors most strongly predict diabetes and how to select a subset of the risk factors to accurately predict whether an individual has diabetes.
A study of the fundamentals and applications of Convolutional Neural Networks (CNNs), a class of deep neural networks used primarily to analyze images.
Deploying deep learning models in a serverless environment, showing the benefits of scalability, cost-efficiency, and ease of deployment.
A study of Kubernetes, an open-source system for automating the deployment, scaling, and management of containerized applications.
This project builds a machine learning model to predict the trip duration of taxi rides in New York City. The dataset is from Kaggle. The original dataset has approximately 1.4 million entries in the training set and 630k in the test set, although only the training set is used in this project.