This repository contains a collection of data science and machine learning projects using Python, R, and various tools and libraries. The projects demonstrate skills in supervised learning, time series analysis, statistical modeling, and custom function development. Additionally, it includes coursework projects from Columbia University and Cornell University's Machine Learning program. Below is an overview of the key files, folders, and their purposes.
- Columbia-Machine-Learning-Course
Contains coursework projects from Columbia University’s Data Science graduate engineering program. The work shows the detailed mathematical computations behind most machine learning models and how to implement them from scratch.
- Cornell-Machine-Learning
Includes coursework from my certificate in Machine Learning from Cornell University. Projects cover topics such as supervised and unsupervised learning, optimization, and neural network design.
- Automatic Data Preparation.ipynb
Automates preprocessing workflows for machine learning pipelines.
- Data Exploration – Heart Disease Prediction.ipynb
Initial exploratory data analysis on the heart disease dataset.
- Email Spam Classifier.ipynb
Implements a machine learning model to classify spam emails.
- GLM.ipynb
Explores generalized linear models for predictive analytics.
- Heart Disease Prediction with Sklearn.ipynb
Builds and evaluates models for predicting heart disease using Scikit-learn.
- MNIST 97% Accuracy.ipynb
Achieves about 97% accuracy on the MNIST dataset with a simple k-nearest neighbors (KNN) classifier.
- Model Precision in Scikit-learn.ipynb
Investigates precision and other evaluation metrics in supervised learning.
- Neural Networks in TensorFlow.ipynb
Develops neural networks using TensorFlow.
- Preprocessing and Pipelines in Sklearn.ipynb
Focuses on preprocessing and pipeline creation in Scikit-learn.
- Regression with Scikit-learn.ipynb
Implements regression models with Scikit-learn.
- Time Series Models in Scikit-learn.ipynb
Demonstrates time series modeling in Scikit-learn.
- Working with Time Series as Inputs to a Model.ipynb
Converts time series data for use in predictive modeling.
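
As a rough illustration of the idea behind the last notebook above, the sketch below turns a series into lagged feature rows that a regular supervised model can consume. The function name `make_lagged_features` and the toy numbers are assumptions made for this example; they are not taken from the notebook.

```python
def make_lagged_features(series, n_lags):
    """Turn a 1-D sequence into (X, y) pairs for supervised learning.

    Each row of X holds the n_lags values that come right before
    the matching target value in y.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the previous n_lags observations
        y.append(series[t])                   # the value to predict
    return X, y


# Toy data for illustration only.
X, y = make_lagged_features([10, 12, 13, 15, 18], n_lags=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

Rows built this way keep their time order, so any train/test split should cut by position rather than shuffle.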
- Holt_Winters_Method.R
Applies the Holt-Winters method to forecast government spending.
- arima_model_gov_expenditures.Rmd
Uses ARIMA models for government expenditure forecasting.
- parralel.rmd
Explores parallel programming techniques in R.
- hyp_test.Rmd
Performs hypothesis testing in R.
- arima_model_gov_expenditures.pdf
PDF output of the ARIMA modeling analysis.
- ML Custom Functions.ipynb
Contains reusable custom functions for machine learning pipelines.
- backup
Backup of important data and models.
- Predictive modeling for use cases such as heart disease prediction, spam detection, and MNIST classification.
- Time series analysis using ARIMA and Holt-Winters methods.
- Preprocessing, pipelines, and precision evaluation in Scikit-learn.
- Neural network implementation using TensorFlow.
- Advanced techniques like generalized linear models and parallel programming.
- Academic projects from Columbia University and Cornell University, demonstrating mastery of foundational and advanced machine learning topics and mathematics.
- Clone the repository:
git clone https://github.com/your-repo-name.git
I developed a user-friendly Python application that helps small businesses optimize production, pricing, advertising, and financial decisions. It combines OLS regression, advanced functional modeling, and time series forecasting (Holt-Winters, VAR), and it requires no technical expertise to use.
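
To give a feel for the kind of OLS fit mentioned above, here is a minimal sketch of a one-variable least-squares regression relating price to units sold. It is not the application's code: the function name `fit_simple_ols` and the price and sales figures are hypothetical, chosen only so the result is easy to check by hand.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: units sold fall by 2 for every 1.0 increase in price.
prices = [1.0, 2.0, 3.0, 4.0]
units_sold = [9.0, 7.0, 5.0, 3.0]

intercept, slope = fit_simple_ols(prices, units_sold)
print(f"units_sold = {intercept:.1f} + ({slope:.1f}) * price")
```

A fitted slope like this is what lets a pricing tool estimate how demand responds to a price change; real data would be noisy and would usually call for more than one explanatory variable.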