Data Science and Machine Learning Projects

This repository collects data science and machine learning projects written in Python and R with a range of tools and libraries. The projects cover supervised learning, time series analysis, statistical modeling, and custom function development. It also includes coursework from Columbia University and from Cornell University's Machine Learning program. Below is an overview of the key files and folders and what each is for.

Repository Contents

Coursework Folders

  • Columbia-Machine-Learning-Course
    Contains coursework projects from Columbia University’s Data Science graduate engineering program. The work walks through the detailed mathematics behind most machine learning models and shows how to implement them from scratch.
  • Cornell-Machine-Learning
    Includes coursework from my certificate in Machine Learning from Cornell University. Projects cover topics such as supervised and unsupervised learning, optimization, and neural network design.

Python Notebooks

  • Automatic Data Preparation.ipynb
    Automates preprocessing workflows for machine learning pipelines.
  • Data Exploration – Heart Disease Prediction.ipynb
    Initial exploratory data analysis on the heart disease dataset.
  • Email Spam Classifier.ipynb
    Implements a machine learning model to classify spam emails.
  • GLM.ipynb
    Explores generalized linear models for predictive analytics.
  • Heart Disease Prediction with Sklearn.ipynb
    Builds and evaluates models for predicting heart disease using Scikit-learn.
  • MNIST 97% Accuracy.ipynb
    Reaches about 97% accuracy on the MNIST dataset with a simple k-nearest neighbors (KNN) classifier.
  • Model Precision in Scikit-learn.ipynb
    Investigates precision and evaluation metrics in supervised learning.
  • Neural Networks in TensorFlow.ipynb
    Develops neural networks using TensorFlow.
  • Preprocessing and Pipelines in Sklearn.ipynb
    Focuses on preprocessing and pipeline creation in Scikit-learn.
  • Regression with Scikit-learn.ipynb
    Implements regression models with Scikit-learn.
  • Time Series Models in Scikit-learn.ipynb
    Demonstrates time series modeling in Scikit-learn.
  • Working with Time Series as Inputs to a Model.ipynb
    Converts time series data for use in predictive modeling.
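
As an illustration of what converting a time series into model inputs can look like, here is a minimal sliding-window sketch. It is not taken from the notebook: the function name `make_supervised` and the parameter `n_lags` are hypothetical, and the notebook may use a different approach or library. Each target value is paired with the `n_lags` observations that precede it.

```python
def make_supervised(series, n_lags):
    """Turn a 1-D sequence into (X, y) pairs built from lagged values.

    Hypothetical helper for illustration only; not part of this repository.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    # Start at n_lags so every row has a full window of past values.
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the n_lags previous observations
        y.append(series[t])                   # the value to predict
    return X, y


X, y = make_supervised([10, 12, 13, 15, 18], n_lags=2)
print(X)  # [[10, 12], [12, 13], [13, 15]]
print(y)  # [13, 15, 18]
```

The resulting `X` and `y` have the row-per-sample shape that most regression estimators expect, so they could be passed to a model's `fit` method after conversion to arrays.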

R Scripts

  • Holt_Winters_Method.R
    Applies the Holt-Winters method to forecast government spending.
  • arima_model_gov_expenditures.Rmd
    Uses ARIMA models for government expenditure forecasting.
  • parralel.rmd
    Explores parallel programming techniques in R.
  • hyp_test.Rmd
    Performs hypothesis testing in R.
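
For readers unfamiliar with the Holt-Winters method used in the forecasting script above, the following is a rough sketch of the additive variant. It is written in Python to match the notebooks and is not the code from `Holt_Winters_Method.R`. The function name, the default smoothing parameters, and the initialization scheme (level from the first season's mean, trend from the difference between the first two seasonal means) are all assumptions chosen for brevity.

```python
def holt_winters_additive(series, season_length, alpha=0.5, beta=0.5,
                          gamma=0.5, horizon=4):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization (an assumption; other schemes exist).
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    seasonal = [series[i] - first for i in range(m)]

    # Update level, trend, and seasonal components one observation at a time.
    for t in range(m, len(series)):
        value = series[t]
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (value - level) + (1 - gamma) * s

    # Project the last level and trend forward, adding the matching seasonal term.
    n = len(series)
    return [level + h * trend + seasonal[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]


# A made-up, purely seasonal series with no trend: the forecast repeats the season.
print(holt_winters_additive([10, 20, 30, 20] * 3, season_length=4))
# [10.0, 20.0, 30.0, 20.0]
```

In practice the smoothing parameters are usually fitted to the data rather than fixed by hand, which is what R's built-in `HoltWinters` function does by default.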

Outputs

  • arima_model_gov_expenditures.pdf
    PDF output of ARIMA modeling analysis.

Other Files

  • ML Custom Functions.ipynb
    Contains reusable custom functions for machine learning pipelines.
  • backup
    Backup of important data and models.

Key Features

  • Predictive modeling for use cases such as heart disease prediction, spam detection, and MNIST classification.
  • Time series analysis using ARIMA and Holt-Winters methods.
  • Preprocessing, pipelines, and precision evaluation in Scikit-learn.
  • Neural network implementation using TensorFlow.
  • Advanced techniques such as generalized linear models and parallel programming.
  • Academic projects from Columbia University and Cornell University, covering foundational and advanced machine learning topics and the mathematics behind them.

How to Use

  1. Clone the repository:
    git clone https://github.com/your-repo-name.git
    

Other Projects

I developed a user-friendly Python application that helps small businesses optimize production, pricing, advertising, and financial decisions. It combines OLS regression, advanced functional modeling, and time series forecasting (Holt-Winters, VAR), and requires no technical expertise to use.

Undergraduate Computer Science Independent Study