Forecasting Demand for Rideshare Companies

-- Project Status: [Completed]

Project Objective

This project aims to help rideshare providers forecast demand, with the goal of increasing ridership and reliability for customers. The data was retrieved from the Chicago Data Portal.

Methods

Querying Data:
- pulling data through the Socrata Open Data API (SODA)
Cleaning Data:
- grouping by timeframes, handling null values
Time Series Modeling:
- capturing trends in the data in order to forecast demand
Plotting:
- visualizing data and forecast results
Multi-Core Processing:
- using all CPU cores to reduce the run time of calculations
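
As a minimal sketch of the querying and hourly grouping steps, the request below asks the Socrata server to do the aggregation with SoQL, so that one row per hour is downloaded instead of one row per ride. The dataset id, the `trip_start_timestamp` column, and the helper name `hourly_ridership_params` are hypothetical placeholders, not the project's own query code; the URL is built with the standard library and could then be fetched with `requests`.

```python
from urllib.parse import urlencode

# Hypothetical: the dataset id and timestamp column are placeholders, not
# values taken from this project. Look them up on the Chicago Data Portal.
BASE_URL = "https://data.cityofchicago.org/resource/{dataset_id}.json"


def hourly_ridership_params(timestamp_col="trip_start_timestamp",
                            start=None, end=None, limit=50000):
    """SoQL parameters that make the server count rides per hour, so one row
    per hour is downloaded instead of one row per trip."""
    hour = f"date_trunc_ymdh({timestamp_col})"  # truncate timestamps to the hour
    params = {
        "$select": f"{hour} AS hour, count(*) AS rides",
        "$group": hour,
        "$order": hour,
        "$limit": limit,
    }
    if start and end:
        # Floating timestamps are compared as quoted ISO-8601 strings.
        params["$where"] = f"{timestamp_col} between '{start}' and '{end}'"
    return params


params = hourly_ridership_params(start="2019-01-01T00:00:00",
                                 end="2019-12-31T23:59:59")
url = BASE_URL.format(dataset_id="xxxx-xxxx") + "?" + urlencode(params)
# The URL could then be fetched with requests.get(url).json().
```

Pushing the `GROUP BY` to the server is what keeps the download small enough for a local machine.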

Included Packages

requests - querying the SODA API
pandas - loading and pre-processing data
matplotlib, seaborn - plotting results
statsmodels, pmdarima - time series modeling
kepler.gl - geospatial visualizations

Project Description

This project is broken down into four Jupyter notebooks:

query_clean:

To start, I used the Socrata platform to aggregate ridership on an hourly basis, as the original ride-level file is too large to hold on a local machine. I then gathered summary information about the dataset, including the days and hours of peak ridership, the community areas with the highest ridership, and the daily weather. These numbers improve understanding of the dataset and inform decisions for further modeling.
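
Finding the peak days and hours amounts to grouping the hourly counts by weekday and by hour of day and comparing the averages. A minimal standard-library sketch, assuming the hourly rows arrive as (timestamp, count) pairs; the function name and the sample counts are made up for illustration:

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def peak_periods(hourly_rows):
    """Return (busiest weekday, busiest hour of day), ranked by mean rides/hour.

    hourly_rows: iterable of (ISO-8601 timestamp, ride count) pairs, e.g. the
    rows of an hourly aggregation. Means are used so that weekdays or hours
    that appear more often in the sample are not favoured.
    """
    by_weekday = defaultdict(list)
    by_hour = defaultdict(list)
    for ts, rides in hourly_rows:
        t = datetime.fromisoformat(ts)
        by_weekday[WEEKDAYS[t.weekday()]].append(rides)
        by_hour[t.hour].append(rides)

    def mean(values):
        return sum(values) / len(values)

    peak_day = max(by_weekday, key=lambda d: mean(by_weekday[d]))
    peak_hour = max(by_hour, key=lambda h: mean(by_hour[h]))
    return peak_day, peak_hour


# Made-up counts for illustration only (2019-01-04 is a Friday).
rows = [("2019-01-04T03:00:00", 50), ("2019-01-04T17:00:00", 900),
        ("2019-01-05T03:00:00", 100), ("2019-01-05T17:00:00", 1200)]
print(peak_periods(rows))  # ('Saturday', 17)
```
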
city_modeling:

This notebook runs models on a city-wide basis to establish baseline scores and select the best model for the data. The models tested include SARIMA, auto-SARIMA, and Holt-Winters' Exponential Smoothing. After optimizing parameters, I chose to continue with Holt-Winters' because of its superior speed and accuracy and its low computational cost.
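
For reference, additive Holt-Winters keeps a level, a trend, and one seasonal term per position in the cycle, each updated by exponential smoothing. statsmodels provides it as `ExponentialSmoothing` in `statsmodels.tsa.holtwinters` (with `trend="add"`, `seasonal="add"`, and `seasonal_periods`). The pure-Python sketch below only illustrates the update equations; it is not the project's own implementation, and its smoothing constants are fixed, hypothetical values rather than optimized ones.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.05, gamma=0.2, horizon=1):
    """Additive Holt-Winters (triple exponential smoothing) forecasts.

    y: observations covering at least two full seasons; m: season length
    (e.g. 24 for hourly data with a daily cycle). The smoothing constants
    are fixed for illustration; a real fit would optimise them.
    Returns forecasts for the next `horizon` steps.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial state from the first two seasons.
    first, second = y[:m], y[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)  # change in season mean, per step
    season = [v - level for v in first]           # season[t % m] holds s_{t-m}

    for t in range(m, len(y)):
        s = season[t % m]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    # h-step forecast: level + h * trend + latest seasonal term for that phase.
    return [level + h * trend + season[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]


# A repeating pattern with period 4: the model should reproduce it.
pattern = [10, 20, 30, 40]
print(holt_winters_additive(pattern * 5, m=4, horizon=4))
```

Each step is a handful of arithmetic operations, which is consistent with the speed advantage over SARIMA noted above.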
neighborhood_models:

To support accurate, location-informed decisions, I separated the data by community area and developed a separate model for each, since demand differs from one area to another. Modeling every neighborhood is computationally expensive, so incorporating multi-core processing reduced the modeling time from 1 hour to ~10 minutes.
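
The per-area fits are independent of one another, so they parallelize naturally across processes. A minimal sketch with `concurrent.futures.ProcessPoolExecutor`, assuming the data is a dict from community-area id to an hourly series; `seasonal_naive` and `forecast_all_areas` are hypothetical names, the naive model is a stand-in for the real per-area Holt-Winters fit, and the project may have used a different multiprocessing API.

```python
from concurrent.futures import ProcessPoolExecutor


def seasonal_naive(job):
    """Stand-in model: repeat the last observed season.
    A per-area Holt-Winters fit would go here instead."""
    area, series, m, horizon = job
    last_season = series[-m:]
    return area, [last_season[h % m] for h in range(horizon)]


def forecast_all_areas(series_by_area, m=24, horizon=24, workers=None):
    """Fit one model per community area, spreading areas across processes."""
    jobs = [(area, series, m, horizon) for area, series in series_by_area.items()]
    # With max_workers=None the pool uses one worker per CPU on the machine.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(seasonal_naive, jobs))


if __name__ == "__main__":
    toy = {8: [1, 2, 3] * 3, 32: [4, 5, 6] * 2}  # made-up hourly series
    print(forecast_all_areas(toy, m=3, horizon=4))
```

On platforms that start worker processes with spawn (Windows, and macOS by default since Python 3.8), the call must sit under an `if __name__ == "__main__":` guard, as in the example.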
kepler.gl:

The ultimate output of this project is a visualization of when and where demand occurs. Kepler.gl was able to combine the geospatial boundaries of community areas with the ridership time series to accurately represent demand, viewable on this Live Dashboard.
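
One way to combine the boundaries with the time series is to emit one GeoJSON feature per (community area, hour), so that a time filter on the `timestamp` property can animate demand. A minimal sketch, assuming hourly counts as (area id, timestamp, rides) tuples and a boundary file whose area id is stored in a property named `area_id` (a hypothetical name; check the actual boundary file):

```python
import json


def demand_geojson(boundaries, hourly_counts, key="area_id"):
    """Attach hourly ride counts to community-area polygons.

    boundaries: GeoJSON FeatureCollection (dict) of community areas.
    hourly_counts: iterable of (area id, ISO timestamp, rides) tuples.
    key: property holding the area id in `boundaries` (hypothetical name).
    """
    geometry_by_area = {
        str(f["properties"][key]): f["geometry"] for f in boundaries["features"]
    }
    features = []
    for area, ts, rides in hourly_counts:
        geometry = geometry_by_area.get(str(area))
        if geometry is None:
            continue  # skip rows whose area has no boundary polygon
        features.append({
            "type": "Feature",
            "geometry": geometry,
            "properties": {"area_id": str(area), "timestamp": ts, "rides": rides},
        })
    return {"type": "FeatureCollection", "features": features}


# Toy inputs: one square "area" and two hourly counts (area 99 has no boundary).
boundaries = {"type": "FeatureCollection", "features": [{
    "type": "Feature",
    "properties": {"area_id": 8},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]},
}]}
counts = [(8, "2019-01-05T17:00:00", 1200), (99, "2019-01-05T17:00:00", 5)]
geojson_text = json.dumps(demand_geojson(boundaries, counts))
```

The resulting GeoJSON can then be loaded into kepler.gl, for example through the `keplergl` Python package's `KeplerGl.add_data(data=..., name=...)`, which accepts GeoJSON as well as DataFrames.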

Contact

Feel free to contact me with any questions or comments on the project: [hello@dancorley.com]

Repository Files

img
01 - query_clean.ipynb
02 - city_modeling.ipynb
03 - neighborhood_models.ipynb
04 - kepler_gl.ipynb
README.md
holt_modeling.py
soda_query.py

DanCorley/uber_demand