Bike Sharing Demand Prediction using Multiple Linear Regression

HrithikRai/Bike-Sharing-Demand-Prediction

Bike-Sharing-Demand-Prediction

Regression Analysis - Bike Sharing Demand Estimation

Kaggle competition - Bike Sharing Demand

  • A bike sharing system can act as a sensor network for studying mobility in a city. This project combines historical usage patterns with weather data to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.
  • Model: multiple linear regression. You can find the project here
  • Data: hourly dataset (17,379 rows, 17 features)
  • Checked the multiple linear regression assumptions:
    1. Autocorrelation
    2. Multicollinearity
    3. Endogeneity
    4. Residual Normality
  • Libraries used: Pandas, NumPy, Matplotlib, math, sklearn
  • RMSLE score of the predictor = 0.3560. This falls within the top 1 percent (< 0.367) of the Kaggle Bike Sharing Demand competition.
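
For reference, RMSLE is commonly defined as the square root of the mean squared difference between log(1 + predicted) and log(1 + actual). The sketch below is a minimal NumPy version of that formula; the function name `rmsle` and the sample counts are illustrative assumptions, not taken from the project.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x) so zero counts are allowed."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Made-up hourly rental counts, for illustration only.
actual = [16, 40, 32, 13, 1]
predicted = [18, 35, 30, 15, 2]
score = rmsle(actual, predicted)
print(f"RMSLE on the toy data: {score:.4f}")
```

Because the error is measured on a log scale, a miss of 10 bikes at a quiet hour is penalized more than the same miss at a busy hour.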

Visualizations and Inferences:

Relation between Continuous variables and Demand.

  • The higher the windspeed, the lower the demand.
  • Temperature and demand appear to be positively correlated.
  • The plots of temp and atemp are almost identical, which points to a strong correlation between them, so a multicollinearity check is required.
  • Humidity and windspeed affect demand, but this needs further statistical analysis, such as a correlation coefficient check.
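
A correlation coefficient check of the kind mentioned above could look roughly like this. The data here is synthetic and the coefficients used to generate it are arbitrary assumptions, so the printed values say nothing about the real dataset; only the column names temp, atemp, humidity and windspeed follow the text.

```python
import numpy as np

# Synthetic stand-in for the hourly data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(0, 1, n)
atemp = temp + rng.normal(0, 0.02, n)      # "feels like" temperature tracks temp
humidity = rng.uniform(0, 1, n)
windspeed = rng.uniform(0, 1, n)
demand = 200 * temp - 60 * humidity - 10 * windspeed + rng.normal(0, 20, n)

features = {"temp": temp, "atemp": atemp, "humidity": humidity, "windspeed": windspeed}

# Pearson correlation of each continuous feature with demand.
corr_with_demand = {
    name: float(np.corrcoef(column, demand)[0, 1])
    for name, column in features.items()
}

for name, r in corr_with_demand.items():
    print(f"{name:>9}: r = {r:+.3f}")
```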

Relation between Categorical variables and Demand.

  • Weekday doesn't affect the demand, so it can be dropped.
  • Year is not a useful predictor, since only 2 years are given.
  • Hourly data: park bikes near public transport in the morning and near office premises in the afternoon.
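
One way to judge whether a categorical variable matters is to compare mean demand across its levels. The sketch below does this with NumPy on synthetic data in which demand is given artificial commute peaks and no weekday effect. The helper `group_means`, the peak hours and all numbers are assumptions for illustration.

```python
import numpy as np

# Synthetic hourly data for 70 days (assumption: NOT the real dataset).
rng = np.random.default_rng(4)
n = 24 * 70
hour = np.arange(n) % 24
weekday = (np.arange(n) // 24) % 7
commute_peak = np.where(np.isin(hour, [8, 17, 18]), 150.0, 0.0)
demand = 50 + commute_peak + rng.normal(0, 10, n)

def group_means(codes, values):
    """Mean of `values` for each integer category code 0..k-1."""
    sums = np.bincount(codes, weights=values)
    counts = np.bincount(codes)
    return sums / counts

by_hour = group_means(hour, demand)
by_weekday = group_means(weekday, demand)

# A large spread between group means suggests the variable carries signal.
hour_spread = float(by_hour.max() - by_hour.min())
weekday_spread = float(by_weekday.max() - by_weekday.min())
print(f"spread across hours:    {hour_spread:.1f}")
print(f"spread across weekdays: {weekday_spread:.1f}")
```

A variable whose group means are nearly flat, as weekday is in this toy data, is a candidate for dropping.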

Correlation Matrix

  • Drop atemp, since it shows high multicollinearity with temp.
  • Humidity also has a high correlation with windspeed, and windspeed has a low correlation with demand.
  • Therefore windspeed can be dropped.
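
The multicollinearity screen described above can be sketched as a pairwise correlation matrix with a cutoff. The 0.9 threshold, the rule of dropping the second feature of each flagged pair, and the synthetic data are all assumptions. In this toy data only temp and atemp are correlated, so it does not reproduce the humidity and windspeed relationship seen in the real data.

```python
import numpy as np

# Synthetic features (assumption: NOT the real dataset).
rng = np.random.default_rng(1)
n = 500
temp = rng.uniform(0, 1, n)
atemp = temp + rng.normal(0, 0.02, n)
humidity = rng.uniform(0, 1, n)
windspeed = rng.uniform(0, 1, n)

names = ["temp", "atemp", "humidity", "windspeed"]
X = np.column_stack([temp, atemp, humidity, windspeed])

# Columns are variables, so rowvar=False.
corr = np.corrcoef(X, rowvar=False)

THRESHOLD = 0.9  # hypothetical cutoff for "too correlated"
high_pairs = [
    (names[i], names[j], float(corr[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > THRESHOLD
]

# Keep the first feature of each flagged pair and drop the second.
to_drop = sorted({second for _, second, _ in high_pairs})
kept = [name for name in names if name not in to_drop]
print("flagged pairs:", high_pairs)
print("kept features:", kept)
```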

Autocorrelation

  • High autocorrelation for up to 5 previous values (top 3 > 0.8).
  • Since the autocorrelation is in the dependent variable itself, we can't get rid of it.
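
Lagged autocorrelation of the demand series can be measured by correlating the series with a shifted copy of itself. The following is a minimal sketch on a synthetic daily cycle; the helper name `autocorr` and the generated series are assumptions, and the resulting values will differ from those reported for the real data.

```python
import numpy as np

def autocorr(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    x = np.asarray(series, dtype=float)
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])

# Two weeks of synthetic hourly demand with a 24-hour cycle (NOT the real dataset).
rng = np.random.default_rng(2)
hours = np.arange(24 * 14)
demand = 100 + 80 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

# Autocorrelation for the 5 previous values.
lag_corr = {lag: autocorr(demand, lag) for lag in range(1, 6)}
for lag, r in lag_corr.items():
    print(f"lag {lag}: r = {r:+.3f}")
```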

Log Normalizing the Dependent Variable - Demand

After fitting the processed data with our regressor:

  • RMSE Score - 0.380
  • RMSLE Score - 0.356
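
The overall flow (log-transform the demand, fit a linear model, then score on the log scale) can be sketched as follows. This is not the project's code: it uses NumPy's least-squares solver in place of sklearn, a simple ordered train/test split, and synthetic data, all of which are assumptions. The score it prints is unrelated to the figures above.

```python
import numpy as np

# Synthetic data where log(1 + demand) is roughly linear in the features
# (assumption: NOT the real dataset).
rng = np.random.default_rng(3)
n = 400
temp = rng.uniform(0, 1, n)
humidity = rng.uniform(0, 1, n)
log_demand = 3.0 + 2.0 * temp - 1.0 * humidity + rng.normal(0, 0.3, n)
demand = np.round(np.expm1(log_demand)).clip(min=0)

# Design matrix with an intercept column, and the log-transformed target.
X = np.column_stack([np.ones(n), temp, humidity])
y_log = np.log1p(demand)

# Hold out the last 100 rows for evaluation.
train, test = slice(0, 300), slice(300, n)
coef, *_ = np.linalg.lstsq(X[train], y_log[train], rcond=None)

# Predict on the log scale, then map back to counts.
pred_log = X[test] @ coef
pred_demand = np.expm1(pred_log)

# RMSE between log1p values equals RMSLE on the original counts.
rmsle_score = float(np.sqrt(np.mean((pred_log - y_log[test]) ** 2)))
print(f"coefficients: {coef}")
print(f"hold-out RMSLE: {rmsle_score:.3f}")
```

Fitting on log(1 + demand) means the model minimizes squared error on the same scale that RMSLE measures, and `expm1` converts predictions back to rental counts.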
