A bicycle-sharing system, bike share program,[1] public bicycle scheme,[2] or public bike share (PBS) scheme,[3] is a shared transport service in which bicycles are made available for shared use to individuals on a short-term basis for a price or free. Many bike share systems allow people to borrow a bike from a "dock" and return it at another dock belonging to the same system. Docks are special bike racks that lock the bike and release it only under computer control. The user enters payment information, and the computer unlocks a bike. The user returns the bike by placing it in the dock, which locks it in place. Other systems are dockless.
A bike-sharing system is a service in which bikes are made available for shared use to individuals on a short-term basis for a price or free. Many bike share systems allow people to borrow a bike from a "dock", which is usually computer-controlled: the user enters payment information, and the system unlocks the bike. The bike can then be returned to another dock belonging to the same system.
BoomBikes, a US bike-sharing provider, has recently suffered a considerable drop in revenue due to the ongoing COVID-19 pandemic. The company is finding it very difficult to sustain itself in the current market. It has therefore decided to draw up a business plan to accelerate its revenue as soon as the ongoing lockdown ends and the economy returns to a healthy state.
To that end, BoomBikes wants to understand the demand for shared bikes once the nationwide COVID-19 quarantine ends. The aim is to be ready to cater to people's needs when the situation improves, to stand out from other service providers, and to make large profits.
You are required to model the demand for shared bikes using the available independent variables. Management will use the model to understand how demand varies with different features, so that it can adjust the business strategy to meet demand levels and customers' expectations. The model will also help management understand the demand dynamics of a new market.
- Little or no demand and high supply
- High demand and little or no supply
- Which variables are significant in predicting the demand for shared bikes
- How well those variables describe the bike demand
- Steps for creating a linear regression model:
  - Data Visualization
    - Perform EDA to understand the variables.
    - Check the correlation between the variables.
  - Data Preparation
    - Create dummy variables for all the categorical features.
    - Divide the data into train and test sets.
    - Perform scaling.
    - Divide the data into dependent and independent variables.
  - Data Modelling & Evaluation
    - Create a linear regression model using a mixed approach (RFE & VIF/p-value).
    - Check the various assumptions.
    - Check the adjusted R-squared for both train and test data.
    - Report the final model.
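The steps above could be sketched roughly as follows. This is a minimal illustration using only pandas and NumPy, not the project's actual notebook. The data frame is randomly generated (the column names merely mimic the dataset), the 70/30 split and min-max scaling are assumptions, and a plain least-squares fit via `numpy.linalg.lstsq` stands in for the RFE and VIF/p-value selection, which is omitted here.

```python
import numpy as np
import pandas as pd


def adjusted_r2(r2: float, n_rows: int, n_features: int) -> float:
    """Adjusted R-squared for a model that includes an intercept."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_features - 1)


def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Plain R-squared: 1 - residual sum of squares / total sum of squares."""
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return 1 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical values, NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "season": rng.choice(["spring", "summer", "fall", "winter"], size=n),
    "temp": rng.uniform(0, 40, size=n),
    "windspeed": rng.uniform(0, 30, size=n),
})
df["cnt"] = 1000 + 100 * df["temp"] - 20 * df["windspeed"] + rng.normal(0, 200, size=n)

# 1. Dummy variables for the categorical feature (drop one level as baseline).
df = pd.get_dummies(df, columns=["season"], drop_first=True, dtype=float)

# 2. Train/test split (the 70/30 ratio is an assumption).
train = df.sample(frac=0.7, random_state=0).copy()
test = df.drop(train.index).copy()

# 3. Min-max scaling, with the minimum and maximum taken from the training data only.
num_cols = ["temp", "windspeed", "cnt"]
col_min, col_max = train[num_cols].min(), train[num_cols].max()
train[num_cols] = (train[num_cols] - col_min) / (col_max - col_min)
test[num_cols] = (test[num_cols] - col_min) / (col_max - col_min)

# 4. Separate the dependent variable from the independent variables.
features = [c for c in train.columns if c != "cnt"]
X_train, y_train = train[features].to_numpy(), train["cnt"].to_numpy()
X_test, y_test = test[features].to_numpy(), test["cnt"].to_numpy()

# 5. Fit ordinary least squares with an explicit intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
A_test = np.column_stack([np.ones(len(X_test)), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# 6. Compare adjusted R-squared on train and test data.
for name, A, y in [("train", A_train, y_train), ("test", A_test, y_test)]:
    r2 = r2_score(y, A @ coef)
    print(name, round(adjusted_r2(r2, len(y), len(features)), 3))
```

Taking the scaling parameters from the training rows only keeps information from the test set out of the fit, which is why the split comes before the scaling step.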
- CNT: Average demand is ~4508 bikes per day.
- DTEDAY: Does not have much effect on the demand.
- TEMP: Temperatures of 20-40 degrees have a higher average daily rental count than the overall average and than temperatures of 0-20 degrees.
- ATEMP: Temperatures of 25-40 degrees have a higher average daily rental count than the overall average and than temperatures of 0-25 degrees.
- HUM: Humidity levels of 20-30 have a higher average daily rental count than the overall average and than other humidity levels.
- WINDSPEED: Windspeed levels of 0-10 have a higher average daily rental count than the overall average and than other windspeed levels.
- SEASON: Fall, Summer and Winter have a higher average daily rental count than the overall average; Fall and Summer are higher than Winter and Spring.
- YR: 2019 has a higher average daily rental count than the overall average and than 2018.
- MNTH: Months 6, 9, 8, 7, 5 and 10 have a higher average daily rental count than the overall average and than months 4, 11, 3, 12, 2 and 1.
- HOLIDAY: Non-holidays have a higher average daily rental count than the overall average and than holidays.
- WEEKDAY: Weekdays 2, 3, 4, 5 and 6 have a higher average daily rental count than the overall average and than the remaining weekdays.
- WORKINGDAY: Working days have a higher average daily rental count than the overall average and than non-working days.
- WEATHERSIT: Clear, Few clouds, Partly cloudy weather has a higher average daily rental count than the overall average and than the other weather situations (Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds).
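Each observation above compares the mean of `cnt` within a group against the overall mean. A minimal sketch of that comparison with pandas follows. The tiny frame is made-up illustration data, not the project's dataset, so only the mechanics carry over.

```python
import pandas as pd

# Hypothetical toy data (NOT the real dataset): daily rental counts by season.
df = pd.DataFrame({
    "season": ["spring", "spring", "summer", "summer",
               "fall", "fall", "winter", "winter"],
    "cnt": [2000, 3000, 5000, 5200, 5600, 5800, 4500, 4900],
})

# Overall average daily rentals across all rows.
overall_mean = df["cnt"].mean()

# Average daily rentals per group, highest first.
group_means = df.groupby("season")["cnt"].mean().sort_values(ascending=False)

# Groups whose average exceeds the overall average.
above_average = group_means[group_means > overall_mean].index.tolist()

print(overall_mean)
print(group_means)
print(above_average)
```

The same pattern applies to any categorical column, or to a numeric column after binning it into ranges.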
- Model created with all variables: R2 = 0.853, Adj R2 = 0.845
- Model created with the top 18 variables selected using RFE: R2 = 0.851, Adj R2 = 0.846
- Model created after dropping holiday (p-value > 0.05): R2 = 0.850, Adj R2 = 0.845
- Model created after dropping humidity (VIF > 5): R2 = 0.845, Adj R2 = 0.840
- Model created after dropping temp (VIF > 5): R2 = 0.795, Adj R2 = 0.788
Dropping temp loses about 5% of the explained variance, so we keep temp and finalize the 4th model to predict the test data.
R2 = 0.821
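The VIF threshold used above rests on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing feature j on the remaining features. A minimal NumPy sketch under that definition follows. The function name `vif` and the synthetic columns are illustrative assumptions, and the project may well have used a library routine instead.

```python
import numpy as np


def vif(X: np.ndarray) -> list:
    """Variance inflation factor of each column of X (rows are observations)."""
    n_rows, n_cols = X.shape
    factors = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the other columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(np.sum((target - target.mean()) ** 2))
        r2 = 1 - ss_res / ss_tot
        factors.append(1 / (1 - r2))
    return factors


# Synthetic example: `atemp` is nearly a copy of `temp`; `windspeed` is independent.
rng = np.random.default_rng(42)
temp = rng.uniform(0, 40, size=300)
atemp = temp + rng.normal(0, 1, size=300)
windspeed = rng.uniform(0, 30, size=300)

scores = vif(np.column_stack([temp, atemp, windspeed]))
for name, score in zip(["temp", "atemp", "windspeed"], scores):
    flag = "high" if score > 5 else "ok"
    print(f"{name}: VIF = {score:.2f} ({flag})")
```

A high VIF only signals that a feature is largely explained by the others. As the temp case above shows, the effect on R2 still has to be checked before the feature is dropped.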
- weathersit_light_snow, windspeed, weathersit_mist and season_spring have an inverse (negative) relationship with the target variable "cnt"
- mnth_10, mnth_8, workingday, mnth_3, mnth_6, weekday_6, mnth_4, season_winter, mnth_5, mnth_9, yr and temp have a direct (positive) relationship with the target variable "cnt"
- mnth_10 0.049952
- mnth_8 0.051156
- workingday 0.053324
- mnth_3 0.062035
- mnth_6 0.063313
- weekday_6 0.065144
- mnth_4 0.070493
- season_spring 0.075924
- season_winter 0.083347
- weathersit_mist 0.083826
- mnth_5 0.088043
- mnth_9 0.112402
- windspeed 0.154429
- yr 0.235197
- weathersit_light_snow 0.296761
- temp 0.412069
- temp
- weathersit_light_snow
- yr
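The three features above are the ones with the largest coefficient magnitudes. A short sketch of that ranking, using the values listed earlier with the signs taken from the final model equation (the variable names `coefficients`, `ranked` and `top_three` are illustrative):

```python
# Signed coefficients of the final model, as reported in this README.
coefficients = {
    "mnth_10": 0.049952,
    "mnth_8": 0.051156,
    "workingday": 0.053324,
    "mnth_3": 0.062035,
    "mnth_6": 0.063313,
    "weekday_6": 0.065144,
    "mnth_4": 0.070493,
    "season_spring": -0.075924,
    "season_winter": 0.083347,
    "weathersit_mist": -0.083826,
    "mnth_5": 0.088043,
    "mnth_9": 0.112402,
    "windspeed": -0.154429,
    "yr": 0.235197,
    "weathersit_light_snow": -0.296761,
    "temp": 0.412069,
}

# Rank by absolute value so strong negative effects count as much as positive ones.
ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
top_three = [name for name, _ in ranked[:3]]
print(top_three)
```

Comparing magnitudes this way is only meaningful because the features were scaled before fitting.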
cnt = 0.169 + 0.235 * yr + 0.053 * workingday + 0.412 * temp - 0.154 * windspeed - 0.076 * season_spring + 0.083 * season_winter + 0.062 * mnth_3 + 0.070 * mnth_4 + 0.088 * mnth_5 + 0.063 * mnth_6 + 0.051 * mnth_8 + 0.112 * mnth_9 + 0.050 * mnth_10 + 0.065 * weekday_6 - 0.297 * weathersit_light_snow - 0.084 * weathersit_mist
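As a worked example, the equation above can be evaluated directly, treating the `const` term as an intercept of 0.169. The sketch below assumes that the inputs are already scaled the same way as the training data, that `yr = 1` denotes 2019, and that the result is on the scaled scale of `cnt`. The README does not state these details, so treat them as assumptions; the function name `predict_scaled_cnt` and the example day are hypothetical.

```python
INTERCEPT = 0.169

# Rounded coefficients copied from the final model equation.
COEFFICIENTS = {
    "yr": 0.235,
    "workingday": 0.053,
    "temp": 0.412,
    "windspeed": -0.154,
    "season_spring": -0.076,
    "season_winter": 0.083,
    "mnth_3": 0.062,
    "mnth_4": 0.070,
    "mnth_5": 0.088,
    "mnth_6": 0.063,
    "mnth_8": 0.051,
    "mnth_9": 0.112,
    "mnth_10": 0.050,
    "weekday_6": 0.065,
    "weathersit_light_snow": -0.297,
    "weathersit_mist": -0.084,
}


def predict_scaled_cnt(features: dict) -> float:
    """Evaluate the linear equation; features left out are treated as 0.

    An unknown feature name raises KeyError rather than being ignored.
    """
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in features.items())


# Hypothetical day: a working day in September 2019 with clear weather,
# scaled temperature 0.5 and scaled windspeed 0.2.
example_day = {"yr": 1, "workingday": 1, "mnth_9": 1, "temp": 0.5, "windspeed": 0.2}
prediction = predict_scaled_cnt(example_day)
print(round(prediction, 4))
```

Converting the result back to an actual rental count would require the minimum and maximum of `cnt` used during scaling, which are not given here.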
- pandas - 1.3.4
- numpy - 1.20.3
- matplotlib - 3.4.3
- seaborn - 0.11.2
- plotly - 5.8.0
- This project was a group case study for an online advanced course.
- https://www.geeksforgeeks.org/
- https://seaborn.pydata.org/
- https://plotly.com/
- https://pandas.pydata.org/
- https://learn.upgrad.com/
Created by [@darshil2848] - feel free to contact me!