Predicting-future-sales

A Russian software firm - 1C Company, has provided a time series data set consisting of daily sales data. We plan to create machine learning models to help predict subsequent month data for a particular product at a particular shop. This model would help the company know how much inventory they will need for next month.

Methodology

For Russian Software Company-1c, we are given the task of projecting total sales for each product category and store in the coming month. We started with Exploratory Data Analysis which helped us to do the initial investigations on data and it will also help us discover patterns, to spot anomalies and to test hypotheses. After performing EDA, we used different models to identify which model gives the best accuracy for predicting the sales. We first used XGBoost since it is very good at identifying patterns in data. We also used SARIMA and ARIMA which are statistical analysis models for forecasting future trends. We also used Prophet which is a procedure for forecasting time series data based on an additive model where nonlinear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. We used Python for coding, and the essential libraries will be Pandas, Numpy, Sklearn, and Data visualization packages like Seaborn and Matplotlib.

Implementation

Kaggle Dataset

Dataset consists of daily historical sales data of 1C Company. Data set provided includes sales information across 60 shops, 214200 items over a period of 34 months

Data Preparation and Cleaning:

Step 1: Check whether data is cross sectional, time series or transactional According to our data analysis, our data is transactional data since we have a time series variable and there are other variables as well. A data is called transactional data when the target variable is dependent on time and there are other variables on which the predicted value is not dependent on.

Step 2: If the data is transactional then convert it into time series data For this we took an aggregate on the item count column to find sum of items sold on each day

Step 3: Time series is stationary or not

Statistical modeling methods assume or require the time series to be stationary. For a time series to be stationary, the mean should be constant and so should the variance and there should be no seasonality in data To check whether our time series data is stationary or not we used the Augmented Dickey Fuller Test.

Null Hypothesis : time series is non-stationary Alternate Hypothesis: time series is stationary. It does not have time-dependent structure

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
ARIMA, SARIMA and FBProphet.ipynb		ARIMA, SARIMA and FBProphet.ipynb
README.md		README.md
Stats Project.ipynb		Stats Project.ipynb
XGBoost.ipynb		XGBoost.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Predicting-future-sales

Methodology

Implementation

Data Preparation and Cleaning:

Forecasting Future Sales

XGBoost

ARIMA

SARIMAX

FBPROPHET

Future Scope

About

Releases

Packages

Languages

jaregauri/Predicting-future-sales

Folders and files

Latest commit

History

Repository files navigation

Predicting-future-sales

Methodology

Implementation

Data Preparation and Cleaning:

Forecasting Future Sales

XGBoost

ARIMA

SARIMAX

FBPROPHET

Future Scope

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages