Rossman Pharmaceuticals operates stores across several cities, and its finance team wants to forecast sales at each store six weeks ahead of time. The data team identified promotions, competition, school and state holidays, seasonality, and locality as the factors needed to predict sales across the stores. The objective is to use the provided data to build and serve an end-to-end product that delivers these predictions to analysts on the finance team.
Several techniques were used to train and serve the predictions so that the best-performing one could be chosen. The techniques used are:
- Linear regression
- Random Forest
- Deep Learning (Long Short-Term Memory - LSTM)
Linear regression is a linear approach to modelling the relationship between a target variable (y) and one or more features that determine it. If the regression is performed on data with a single feature, it is a simple (univariate) regression; if the dataset has multiple features, it is a multiple regression, referred to here as 'multivariate analysis'. The dataset used in this project has many features (as shown above), so the analysis carried out is a multivariate analysis.
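As a minimal sketch of a regression with several features, the example below fits an ordinary least-squares model with NumPy. The toy data, the two feature names (a promo flag and a competitor distance), and the helper functions are hypothetical illustrations, not the project's actual data or training code.

```python
import numpy as np

# Hypothetical toy data: each row is one store-day with two made-up features
# (promo flag, distance to nearest competitor in km); y is that day's sales.
X = np.array([
    [0, 1.0],
    [1, 1.0],
    [0, 5.0],
    [1, 5.0],
    [1, 3.0],
])
y = np.array([105.0, 155.0, 125.0, 175.0, 165.0])


def fit_linear_regression(X, y):
    """Return (intercept, coefficients) that minimise the squared error."""
    # Prepend a column of ones so the first fitted weight is the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta[0], beta[1:]


def predict(X, intercept, coefs):
    """Apply the fitted linear model to new feature rows."""
    return intercept + X @ coefs


intercept, coefs = fit_linear_regression(X, y)

# Forecast for a hypothetical promo day with a competitor 2 km away.
forecast = predict(np.array([[1, 2.0]]), intercept, coefs)
print(intercept, coefs, forecast)
```

Each fitted coefficient is the estimated change in sales for a one-unit change in its feature while the other features are held fixed.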
Random forests, or random decision forests, are, according to Wikipedia, an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. For regression tasks, the mean prediction of the individual trees is returned. This algorithm was also used to predict sales.
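The averaging behaviour described above can be checked directly. The sketch below assumes scikit-learn's `RandomForestRegressor` and uses synthetic data. The source does not say which library or hyperparameters the project used, so treat both as assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (not the project's dataset): three numeric features
# and a noisy target that depends on the first two.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 100 + 50 * (X[:, 0] > 5) + 5 * X[:, 1] + rng.normal(0, 1, size=200)

# n_estimators=25 is an arbitrary choice for illustration.
forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X, y)

X_new = X[:5]
forest_pred = forest.predict(X_new)

# Collect each fitted tree's prediction and average them by hand.
tree_preds = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_mean = tree_preds.mean(axis=0)

print(np.allclose(forest_pred, manual_mean))
```

The forest's output matches the hand-computed mean of its trees, which is the regression rule stated above.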
Long short-term memory (LSTM) is a recurrent neural network architecture used in deep learning. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). LSTM networks are well suited to classifying, processing, and making predictions based on time series data. For that reason, we extracted a time series from the pharmaceutical sales data and built an LSTM model to make predictions from it.
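Before a sequence model such as an LSTM can be trained, a time series is usually cut into fixed-length input windows paired with the value to predict. The sketch below shows one common way to do that. The function name `make_windows`, the window length, and the sample numbers are hypothetical, and the source does not describe the project's exact preprocessing.

```python
def make_windows(series, window, horizon=1):
    """Split a 1-D series into (inputs, targets) for supervised training.

    Each input is `window` consecutive values; its target is the value
    `horizon` steps after the end of that window.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Made-up daily sales figures, used only to show the resulting shapes.
daily_sales = [10, 12, 13, 15, 18, 21, 25]
X, y = make_windows(daily_sales, window=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

In practice the inputs would then be converted to an array of shape (samples, window, 1) before being passed to an LSTM layer, and a six-week forecast would use a correspondingly larger `horizon` or a multi-step target.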
A Streamlit app was created to serve the model. The app lets users upload data as a CSV file and get predictions, with the deep learning model running under the hood. The code for the app is in this repository: https://github.com/SamDewriter/deployed_app