Starbucks_case_study_Udacity_Data_Science

Case study from the Udacity Data Scientist Nanodegree

Context

This case study is part of the Udacity Data Scientist Nanodegree. It deals with a Starbucks promotion intended to increase sales. The training dataset contains a control group and an experiment group (85,000 customer records). For each customer, 7 anonymized features (whose meaning is not disclosed) are also provided.

Objective

The task is to analyse whether the promotion was a success. Two metrics are evaluated:

Incremental response rate (IRR): the increase in purchase rate in the experiment group compared with the control group.
Net incremental revenue (NIR): the incremental revenue net of promotion costs. This metric is particularly important for judging the practical significance of the experiment.

Detailed instructions are provided in the notebook.
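As a rough illustration, both metrics can be computed from purchase and customer counts per group. This is a minimal sketch, not the notebook's code: the function names are hypothetical, and the product price ($10) and per-customer promotion cost ($0.15) are assumed values passed as parameters; use the figures given in the notebook's instructions.

```python
def irr(purch_treat, cust_treat, purch_ctrl, cust_ctrl):
    """Incremental response rate: purchase rate in the experiment (treatment)
    group minus purchase rate in the control group."""
    return purch_treat / cust_treat - purch_ctrl / cust_ctrl


def nir(purch_treat, cust_treat, purch_ctrl, price=10.0, promo_cost=0.15):
    """Net incremental revenue: treatment-group revenue net of the cost of
    sending the promotion to every treated customer, minus control-group
    revenue. Counts are not normalized, so this assumes both groups have a
    comparable number of customers.

    price and promo_cost are ASSUMED values, not taken from the project.
    """
    return (price * purch_treat - promo_cost * cust_treat) - price * purch_ctrl


# Hypothetical counts, for illustration only (not the project's data).
print(irr(purch_treat=50, cust_treat=1000, purch_ctrl=20, cust_ctrl=1000))
print(nir(purch_treat=50, cust_treat=1000, purch_ctrl=20))
```

A positive IRR combined with a negative NIR is exactly the situation where a promotion moves the purchase rate but does not pay for itself.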

Datasets

Two datasets are provided: a training set used for hypothesis testing and model training, and a test set used to measure the performance of our promotion strategy against a benchmark.

Project structure

We first run a hypothesis test on the invariant metric and then on the two evaluation metrics.
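Which invariant metric is used is not stated here; a common choice in this kind of experiment is the number of customers in each group, which should be consistent with a 50/50 random assignment. Assuming that, a minimal check is a two-sided test of the observed control-group size against the expected split, using the normal approximation to the binomial (without continuity correction). The function name and the 50/50 expectation are assumptions, not the notebook's exact procedure.

```python
import math


def invariant_split_pvalue(n_ctrl, n_treat, p_expected=0.5):
    """Two-sided p-value for H0: each customer is assigned to the control
    group with probability p_expected (ASSUMED 0.5 by default).
    Uses the normal approximation to the binomial distribution."""
    n = n_ctrl + n_treat
    mean = n * p_expected
    sd = math.sqrt(n * p_expected * (1 - p_expected))
    z = (n_ctrl - mean) / sd
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

A large p-value gives no evidence that the group sizes deviate from the expected split, so the evaluation metrics can be compared between groups; a very small p-value would point to a problem in the assignment.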

Our analysis shows that, while the promotion has a statistically significant effect on the purchase rate (compared with the control group), it does not generate extra revenue: the incremental sales are marginal and more than offset by the promotion costs. The promotion should therefore not be repeated.
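One standard way to test whether the promotion increases the purchase rate is a one-sided two-proportion z-test, sketched below. This is not necessarily the test used in the notebook (a bootstrap or permutation test would also be reasonable), and the function name is hypothetical.

```python
import math


def purchase_rate_ztest(purch_treat, cust_treat, purch_ctrl, cust_ctrl):
    """One-sided two-proportion z-test.
    H0: p_treat <= p_ctrl, H1: p_treat > p_ctrl. Returns (z, p_value)."""
    p_treat = purch_treat / cust_treat
    p_ctrl = purch_ctrl / cust_ctrl
    # Pooled purchase rate, i.e. the common rate assumed under H0.
    p_pool = (purch_treat + purch_ctrl) / (cust_treat + cust_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / cust_treat + 1 / cust_ctrl))
    z = (p_treat - p_ctrl) / se
    # Upper-tail probability P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

Statistical significance here says nothing about revenue: the NIR must be checked separately, and when both metrics are tested at once a multiple-comparison correction (for example Bonferroni) is worth considering.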

We then train a classification model on the provided features to predict whether a customer will purchase after receiving the promotion. We use the model's predictions to refine the promotion strategy so that it targets only the customer segment predicted to purchase. With a random forest model, this strategy outperforms the benchmark on the unseen test set.
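The targeting step can be sketched as a simple decision rule on the classifier's output. In this sketch the probabilities are assumed to come from something like scikit-learn's `RandomForestClassifier.predict_proba(X)[:, 1]`; the function name `promotion_strategy` and the default threshold of 0.5 are hypothetical choices, and with a highly imbalanced target a lower, validation-tuned threshold is usually needed.

```python
def promotion_strategy(purchase_proba, threshold=0.5):
    """Return 'Yes' (send the promotion) or 'No' for each customer, given
    the model's predicted probability that the customer will purchase.
    The threshold is a HYPOTHETICAL default; tune it on validation data."""
    return ["Yes" if p >= threshold else "No" for p in purchase_proba]


# Hypothetical predicted probabilities for three customers.
print(promotion_strategy([0.9, 0.2, 0.5]))
```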

[Figure: confusion matrix of the random forest model on the 20% of the training set reserved for validation]
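Because purchases are rare, a plain random 80/20 split can leave the validation set with very few (or unevenly many) purchasers. A stratified split keeps the purchase rate roughly equal in both parts. The sketch below is a stdlib-only illustration with a hypothetical function name; scikit-learn's `train_test_split(..., stratify=y)` does the same job, and whether the notebook stratifies its split is not stated here.

```python
import random


def stratified_split(y, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that each class contributes roughly
    val_fraction of its rows to the validation set."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train_idx, val_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_val = round(len(idx) * val_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return train_idx, val_idx
```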

The ultimate objective of the promotion strategy is to:

reduce false negatives, so that the promotion reaches all customers willing to purchase
reduce false positives, so that promotion costs are minimized
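The two objectives pull in opposite directions, so one way to balance them is to choose the decision threshold that maximizes a cost-weighted score on the validation set. The sketch below uses a deliberately simplified score, price * TP - promo_cost * (TP + FP): revenue from captured purchasers minus the cost of every promotion sent. It ignores customers who would have purchased anyway, and the price and cost values and function names are assumptions, not the notebook's method.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = purchase."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def best_threshold(y_true, proba, price=10.0, promo_cost=0.15):
    """Try each distinct predicted probability as a threshold and keep the one
    maximizing price * TP - promo_cost * (TP + FP).
    price and promo_cost are ASSUMED values."""
    best_t, best_value = None, float("-inf")
    for t in sorted(set(proba)):
        y_pred = [1 if p >= t else 0 for p in proba]
        tp, fp, _, _ = confusion_counts(y_true, y_pred)
        value = price * tp - promo_cost * (tp + fp)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value
```

Under these assumed values a missed purchaser (false negative) costs far more than a wasted promotion (false positive), which pushes the chosen threshold down; different price and cost figures would shift that balance.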

Special attention is given during the modeling phase to the highly imbalanced dataset (very few records in the purchase class). Augmentation techniques are applied, and several models are tested, including imblearn models designed specifically for this situation. Model performance is recorded in the notebook for comparison.
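Random oversampling is one of the simplest such techniques: minority-class rows are duplicated at random until both classes are the same size. The stdlib-only sketch below illustrates the idea for a binary target; the function name is hypothetical, and in practice imblearn's `RandomOverSampler(...).fit_resample(X, y)` (or SMOTE, which synthesizes new rows instead of copying) would typically be used. Which specific techniques the notebook applies is not stated here.

```python
import random


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority-class rows (with replacement) until
    both classes have the same number of rows. Returns new (X, y) lists.
    Assumes a binary target."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    n_extra = len(majority) - len(minority)
    extra = rng.choices(minority, k=n_extra) if n_extra > 0 and minority else []
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]
```

Oversampling should be applied to the training split only; resampling before the validation split would leak duplicated rows into the validation set and inflate the measured performance.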


LaurentVeyssier/Starbucks_case_study_Udacity_Data_Science

Folders and files

Latest commit

History

Repository files navigation

Starbucks_case_study_Udacity_Data_Science

Context

Objective

Datasets

Project structure

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages