
This project employs unsupervised machine learning for cluster analysis and customer segmentation


Mo-Khalifa96/Customer-Segmentation


Customer Segmentation (Unsupervised Machine Learning for Cluster Analysis)

About The Project

This project employs unsupervised machine learning for cluster analysis and customer segmentation. The dataset used here consists of thousands of records of customer purchases and shopping habits. The goal of this project is, first, to analyze and understand in depth the customer base present in the dataset, and, second, to use machine learning algorithms for cluster analysis in order to break down the customer base into distinct customer groups (clusters). Cluster analysis can be very useful for understanding a customer base and for helping businesses tailor targeted marketing strategies, optimize their product offerings, better meet their customers' needs, and enhance their shopping experience. As such, after segmenting the customer base into separate groups, the groups will be analyzed and compared to develop a thorough understanding of their characteristics, preferences, shopping habits, and needs. This, in turn, will guide efforts to curate targeted marketing campaigns, develop customer retention strategies, and/or enhance overall customer satisfaction and loyalty.
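The README does not name the clustering algorithms used in the notebook. As an illustration only, the sketch below implements k-means (Lloyd's algorithm), one widely used way to split customers into k groups, in plain Python. The function name `kmeans`, its parameters, and the toy points in the usage note are hypothetical and are not taken from the project.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples.

    Returns (labels, centroids). Hypothetical helper for illustration; the
    README does not say which algorithms or settings the notebook uses.
    """
    rng = random.Random(seed)
    # Initialise the centroids with k points chosen at random from the data.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

For example, `kmeans([(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)], k=2)` places the first three points in one cluster and the last three in the other. With real customer data, the features would first be encoded and scaled, because k-means relies on Euclidean distance.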
In order to segment customers into different clusters, the data was first inspected, engineered, and processed in preparation for analysis and modeling. After preparing the data, four clustering algorithms were developed and evaluated in order to find the most suitable algorithm for the task. Having obtained the best clustering model for the data, customers were segmented into groups, and the resulting customer groups were analyzed in depth. A report was written describing the findings, identifying the main characteristics or unique features of each customer group, and describing the overall similarities and differences between the groups. Building on this report, a subsequent section lays out the key insights and takeaways of the cluster analysis and provides recommendations for improving sales or curating better marketing campaigns tailored to each customer group separately.
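The README states that four clustering algorithms were evaluated to find the most suitable one, but not which metric was used. One common label-free criterion is the silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is point i's mean distance to the rest of its own cluster and b(i) is its lowest mean distance to any other cluster. The plain-Python sketch below computes the mean silhouette and picks the candidate clustering with the highest score. The function names and the candidate dictionary are hypothetical, and the notebook may have used a different metric or library.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Points alone in their cluster get s(i) = 0.
    """
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # convention for a singleton cluster
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_best(points, candidates):
    """Return the key of the candidate labelling with the highest silhouette."""
    return max(candidates, key=lambda name: silhouette_score(points, candidates[name]))
```

The score ranges from -1 to 1 and higher is better, so a clean split of well-separated groups scores close to 1, while a labelling that mixes the groups scores much lower. In practice each candidate would be the label list produced by one of the fitted algorithms.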



Overall, this project is broken down into 7 sections:
   1. Reading and Inspecting the Data
   2. Updating the Data
   3. Exploratory Data Analysis
   4. Data Preprocessing
   5. Model Development and Evaluation (Cluster Analysis)
   6. Model Interpretation
   7. Key Insights and Recommendations
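Section 6 (Model Interpretation) is where each customer group is described and compared. A simple starting point, sketched here as an assumption rather than the notebook's actual code, is to report each cluster's size and the mean of every numeric feature, then compare those means across clusters. The function name `profile_clusters` and the feature names in the usage note are hypothetical.

```python
from collections import defaultdict


def profile_clusters(features, labels):
    """Return {label: {"size": n, "means": {feature: mean}}} per cluster.

    `features` is a list of dicts mapping feature name -> number, aligned
    with `labels`. Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for feats, lab in zip(features, labels):
        grouped[lab].append(feats)
    profile = {}
    for lab, members in sorted(grouped.items()):
        # Average every feature over the members of this cluster.
        means = {col: sum(m[col] for m in members) / len(members) for col in members[0]}
        profile[lab] = {"size": len(members), "means": means}
    return profile
```

For instance, with features `[{"Age": 20, "Spend": 40}, {"Age": 30, "Spend": 60}, {"Age": 60, "Spend": 90}]` and labels `[0, 0, 1]`, cluster 0 has size 2 and a mean Age of 25.0, while cluster 1 has size 1 and a mean Age of 60.0.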



About The Data

The present dataset was taken from Kaggle.com, a popular platform for finding and publishing datasets. You can quickly access it by clicking here. The dataset consists of around 4,000 records of customer purchases. In each entry, a customer is assigned a unique identifier, and their purchase, preferences, and other relevant details are recorded. The dataset encompasses a wide variety of variables, including demographic information about the customers, their shopping frequency and purchase history, their product preferences, and their overall satisfaction with the product purchased. This makes it well suited for analyzing and understanding consumer behavior and decision-making, and for the purposes of cluster analysis and customer segmentation.
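Section 1 (Reading and Inspecting the Data) typically starts with basic integrity checks. The sketch below uses only Python's standard `csv` module and assumes the file's header row uses the column names listed in the table that follows (for example `Customer ID`). The function name `inspect_records` is hypothetical, and the notebook itself may use a different library for this step.

```python
import csv
from collections import Counter


def inspect_records(fileobj):
    """Summarise a CSV of customer purchases.

    Returns the row count, the number of blank cells per column, and any
    Customer IDs that appear more than once. Assumes a "Customer ID" column.
    """
    rows = list(csv.DictReader(fileobj))
    missing = Counter()
    for row in rows:
        for col, value in row.items():
            # Short rows yield None; empty or whitespace-only strings count as blank.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[col] += 1
    id_counts = Counter(row["Customer ID"] for row in rows)
    duplicates = sorted(cid for cid, n in id_counts.items() if n > 1)
    return {"n_rows": len(rows), "missing": dict(missing), "duplicate_ids": duplicates}
```

Given a file-like object, for example the downloaded CSV opened with `open(path, newline="")`, it reports how many records were read, which columns contain blanks, and whether any supposedly unique customer identifier is repeated.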

You can view each column and its description in the table below:

| Variable | Description |
|---|---|
| Customer ID | Unique identifier for each customer |
| Age | Age of the customer |
| Gender | Gender of the customer |
| Item Purchased | Item or product purchased |
| Category | Category of the item purchased (e.g., clothing, accessory, etc.) |
| Purchase Amount (USD) | Amount spent (in USD) in a given transaction |
| Location | Location from which a purchase was made |
| Size | Size of the purchased item (if applicable) |
| Color | Color of the purchased item or product |
| Season | Season in which the item was purchased (e.g., winter, spring, etc.) |
| Review Rating | Rating score given by a customer for the item purchased (on a 5-point rating scale) |
| Subscription Status | Indicates whether or not a customer is subscribed to the brand or shop service |
| Shipping Type | Method of delivery or shipping type (e.g., standard shipping, express, store pickup, etc.) |
| Discount Applied | Indicates whether or not a discount was applied to the purchase |
| Promo Code Used | Indicates whether or not a promo code or coupon was used during the purchase |
| Previous Purchases | Number of prior purchases made by the same customer |
| Payment Method | Method of payment for the purchase (e.g., cash, credit card, PayPal, etc.) |
| Frequency of Purchases | How often a customer makes purchases (e.g., weekly, monthly, annually, etc.) |


Here's a sample of the dataset being analyzed:

https://github.com/Mo-Khalifa96/Customer-Segmentation/blob/main/shopping%20customers%20screenshot.jpg
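Before clustering, the text-valued columns in the table above need numeric encodings, and features measured on different scales (age in years, spend in USD) need rescaling so that none dominates the distance computation. The sketch below shows one possible encoding under stated assumptions: Yes/No flags become 1/0, `Frequency of Purchases` is mapped to an approximate number of purchases per year, and every feature is min-max scaled to [0, 1]. The "Yes"/"No" values, the category labels in `FREQ_PER_YEAR`, and the yearly counts assigned to them are assumptions to verify against the actual data; the notebook's own preprocessing (Section 4) may differ.

```python
# ASSUMPTION: these frequency labels and yearly counts are illustrative;
# check them against the distinct values of "Frequency of Purchases".
FREQ_PER_YEAR = {
    "Weekly": 52, "Fortnightly": 26, "Bi-Weekly": 26,
    "Monthly": 12, "Quarterly": 4, "Every 3 Months": 4, "Annually": 1,
}
# ASSUMPTION: these columns hold the strings "Yes" / "No".
BINARY_COLS = ("Subscription Status", "Discount Applied", "Promo Code Used")
NUMERIC_COLS = ("Age", "Purchase Amount (USD)", "Review Rating", "Previous Purchases")


def encode(rows):
    """Turn raw CSV rows (dicts of strings) into numeric feature dicts."""
    out = []
    for row in rows:
        feats = {col: float(row[col]) for col in NUMERIC_COLS}
        feats.update({col: 1.0 if row[col] == "Yes" else 0.0 for col in BINARY_COLS})
        feats["Purchases per Year"] = float(FREQ_PER_YEAR[row["Frequency of Purchases"]])
        out.append(feats)
    return out


def min_max_scale(features):
    """Rescale every feature to [0, 1]; constant features become 0.0."""
    cols = list(features[0])
    lo = {c: min(f[c] for f in features) for c in cols}
    hi = {c: max(f[c] for f in features) for c in cols}
    return [
        {c: (f[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in cols}
        for f in features
    ]
```

Min-max scaling is used here only because it is simple and keeps every feature on the same [0, 1] range; standardization (zero mean, unit variance) is an equally common choice, and categorical columns such as `Category` or `Season` would additionally need one-hot encoding if they are used as clustering features.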



Quick Access

To quickly access the project, I have provided two links, both of which will direct you to a Jupyter Notebook with all the code and corresponding output rendered and organized into separate sections and sub-sections. Each section comes with thorough explanations that guide you through the project development one step at a time. The first link, however, only allows you to view the project without interacting with its code. The second link allows you both to view the code and to interact with it directly to reproduce the analysis results if you wish. To execute the code, please make sure to run the first two cells before anything else, in order to install the necessary Python modules and make them ready for use. To run any given block of code, simply select the cell and click the 'Run' icon on the notebook toolbar.


To view the project only, click on the following link:
https://nbviewer.org/github/Mo-Khalifa96/Customer-Segmentation/blob/main/Customer%20Segmentation%20%28Unsupervised%20ML%20for%20Cluster%20Analysis%29.ipynb

Alternatively, to view the project and interact with its code, click on the following link:
https://mybinder.org/v2/gh/Mo-Khalifa96/Customer-Segmentation/main?labpath=Customer+Segmentation+%28Unsupervised+ML+for+Cluster+Analysis%29.ipynb


