Customers-Segmentation-Unsupervised-ML-project-in-Python

Problem Statement:

  • Understand the target customers in order to plan a marketing strategy.
  • Identify the most important shopping groups based on annual income, age, gender, and spending score.
  • Identify the ideal number of groups using the Elbow method, and label them.
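A minimal sketch of the Elbow method, assuming scikit-learn's `KMeans` and a hypothetical two-feature array `X` (annual income in $k, spending score) filled with random stand-in values rather than the project's data. It fits K-means for k = 1 to 10 and records the inertia (within-cluster sum of squared distances) for each k. The "elbow" is the k after which inertia stops dropping sharply, usually read off a line plot of these values.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_max=10, random_state=42):
    """Fit K-means for k = 1..k_max and return the inertia (within-cluster SSE) for each k."""
    inertias = []
    for k in range(1, k_max + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


# Hypothetical stand-in data: 200 customers with (annual income in $k, spending score)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(15, 140, 200), rng.uniform(1, 100, 200)])

inertias = elbow_inertias(X)
for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:,.0f}")
```

The helper name `elbow_inertias` and the `k_max=10` range are illustrative choices, not taken from the project.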

Project approach:

  • EDA (Exploratory Data Analysis)
  • Utilize the K-means clustering algorithm
  • Describe the resulting clusters with summary statistics
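Once each customer has a cluster label, per-cluster summary statistics can be computed with pandas `groupby`. A minimal sketch on a small hypothetical table; the column names `Age`, `Annual Income`, `Spending Score`, and `Cluster`, and all values, are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data: six customers with cluster labels already assigned
df = pd.DataFrame({
    "Age": [23, 35, 41, 52, 29, 33],
    "Annual Income": [15, 78, 88, 40, 18, 85],
    "Spending Score": [81, 70, 90, 35, 77, 82],
    "Cluster": [0, 1, 1, 2, 0, 1],
})

numeric_cols = ["Age", "Annual Income", "Spending Score"]

# Mean and median of each numeric column within each cluster
summary = df.groupby("Cluster")[numeric_cols].agg(["mean", "median"])
# Number of customers in each cluster
sizes = df.groupby("Cluster").size()

print(summary.round(1))
print(sizes)
```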

EDA (Exploratory Data Analysis)

1. Univariate Analysis: to describe and summarize the distribution of a single variable

I) Looking at the distribution of each column

[Figure: distribution of each column]

  1. Distribution of Age:
  • Ages are spread broadly, with a peak for customers aged around 30-50 years.
  • There are also slight peaks around 20 and 40 years, suggesting additional age groups.
  2. Distribution of Annual Income (in $k):
  • Annual income is right-skewed: only a few customers have high annual incomes.
  3. Distribution of Spending Score:
  • The distribution is relatively uniform, with most customers showing moderate spending habits.
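To back up the visual reading of skew, one option is the sample skewness reported by pandas `DataFrame.skew()`: a clearly positive value indicates a long right tail, while a value near zero indicates a roughly symmetric shape. The sketch below uses synthetic stand-in columns (a lognormal "income" and a uniform "score"), not the project's data, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in data: a right-skewed income column and a roughly uniform score column
df = pd.DataFrame({
    "Annual Income": rng.lognormal(mean=4.0, sigma=0.5, size=500),
    "Spending Score": rng.uniform(1, 100, size=500),
})

# Sample skewness per column: > 0 means a longer right tail, ~0 means roughly symmetric
skewness = df.skew()
print(skewness.round(2))
```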

Recommendation: marketing strategies could target the 30-35 age group, as they are the most common customers. High-income individuals ($80k+) might also be valuable for premium services.



II) Looking at the distribution of each column vs. gender

[Figure: distribution of each column by gender]

  • Female customers clearly outnumber male customers.

  • There also appears to be an outlier among 'Male' customers: the male income distribution has a thick right tail and is skewed to the right.

    • To investigate this outlier:

    [Figure: annual income by gender]

    Interpretation:

    • The median income for females appears slightly lower than that for males.
    • There's one outlier among males: a customer with a much higher income than the rest.
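The comparison above can be checked numerically: count customers per gender, compare median incomes, and flag outliers with the 1.5 × IQR rule that box plots commonly use for their whiskers. The table below is a hypothetical toy example (not the project's data), and `iqr_outliers` is an illustrative helper, not part of any library.

```python
import pandas as pd

# Hypothetical toy data: annual income (in $k) by gender
df = pd.DataFrame({
    "Gender": ["Female"] * 6 + ["Male"] * 5,
    "Annual Income": [20, 35, 48, 60, 62, 75, 30, 50, 62, 70, 137],
})

counts = df["Gender"].value_counts()
medians = df.groupby("Gender")["Annual Income"].median()
print(counts)
print(medians)


def iqr_outliers(s: pd.Series, factor: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR] (the usual box-plot whisker rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - factor * iqr) | (s > q3 + factor * iqr)]


# Apply the rule separately within each gender group
outliers_by_gender = {
    gender: iqr_outliers(incomes).tolist()
    for gender, incomes in df.groupby("Gender")["Annual Income"]
}
print(outliers_by_gender)
```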

2. Bivariate Analysis:

I) Understanding customer behavior based on demographic factors (age and gender) in relation to income and spending habits.

[Figure: age, annual income, and spending score plotted against each other]

  • In 'Annual Income vs. Spending Score':
    • There appears to be a pattern relating annual income to spending score.
  • In 'Age vs. Spending Score':
    • No strong correlation was found.
    • Younger individuals may have slightly higher spending scores.
  • In 'Age vs. Annual Income':
    • No correlation was found.
    • Income varies across all age groups.

II) Identifying relationships between variables:

[Figure: correlations between variables]

  • In 'Age vs. Spending Score':
    • There is a moderate negative correlation (-0.33) between age and spending score.
    • Younger individuals tend to have higher spending scores.
  • There's no clear correlation between 'Annual Income' and 'Spending Score'
  • There's almost no correlation between 'Age' and 'Annual Income'
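Correlation values like these are commonly computed as Pearson coefficients with pandas `DataFrame.corr()` (Pearson is its default method); the project's exact method is not stated. A minimal sketch on a hypothetical nine-customer table, with values invented for illustration and spending deliberately falling as age rises:

```python
import pandas as pd

# Hypothetical toy data, not the project's dataset
df = pd.DataFrame({
    "Age": [19, 22, 25, 31, 38, 45, 52, 60, 67],
    "Annual Income": [15, 40, 70, 25, 90, 55, 30, 80, 45],
    "Spending Score": [80, 75, 85, 60, 70, 50, 55, 40, 45],
})

# Pairwise Pearson correlation matrix (values range from -1 to 1)
corr = df.corr()
print(corr.round(2))
```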




Using the K-means clustering algorithm, I found the following clusters:

[Figure: K-means clusters]

Trends & Patterns found:

  1. The target cluster would be cluster 1, which has a high spending score and a high annual income.
  2. Approximately 54% of cluster 1 shoppers are female. We should look for ways to attract more of them with a marketing campaign targeting popular items in this cluster.
  3. Cluster 4 has sales potential, so we should run an event featuring the popular items purchased by this group.
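A minimal sketch of the clustering and profiling step, under several assumptions not stated in the project: the features are annual income and spending score, k = 5, and the data is random stand-in data with a hypothetical `Gender` column. It fits scikit-learn's `KMeans`, then reports each cluster's size, mean income, mean spending score, and female share, the kind of figures behind the observations above. `KMeans` numbers its clusters from 0 and the numbering is arbitrary, so "cluster 1" in one run need not match cluster 1 in another.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
n = 200

# Hypothetical stand-in data; replace with the real customer table
df = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], size=n),
    "Annual Income": rng.uniform(15, 140, size=n),
    "Spending Score": rng.uniform(1, 100, size=n),
})

features = df[["Annual Income", "Spending Score"]]
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)  # k = 5 is an assumption
df["Cluster"] = kmeans.fit_predict(features)

# Per-cluster profile: size, mean income, mean spending score, share of female customers
profile = df.groupby("Cluster").agg(
    n_customers=("Gender", "count"),
    mean_income=("Annual Income", "mean"),
    mean_spending=("Spending Score", "mean"),
    female_share=("Gender", lambda g: (g == "Female").mean()),
)
print(profile.round(2))

# Hypothetical heuristic: the cluster with the highest combined mean income and spending
target = (profile["mean_income"] + profile["mean_spending"]).idxmax()
print("Candidate target cluster:", target)
```

Picking the target cluster by the largest sum of mean income and mean spending score is just one simple heuristic; inspecting the profile table directly works as well. If age were added as a feature, standardizing the columns first (for example with scikit-learn's `StandardScaler`) would keep one scale from dominating the distances.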