Unsupervised learning techniques are applied to see whether any similarities appear between customers and how best to segment customers into distinct categories.
In this project, we applied unsupervised learning techniques to product spending data collected for customers of a wholesale distributor to identify customer segments hidden in the data. We first explored the data by selecting a small sample of it and checking whether any product categories are correlated with one another. Next, we preprocessed the data by scaling each product category and then identifying and removing outliers. We then applied a PCA transformation and used clustering algorithms to segment the transformed customer data. Finally, we compared the resulting cluster segmentation with an additional labeling and considered ways this information might assist the distributor in the future.
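The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on synthetic data rather than customers.csv, and the column names, the log transform used for preprocessing, Tukey's 1.5 x IQR rule for outliers, and the choice of KMeans with two clusters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for customers.csv; the column names are assumptions.
rng = np.random.RandomState(0)
columns = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
data = pd.DataFrame(
    rng.lognormal(mean=8.0, sigma=1.0, size=(200, len(columns))),
    columns=columns,
)

# 1. Explore: pairwise correlation between product categories.
correlations = data.corr()

# 2. Preprocess: a natural-log transform to reduce skew (assumed choice).
log_data = np.log(data)

# 3. Outliers: drop rows that fall outside 1.5 * IQR in any category
#    (Tukey's rule, assumed here as the outlier criterion).
q1 = log_data.quantile(0.25)
q3 = log_data.quantile(0.75)
step = 1.5 * (q3 - q1)
is_outlier = ((log_data < q1 - step) | (log_data > q3 + step)).any(axis=1)
good_data = log_data[~is_outlier]

# 4. PCA: project the cleaned data onto its first two principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(good_data)

# 5. Clustering: segment the reduced data and score the segmentation.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)
score = silhouette_score(reduced, labels)

print("Rows kept after outlier removal:", len(good_data))
print("Silhouette score for 2 clusters:", round(score, 3))
```

Because the synthetic data is random, the clusters it finds carry no meaning. On the real dataset, the silhouette score could be compared across several cluster counts to choose a segmentation.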
This project contains 3 files:
customer_segments.ipynb: The main file, which contains our contribution to the project.
customers.csv: The project dataset.
visuals.py: This Python file includes helper functions to create the required visualizations.
This project uses the following software and Python libraries:
NumPy
pandas
scikit-learn (v0.17)
matplotlib
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 3.x installer and not the Python 2.7 installer.
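To confirm that the libraries listed above are available before opening the notebook, a small check like the following may help. It is only a sketch: the helper name installed_versions is hypothetical and not part of this project, and note that scikit-learn is imported under the module name sklearn.

```python
import importlib

# Import names of the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]


def installed_versions(modules=REQUIRED):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that expose no __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    status = version if version is not None else "NOT INSTALLED"
    print(name + ": " + status)
```

Any library reported as NOT INSTALLED needs to be installed before the notebook will run.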
You can view the results by opening the file "customer_segments.ipynb". If you want to contribute as well, open a Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook customer_segments.ipynb to open a browser window or tab where you can work with the notebook.