Case Analysis using ML methods and algorithms to gain insight into customer reviews.
This case analysis requires competency across our entire module on the Machine Learning methods and algorithms we have discussed.
The dataset consists of product reviews from Amazon.com collected over a span of 18 years. It holds three million relatively short (up to a paragraph long) reviews in three columns: a numerical rating between 1 and 5, a short title, and the description. Use the entire dataset for further analysis, or take a large sample if you encounter processing time issues.
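
If the full file is too slow to process, one way to draw a uniform sample without loading everything into memory is reservoir sampling. The sketch below is only an illustration: the function name `sample_reviews`, the column order (rating, title, description), and the absence of a header row are all assumptions, not details taken from the dataset itself.

```python
import csv
import io
import random


def sample_reviews(lines, k, seed=0):
    """Return up to k rows chosen uniformly at random from a CSV stream.

    Uses reservoir sampling (Algorithm R), so the stream is read once and
    only k rows are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(csv.reader(lines)):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Tiny in-memory stand-in for the real file (assumed layout, no header row).
demo = io.StringIO(
    '5,"Great","Works as described."\n'
    '1,"Broken","Stopped working after a day."\n'
    '3,"Okay","Does the job, nothing special."\n'
    '4,"Good","Solid value for the price."\n'
)
sample = sample_reviews(demo, k=2)
```

For the real data, the same function could be fed an open file handle (opened with `newline=""`) in place of the `io.StringIO` object.
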
- We are expected to apply sound ML philosophy and techniques to gain insights from customer reviews. Specifically, we are asked to train a supervised classification model that predicts the numerical class as closely as possible.
- The other aspect of the case is to derive a numerical classification in an unsupervised way from polarity scores. Use both the TextBlob and VADER algorithms to create the polarity scores.
- We are then asked to cluster the reviews and perform basic statistical analysis to test whether the clusters stratify the numerical classes.
- Lastly, find the set of tokens that collectively best determine the numerical classes, i.e., the tokens that define class 1, class 2, and so on.
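
As a rough illustration of the last task, tokens can be ranked per class by a smoothed log-odds ratio: how much more likely a token is inside a class than outside it. This is only one possible scoring choice, swapped in here for illustration; it is not necessarily the method used in the notebook, and the helper names `tokenize` and `top_tokens_per_class` and the toy reviews are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_tokens_per_class(reviews, n=3, alpha=1.0):
    """Rank tokens per class by add-alpha smoothed log-odds (class vs. rest).

    reviews: iterable of (rating, text) pairs.
    Returns a dict mapping each rating to its n highest-scoring tokens.
    """
    per_class = defaultdict(Counter)
    for rating, text in reviews:
        per_class[rating].update(tokenize(text))

    # Token counts over the whole corpus.
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    vocab_size = len(total)
    total_tokens = sum(total.values())

    result = {}
    for rating, counts in per_class.items():
        in_total = sum(counts.values())
        out_total = total_tokens - in_total
        scores = {}
        for tok, all_count in total.items():
            in_count = counts[tok]
            out_count = all_count - in_count
            p_in = (in_count + alpha) / (in_total + alpha * vocab_size)
            p_out = (out_count + alpha) / (out_total + alpha * vocab_size)
            scores[tok] = math.log(p_in / p_out)
        # Highest score first; ties broken alphabetically for stable output.
        result[rating] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return result


# Toy data for illustration only; not drawn from the Amazon dataset.
reviews = [
    (1, "terrible product, broke immediately"),
    (1, "terrible quality, do not buy"),
    (5, "excellent product, works perfectly"),
    (5, "excellent value, love it"),
]
top = top_tokens_per_class(reviews, n=1)
```

On the full dataset, the coefficients of a trained linear classifier or a chi-squared test on token counts would be reasonable alternatives to this hand-rolled score.
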
A link to my Jupyter Notebook can be found here and the final report can be found here