Amazon and Vine collaborated to create a paid subscription program for readers. To focus and refine marketing efforts for the program, we analyze 114,309 book reviews to determine whether reviews from paid members are positively biased compared with non-member reviews. To this end, we use Apache Spark and PySpark to run the extract, transform, and load (ETL) process in the cloud. We then load the data into a PostgreSQL database, managed with pgAdmin, by creating and connecting to an Amazon Web Services Relational Database Service (AWS RDS) instance.
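As a rough illustration of the transform step, the sketch below keeps only the columns relevant to a Vine comparison. It uses Python's standard `csv` module as a stand-in for Spark so that it runs without a cluster; it is not the project's actual PySpark code. The column selection in `VINE_COLUMNS`, the helper name `extract_vine_rows`, and the two sample rows are assumptions made for the example.

```python
import csv
import io

# Assumed subset of dataset columns needed for the Vine comparison.
VINE_COLUMNS = ["review_id", "star_rating", "helpful_votes",
                "total_votes", "vine", "verified_purchase"]
INTEGER_COLUMNS = ("star_rating", "helpful_votes", "total_votes")


def extract_vine_rows(tsv_file):
    """Read tab-separated reviews and keep only the Vine-related columns."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        row = {col: record[col] for col in VINE_COLUMNS}
        # Cast numeric fields so they can be filtered and aggregated later.
        for col in INTEGER_COLUMNS:
            row[col] = int(row[col])
        rows.append(row)
    return rows


# Made-up sample rows, not taken from the real dataset.
sample_tsv = io.StringIO(
    "review_id\tstar_rating\thelpful_votes\ttotal_votes\tvine\t"
    "verified_purchase\treview_headline\n"
    "R1\t5\t3\t4\tY\tN\tGreat read\n"
    "R2\t2\t0\t1\tN\tY\tNot for me\n"
)
vine_rows = extract_vine_rows(sample_tsv)
print(vine_rows)
```

In the Spark version of this step, the same selection and casting would be expressed on a DataFrame before writing the result to the RDS-hosted PostgreSQL table.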
- There were 5,012 Vine reviews;
- There were 109,297 non-Vine reviews.
- 2,031 Vine reviews were five stars;
- 49,967 non-Vine reviews were five stars.
- Approximately 40.52% of Vine reviews were five stars;
- Approximately 45.72% of non-Vine reviews were five stars.
Metric | Vine Reviews | Non-Vine Reviews |
---|---|---|
Total Reviews | 5,012 | 109,297 |
Number of Five Stars | 2,031 | 49,967 |
Percentage of Five Stars | 40.52% | 45.72% |
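The percentages in the table can be reproduced from the counts with a few lines of Python. This is a minimal sketch; the helper name `five_star_share` is hypothetical, and the inputs are the totals reported above.

```python
def five_star_share(five_star_count: int, total_count: int) -> float:
    """Return the percentage of five-star reviews, rounded to two decimals."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(100 * five_star_count / total_count, 2)


# Counts taken from the results table.
vine_pct = five_star_share(2031, 5012)
non_vine_pct = five_star_share(49967, 109297)

# Difference in percentage points between the two groups.
gap = round(non_vine_pct - vine_pct, 2)

print(f"Vine: {vine_pct}%  Non-Vine: {non_vine_pct}%  Gap: {gap} points")
```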
Based on the calculations above, a positivity bias among members of the Vine program is unlikely. The share of five-star Vine reviews (40.52%) was about 5.2 percentage points lower than the share of five-star non-Vine reviews (45.72%). Additional analysis could compare the full distribution of star ratings by calculating the percentage of Vine and non-Vine reviews at each star rating.
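The per-rating breakdown suggested above could look something like the following sketch. It assumes each review is reduced to a `(star_rating, vine_flag)` pair with the flag stored as `"Y"` or `"N"`; the function name `rating_distribution` and the sample data are made up for illustration and are not results from the dataset.

```python
from collections import Counter


def rating_distribution(reviews):
    """Map each Vine flag to the percentage of its reviews at each star rating."""
    counts = {}
    for star, vine_flag in reviews:
        counts.setdefault(vine_flag, Counter())[star] += 1

    distribution = {}
    for vine_flag, counter in counts.items():
        total = sum(counter.values())
        distribution[vine_flag] = {
            star: round(100 * counter[star] / total, 2) for star in range(1, 6)
        }
    return distribution


# Made-up sample: (star_rating, vine_flag) pairs.
sample_reviews = [(5, "Y"), (4, "Y"), (5, "N"), (1, "N"), (5, "N"), (3, "N")]
distribution = rating_distribution(sample_reviews)

for vine_flag, shares in distribution.items():
    print(vine_flag, shares)
```

Comparing the two rows of percentages side by side would show whether Vine reviews cluster at four stars, for example, rather than simply being less positive overall.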
Data Source:
https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Books_v1_00.tsv.gz
Software:
AWS RDS
Google Colaboratory Notebook
Apache Spark
PySpark
Python
PostgreSQL
pgAdmin
Hadoop
MapReduce
mrjob
Email: kate.wang00001@gmail.com
LinkedIn: katewang01