
utsavchaudharygithub/Amazon_Vine_Analysis


Amazon_Vine_Analysis:

The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. We had access to approximately 50 datasets, each containing reviews for one product category, from clothing apparel to wireless products. We picked one of these datasets, video games. We used PySpark for the ETL process: we extracted the dataset, transformed the data, connected to an AWS RDS instance, and loaded the transformed data into the database, which we accessed through pgAdmin. Next, we used Pandas to determine whether there is any bias toward favorable reviews from Vine members in our dataset. We summarized the analysis for Jennifer to submit to the SellBy stakeholders.

Resources used:

Data source:

- Amazon review dataset
- Vine review dataset
- Colab notebook (access on request)

Results:

Vine reviews:

- Total number of Vine reviews: 94
- Number of 5-star Vine reviews: 48
- Percentage of 5-star Vine reviews: ~51.1%

Non-Vine reviews:

- Total number of non-Vine reviews: 40471
- Number of 5-star non-Vine reviews: 15663
- Percentage of 5-star non-Vine reviews: ~38.7%

All reviews:

- Total number of reviews (Vine and non-Vine): 40565
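The percentages above follow directly from the counts. As a quick check, the arithmetic can be sketched in plain Python. This is an illustration only, not the Pandas code used in the project, and the helper name `five_star_share` is hypothetical.

```python
def five_star_share(five_star_count, total_count):
    """Return the percentage of reviews that are 5-star."""
    return 100 * five_star_count / total_count


# Counts taken from the Results section.
vine_total, vine_five_star = 94, 48
non_vine_total, non_vine_five_star = 40471, 15663

vine_pct = five_star_share(vine_five_star, vine_total)
non_vine_pct = five_star_share(non_vine_five_star, non_vine_total)
total_reviews = vine_total + non_vine_total

print(f"Vine 5-star share:     {vine_pct:.1f}%")
print(f"Non-Vine 5-star share: {non_vine_pct:.1f}%")
print(f"Total reviews:         {total_reviews}")
```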

Summary:

We used Pandas to determine whether there is any bias in reviews written as part of the Vine program. For this analysis, we checked whether being a paid Vine review makes a difference in the percentage of 5-star reviews. About 51% of Vine reviews are 5-star, compared with about 39% of non-Vine reviews, a gap of roughly 12 percentage points, which suggests that Vine reviews are biased toward 5-star ratings. Because only 94 of the reviews are Vine reviews, this result should be read with some caution. Further statistical analysis, such as comparing the mean, median, and mode of the star ratings, could give a stronger result.
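One way to put a number on how strong the difference is would be a two-proportion z-test on the counts from the Results section. This test was not part of the original analysis. The sketch below is an assumption about what a follow-up could look like, uses only the standard library, and the function name `two_proportion_z_test` is hypothetical. It relies on a normal approximation, which is only rough for a group as small as the 94 Vine reviews.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion for the standard error (normal approximation).
    Returns the z statistic and the two-sided p-value.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts from the Results section: 5-star reviews and totals per group.
z, p_value = two_proportion_z_test(48, 94, 15663, 40471)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

A small p-value here would only indicate that the two groups differ in their 5-star share. It would not by itself explain why they differ.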
