
Sentiment analysis on Amazon customer reviews using Spark


swathi-shree/BigData_Analysis


Project-Group-09

This is Group 9. Our problem statement is to implement a solution to a big data problem on a target platform of our choice.

Contributors: Sahana Srinivasan and Swathi Shree

BIG DATA AND PROCESSING WITH MAPREDUCE

Amazon Customer Sentiment Analysis using Spark and Python on Google Colab

REQUIREMENTS: Spark 3.1.1, Hadoop 2.7

TARGET PLATFORM: Cloud Systems, implemented using Google Colab

LANGUAGE: Python

PRIMARY LIBRARY: PySpark

DATASET: Amazon's Jewelry Review dataset

Dataset can be downloaded from: https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Jewelry_v1_00.tsv.gz
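
Once the file is downloaded, it can help to confirm that the archive is intact and really tab-separated before handing it to Spark. The following is a minimal sketch using only the Python standard library; the helper name peek_tsv_gz and its n_rows parameter are hypothetical and are not part of the project's notebook.

```python
import csv
import gzip
import itertools


def peek_tsv_gz(path, n_rows=3):
    """Return (header, first n_rows data rows) of a gzip-compressed TSV file.

    Hypothetical helper for a quick sanity check; not part of the project code.
    """
    # Open in text mode; newline="" lets the csv module handle line endings.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters inside review text as plain data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(itertools.islice(reader, n_rows))
    return header, rows
```

Calling peek_tsv_gz on the downloaded file returns the column names and a few sample rows without decompressing the whole archive into memory.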

Steps to run the program:

  1. Download the dataset from the link above and store it at a location on your Google Drive that the Colab notebook can access
  2. Set up Spark on Google Colab: install Java (for the JVM) and set up PySpark for Python
  3. Start a Spark session
  4. To load the dataset in the notebook, replace the path in loaded_info = spark.read.csv('path of downloaded dataset', sep='\t', inferSchema=True, header=True)
  5. Gather and analyse columns as needed
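
Steps 3 to 5 can be sketched roughly as follows, assuming PySpark is already importable in the Colab runtime. The read options are the ones from the spark.read.csv call in step 4. The function names, the application name, and the local[*] master setting are illustrative assumptions, and rating_counts assumes the dataset has a star_rating column.

```python
# Read options taken from the spark.read.csv call in step 4.
READ_OPTIONS = {"sep": "\t", "inferSchema": True, "header": True}


def start_session(app_name="amazon-jewelry-reviews"):
    """Start (or reuse) a local Spark session; app_name is a hypothetical label."""
    # Imported lazily so this cell can be defined before PySpark is installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()


def load_reviews(spark, path):
    """Load the tab-separated review file into a Spark DataFrame."""
    return spark.read.csv(path, **READ_OPTIONS)


def rating_counts(df):
    """Count reviews per star rating (assumes a star_rating column exists)."""
    return df.groupBy("star_rating").count().orderBy("star_rating")
```

In a notebook cell, spark = start_session() followed by loaded_info = load_reviews(spark, 'path of downloaded dataset') reproduces step 4. After that, loaded_info.printSchema() lists the available columns and rating_counts(loaded_info).show() gives a first look at the rating distribution.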
