- Developed a content-based recommender that suggests restaurants to users.
- Extracted, pre-processed, and cleaned restaurant data from the Yelp academic dataset.
- Implemented MapReduce design patterns (filtering, summarization, data organization, and join) to run analyses such as top restaurants by country and state, total restaurants by country and state, moving average rating of restaurants, top restaurants by positive reviews, and minimum and maximum review count for each restaurant.
- Performed sentiment analysis on the restaurant reviews written by Yelp users.
- Calculated the Pearson correlation, Jaccard similarity, and cosine similarity between restaurants to generate recommendations for users.
- Performed binning to split the data source based on a preset value of a column, and Bloom filtering to filter restaurants by the city they are located in.
- Deployed the project on AWS EC2 with four instances (a NameNode, a secondary NameNode, and two DataNodes) for scalability and performance.
- Visualized the analysis in Power BI.
- Average rating and total restaurants by cuisine
- Content-based recommendation
- Elite users based on useful votes
- Minimum, maximum, and total review count
- Restaurants by star rating
- Restaurant search using bloom filtering
- Sentiment analysis of user reviews
- Sentiment analysis of user reviews by restaurants
- Simple moving average rating of restaurants
- Tips at restaurants
- Top 10 restaurants by positive reviews
- Top restaurants by state
- Total and average rating of restaurants by country
- Total restaurants by state
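
The three similarity measures used for the recommender can be sketched in plain Java as follows. This is a minimal, single-machine illustration and not the project's MapReduce code: the class name `RestaurantSimilarity`, the use of rating vectors for Pearson and cosine, and the use of category sets for Jaccard are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; names and inputs are illustrative assumptions.
class RestaurantSimilarity {

    /** Pearson correlation between two equal-length vectors (0 if either is constant). */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? 0 : cov / denom;
    }

    /** Cosine similarity between two equal-length vectors (0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA * normB);
        return denom == 0 ? 0 : dot / denom;
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up feature vectors and category sets for two restaurants.
        double[] r1 = {5, 3, 4, 1};
        double[] r2 = {4, 2, 5, 1};
        System.out.println("Pearson: " + pearson(r1, r2));
        System.out.println("Cosine:  " + cosine(r1, r2));
        System.out.println("Jaccard: "
                + jaccard(Set.of("Pizza", "Italian"), Set.of("Italian", "Wine Bars")));
    }
}
```

Pearson and cosine operate on numeric vectors, while Jaccard compares sets, so a content-based recommender would typically apply them to different kinds of restaurant attributes.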
Java, R
Hadoop, HDFS, MapReduce, AWS EC2, Ubuntu
Eclipse, RStudio, WinSCP, PuTTY, PuTTYgen, Power BI
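
The Bloom-filter restaurant search by city can be illustrated with a small standalone Java sketch. It uses only the standard library and is not the project's Hadoop implementation. The class name `CityBloomFilter`, the bit-array size, the number of hash functions, and the hashing scheme are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Locale;

// Hypothetical Bloom filter over city names; sizes and hashing are illustrative.
class CityBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    CityBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    /** Derives the i-th bit index from two base hashes (double hashing). */
    private int index(String city, int i) {
        String key = city.toLowerCase(Locale.ROOT);
        // First hash: 32-bit FNV-1a over the UTF-8 bytes.
        int h1 = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 16777619;
        }
        // Second hash: String.hashCode, forced odd so it is never zero.
        int h2 = key.hashCode() | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Records a city in the filter. */
    void add(String city) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(city, i));
        }
    }

    /** Returns false if the city was definitely never added; true means "possibly added". */
    boolean mightContain(String city) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(city, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CityBloomFilter filter = new CityBloomFilter(1024, 3);
        filter.add("Las Vegas");
        filter.add("Phoenix");

        // A mapper could keep only the restaurant records whose city passes this check.
        System.out.println(filter.mightContain("Las Vegas"));
        System.out.println(filter.mightContain("Toronto"));
    }
}
```

A Bloom filter never reports a false negative but can report false positives, so in a filtering job a few restaurants from other cities may pass through and need an exact check afterwards if precision matters.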