
LargeWaffle/twitter-sentiment-analysis


Tweet sentiment analysis

Objectives

For a Big Data course, we had to process large datasets while also tackling a natural language processing (NLP) problem.

Our goals were:

  • Check how PySpark scales with large datasets
  • Observe the benefits of distributing the data across our processing steps
  • Ensure satisfactory sentiment prediction results

Datasets

  • Sentiment140
  • A custom dataset fetched from the public Twitter API
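As a rough illustration of preparing Sentiment140 for training, the sketch below assumes the dataset's usual six-column CSV layout (polarity, tweet id, date, query, user, text) with polarity codes 0 = negative, 2 = neutral and 4 = positive. The helper `parse_sentiment140`, the `LabeledTweet` type and the 0/1 binary label mapping are hypothetical illustrations, not code taken from this repository.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple

# Assumed Sentiment140 column order: polarity, id, date, query, user, text.
# Polarity codes: 0 = negative, 2 = neutral, 4 = positive.
# Hypothetical binary mapping: neutral (2) rows are dropped.
POLARITY_TO_LABEL = {"0": 0, "4": 1}


class LabeledTweet(NamedTuple):
    tweet_id: str
    text: str
    label: int  # 0 = negative, 1 = positive


def parse_sentiment140(lines: Iterable[str]) -> Iterator[LabeledTweet]:
    """Yield labeled tweets for rows whose polarity is 0 or 4 (hypothetical helper)."""
    for row in csv.reader(lines):
        if len(row) != 6:
            continue  # skip malformed rows
        polarity, tweet_id, _date, _query, _user, text = row
        label = POLARITY_TO_LABEL.get(polarity)
        if label is None:
            continue  # neutral (2) or unknown polarity code
        yield LabeledTweet(tweet_id, text, label)


# Small in-memory sample in the assumed format (quoted fields may contain commas).
sample = io.StringIO(
    '"0","1467810369","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","sad day"\n'
    '"4","1467822272","Mon Apr 06 22:22:45 PDT 2009","NO_QUERY","user_b","great, thanks!"\n'
    '"2","1467822273","Mon Apr 06 22:22:46 PDT 2009","NO_QUERY","user_c","meh"\n'
)
tweets = list(parse_sentiment140(sample))
print(tweets)
```

For the full file, the same generator can be fed an open file handle (the file is commonly opened with `encoding="latin-1"` and `newline=""`); in a PySpark setting, the equivalent filtering and relabeling would typically be done on a DataFrame instead, so the work is distributed across the cluster.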

Tools used

  • PySpark
  • Spark Streaming
  • Kafka