Reddit Sentiment Analysis of the War in Palestine

Overview

This project performs sentiment analysis on Reddit posts related to the war in Palestine. It runs a streaming pipeline that covers data ingestion, message brokering, stream processing, sentiment prediction with a fine-tuned BERT model, and dashboard visualization. Every service is containerized with Docker and orchestrated with Docker Compose.

Architecture

Components

- Reddit API: Source of the data; provides posts related to the war in Palestine.
- Ingestion Script: Python script that fetches posts from the Reddit API and publishes them to Kafka.
- Kafka: Message broker that carries the streaming data.
- Apache Spark: Stream processing engine that consumes data from Kafka.
- Fine-tuned BERT Model: Machine learning model that predicts sentiment.
- Cassandra: Database that stores the processed data.
- Grafana: Dashboard that visualizes the data.
- Kafdrop: Kafka monitoring tool.
- Docker: Containerization of all services.
- FastAPI: API service that exposes the prediction model.
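To make the data flow between these components concrete, here is a toy, in-process sketch in Python. It is not the project's code: `queue.Queue` stands in for the Kafka topic, a list stands in for the Cassandra table, and `score_sentiment` is a hypothetical placeholder for the BERT model served through FastAPI.

```python
import json
import queue
from typing import Callable


def run_pipeline(posts: list[dict], score_sentiment: Callable[[str], str]) -> list[dict]:
    """Push posts through a toy broker, score them, and return the 'stored' rows."""
    topic = queue.Queue()   # stands in for a Kafka topic
    table: list[dict] = []  # stands in for a Cassandra table

    # Ingestion: the producer serializes each post and publishes it to the topic.
    for post in posts:
        topic.put(json.dumps(post).encode("utf-8"))

    # Stream processing: consume each message, score it, store the result.
    while not topic.empty():
        record = json.loads(topic.get().decode("utf-8"))
        # In the real system this is a call to the model service.
        record["sentiment"] = score_sentiment(record["text"])
        table.append(record)

    # Visualization: Grafana would query these rows; here they are simply returned.
    return table
```

For example, `run_pipeline([{"id": "a1", "text": "..."}], lambda text: "neutral")` returns the post with a `"sentiment": "neutral"` field added. In the actual deployment each stage runs in its own container and the stages run concurrently rather than one after another.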

Setup Instructions

Prerequisites

- Docker
- Docker Compose

How to Run

Clone the Repository:

```
git clone https://github.com/yourusername/reddit-sentiment-analysis.git
cd reddit-sentiment-analysis
```

Build and Start Services:
```
docker-compose up --build
```
Accessing Services:
- Kafka Monitoring: http://localhost:9000
- Spark Master: http://localhost:8080
- Model Service: http://localhost:8081
- Grafana Dashboard: http://localhost:3000
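Containers can take a while to start after `docker-compose up --build`. A minimal sketch of a readiness check, assuming the host ports listed above, polls each port until it accepts TCP connections. The `SERVICES` mapping and the `wait_for_port` and `report` helpers are illustrative names, not part of the repository, and an open port only shows that something is listening, not that the service has finished initializing.

```python
import socket
import time

# Host ports published by Docker Compose, as listed under "Accessing Services".
SERVICES = {
    "Kafdrop": 9000,
    "Spark Master": 8080,
    "Model Service": 8081,
    "Grafana": 3000,
}


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) until something listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def report(host: str = "localhost", timeout: float = 60.0) -> dict[str, bool]:
    """Check every service in SERVICES and return a name -> reachable mapping."""
    return {name: wait_for_port(host, port, timeout) for name, port in SERVICES.items()}
```

Calling `report()` after starting the stack returns, for each service, whether its port became reachable within the timeout.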

Component Details

Reddit Producer

Purpose: Fetches data from Reddit and sends it to Kafka.
Technology: Python, Kafka
Key Files:
- reddit-producer.py: Main script to fetch and send data.
- config.yaml: Configuration file for Reddit API and Kafka.
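The producer's job can be pictured as turning each Reddit post into a Kafka message. The sketch below assumes posts have already been converted to plain dictionaries; the field names follow common Reddit submission attributes (`id`, `title`, `selftext`, `created_utc`, `subreddit`), but the message schema and the `to_kafka_message` helper are hypothetical and may not match `reddit-producer.py`.

```python
import json


def to_kafka_message(post: dict) -> tuple[bytes, bytes]:
    """Return a (key, value) pair of bytes for one post (hypothetical schema)."""
    payload = {
        "id": post["id"],
        "subreddit": post.get("subreddit", ""),
        "title": post.get("title", ""),
        "text": post.get("selftext", ""),  # self-post body; empty for link posts
        "created_utc": float(post.get("created_utc", 0.0)),
    }
    key = post["id"].encode("utf-8")  # key messages by post id
    value = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return key, value
```

With a client such as kafka-python, the resulting pair could be published with `producer.send(topic, key=key, value=value)` on a `KafkaProducer`; the broker address and topic name would presumably come from `config.yaml`. Keying by post id sends every message for the same post to the same partition under Kafka's default partitioner.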

Spark Stream Processing

Purpose: Processes streaming data from Kafka.
Technology: Apache Spark
Key Files:
- spark-streaming.py: Spark job to process and analyze data.
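Before text reaches the model, a stream job typically decodes and cleans each message. The following sketch shows one such per-record step in plain Python, assuming JSON messages with `id`, `title`, and `text` fields. The function name and the cleaning rules (dropping URLs and Reddit's `[removed]`/`[deleted]` placeholders) are assumptions; in PySpark, logic like this could run inside a UDF or a `foreachBatch` callback, but `spark-streaming.py` may be organized differently.

```python
import json
import re
from typing import Optional

URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")
PLACEHOLDERS = {"[removed]", "[deleted]"}  # Reddit's markers for removed content


def prepare_for_model(raw_value: bytes) -> Optional[dict]:
    """Decode a Kafka message value and build the text to score, or return None to skip it."""
    try:
        record = json.loads(raw_value.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # drop malformed messages instead of failing the whole batch
    if not isinstance(record, dict):
        return None
    text = f"{record.get('title', '')} {record.get('text', '')}"
    text = URL_RE.sub("", text)          # strip links
    text = WS_RE.sub(" ", text).strip()  # collapse newlines and repeated spaces
    if not text or text in PLACEHOLDERS:
        return None  # nothing left to score
    return {"id": record.get("id"), "text": text}
```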

Model Service

Purpose: Provides an API to predict sentiment using a fine-tuned BERT model.
Technology: FastAPI, PyTorch
Key Files:
- app.py: FastAPI application.
- Model files in the model/ directory.
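A client of the model service only needs to send text and read back a prediction. The sketch below uses only the standard library; the `/predict` route, the `{"text": ...}` request body, and the shape of the JSON response are assumptions, so check `app.py` for the actual route and schema. Port 8081 comes from the service list above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # Model Service port from "Accessing Services"


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request to a HYPOTHETICAL /predict route with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_predict_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the part that depends on the assumed schema in one place, so it is easy to adjust once the real route in `app.py` is confirmed.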

Grafana

Purpose: Visualizes the processed data.
Technology: Grafana
Key Files:
- grafana.ini: Configuration file for Grafana.
- cassandra.yaml: Datasource configuration for Cassandra.
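Dashboards of this kind usually plot sentiment counts over time. As a hypothetical illustration of that shape of data, the sketch below groups processed rows into one-minute buckets and counts each sentiment label. The row fields (`created_utc`, `sentiment`) and the `bucket_counts` helper are assumptions; the actual Grafana panels query Cassandra and may aggregate differently.

```python
from collections import Counter, defaultdict


def bucket_counts(rows: list[dict], bucket_seconds: int = 60) -> dict[int, Counter]:
    """Map bucket start (Unix seconds) to a Counter of sentiment labels, sorted by time."""
    buckets: dict[int, Counter] = defaultdict(Counter)
    for row in rows:
        ts = int(row["created_utc"])
        start = ts - ts % bucket_seconds  # floor the timestamp to its bucket boundary
        buckets[start][row["sentiment"]] += 1
    return dict(sorted(buckets.items()))
```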

Monitoring

- Kafdrop: Monitor Kafka topics and brokers at http://localhost:9000
- Grafana: Visualize data and monitor metrics at http://localhost:3000
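Beyond opening the web UIs, Grafana can be checked from a script through its `/api/health` endpoint, which returns JSON that includes a `database` field. A minimal sketch, assuming Grafana is published on `localhost:3000` as listed above; the function names are illustrative.

```python
import json
import urllib.request


def grafana_is_healthy(payload: dict) -> bool:
    """Grafana reports "database": "ok" when its backing database is reachable."""
    return payload.get("database") == "ok"


def check_grafana(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/health", timeout=timeout) as resp:
            return grafana_is_healthy(json.loads(resp.read().decode("utf-8")))
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; a non-JSON body raises ValueError.
        return False
```

Because `check_grafana` returns False on connection errors, HTTP error statuses, or an unparsable body, it can be called in a loop while the stack is still starting.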

Feel free to contribute by opening issues or submitting pull requests. For major changes, please open an issue first to discuss what you would like to change.

Repository: yassineiscoding/reddit-sentiment-analysis
