Big Data Project: MMR Vaccination Data Analysis

Overview

This project analyzes MMR (Measles, Mumps, and Rubella) vaccination data from various US states using both streaming and batch processing. The streaming path runs through Kafka and Spark; the batch path loads the data into HDFS and runs a Hadoop job. Both paths save their results in Cassandra tables, which Grafana visualizes.

Project Structure:

.
├── cassandra-setup.cql
├── config.sh
├── Data
│   ├── all-measles-rates.csv
│   └── shuffle.py
├── docker-compose.yml
├── grafana
│   ├── dashboards
│   ├── grafana.ini
│   └── provisioning
├── scripts
│   ├── cassandra-tables.sh
│   ├── grafana-config.sh
│   ├── hadoop-job.sh
│   ├── kafka-producer.sh
│   └── spark-job.sh
└── README.md

Workflow

Docker Compose setup: docker-compose up -d --build
Copy the JAR files into the Hadoop master container: docker cp hadoop-master:/ (the build.sh script in the Vaccination-Rate-Hadoop and Vaccination-Rate-Spark repositories can automate this step).
Shuffle the vaccination data.
Copy the shuffled data into the Hadoop master container.
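
The shuffling step can be sketched as follows. This is only an illustration of the idea, not the contents of Data/shuffle.py: the function name shuffle_csv, the seed parameter, and the sample columns (state, name, rate) are assumptions made for the example. It randomizes the order of the data rows while keeping the header row first.

```python
import csv
import io
import random


def shuffle_csv(text, seed=None):
    """Return CSV text with the data rows shuffled and the header kept first."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    header, data = rows[0], rows[1:]
    # A seeded generator makes the shuffled order reproducible between runs.
    random.Random(seed).shuffle(data)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + data)
    return out.getvalue()


# Hypothetical sample; the real columns of all-measles-rates.csv may differ.
sample = (
    "state,name,rate\n"
    "Ohio,School A,95.0\n"
    "Texas,School B,88.5\n"
    "Utah,School C,91.2\n"
)
shuffled = shuffle_csv(sample, seed=42)
```

Shuffling before streaming means records from different states arrive interleaved rather than in file order, which makes the stream look more like live data.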
Streaming Job
- Create a Kafka topic for streaming.
- Stream the data into Kafka.
- Process the data using a Spark job.
- Save the processed results in a Cassandra table.
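
The streaming steps above can be illustrated with a small, library-free sketch. It does not use the Kafka or Spark APIs. It only mimics the data flow: records are encoded as keyed messages, consumed in micro-batches, and folded into a per-state running average whose changed rows would then be written to Cassandra. The aggregation itself, the names to_message and RunningAverage, and the state and rate fields are all assumptions, since this README does not say what the Spark job computes.

```python
import json
from collections import defaultdict


def to_message(row):
    """Encode one record as (key, value) bytes, keyed by state."""
    return row["state"].encode("utf-8"), json.dumps(row).encode("utf-8")


class RunningAverage:
    """Per-state running mean, updated one micro-batch at a time."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, messages):
        """Consume (key, value) messages and return the averages that changed."""
        changed = set()
        for key, value in messages:
            record = json.loads(value)
            state = key.decode("utf-8")
            self.count[state] += 1
            self.total[state] += float(record["rate"])
            changed.add(state)
        # Only the changed states need to be upserted into the results table.
        return {s: self.total[s] / self.count[s] for s in sorted(changed)}


# Hypothetical records; field names are assumptions.
rows = [
    {"state": "Ohio", "rate": "95.0"},
    {"state": "Texas", "rate": "88.0"},
    {"state": "Ohio", "rate": "91.0"},
]
agg = RunningAverage()
first = agg.update([to_message(r) for r in rows[:2]])   # first micro-batch
second = agg.update([to_message(r) for r in rows[2:]])  # second micro-batch
```

Keying each message by state is a design choice for the sketch: in a partitioned log, records with the same key land in the same partition, so per-state updates stay ordered.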
Batch Job
- Load the data into HDFS (Hadoop Distributed File System).
- Run a Hadoop job to process the batch data.
- Save the results in another Cassandra table.
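
As a rough picture of what a batch job of this kind does, here is a MapReduce-style sketch in plain Python. It is not the Hadoop job from the Vaccination-Rate-Hadoop repository. The average-per-state computation, the function names, and the two-column state,rate input are assumptions chosen to show the map, shuffle, and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (state, rate) for one CSV line; skip the header and bad rows."""
    fields = line.strip().split(",")  # naive split, no quoted-field support
    if len(fields) != 2 or fields[0] == "state":
        return []
    try:
        return [(fields[0], float(fields[1]))]
    except ValueError:
        return []  # non-numeric rate, e.g. a missing value


def reducer(state, rates):
    """Average all rates seen for one state."""
    rates = list(rates)
    return state, sum(rates) / len(rates)


def run_job(lines):
    """Map every line, shuffle (sort and group by key), then reduce each group."""
    pairs = sorted(pair for line in lines for pair in mapper(line))
    return [
        reducer(state, (rate for _, rate in group))
        for state, group in groupby(pairs, key=itemgetter(0))
    ]


# Hypothetical input lines; the real file layout may differ.
lines = ["state,rate", "Ohio,95.0", "Texas,88.0", "Ohio,91.0", "Utah,n/a"]
results = run_job(lines)
```

Each (state, average) pair in results corresponds to one row that the batch path would save in its Cassandra table.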
Visualization
- Grafana retrieves data from both Cassandra tables using a plugin.
- Grafana creates real-time dashboards to visualize insights from the streaming and batch jobs.

Run config.sh to configure and start the jobs.

BigData-GL4/Big-Data-Project

Folders and files

Latest commit

History

Repository files navigation

Big Data Project: MRR Vaccination Data Analysis

Overview

Workflow

Streaming Job

Batch Job

Visualization

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages