demo-kafka-spark-pipeline

Description

This is a dockerized demo setup containing:

a Kafka setup
- mock data loader - loading R4 FHIR resources from mock-data-kdb.ndjson into the Kafka topic "fhir.post-gateway-kdb"
- GUI AkHQ on http://localhost:8082)
a SPARK setup
- master on http://localhost:8083/
a pathling container built from a Dockerfile (where the pathling python API is installed and important pyspark submit args are defined)

In order to start the containers with kafka and mock-data-loader + pathling container including jupyter lab, run the following command:

# if not executable, first run "chmod +x start.sh"
./start.sh

This script runs the kafka_stream_con.py script inside the container:

starts the SparkSession
reads the Kafka topic into Spark - prints out a key-value table with the R4 FHIR resources inside.

# if not executable, first run "chmod +x stop.sh"
./stop.sh

In order to use the jupyter lab, just run the following command and click on the URL to open Jupyter in a browser:

docker logs -f jupyter-pathling

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
spark		spark
test-data		test-data
volume		volume
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.dev.yml		docker-compose.dev.yml
start.sh		start.sh
stop.sh		stop.sh