Spark Streaming makes it easy to build scalable fault-tolerant streaming applications.
At AppsFlyer, we use Spark for many of our offline processing services. Spark Streaming joined our technology stack a few months ago for real-time work flows, reading directly from Kafka to provide value to our clients in near-real-time.
In this session we will code a real-time dashboard application based on Spark Streaming technology. You will learn how to collect events from Kafka, aggregate them by time window and present aggregated insights in a dashboard.
The following scheme presents what we are going to create in this session:
Read events
+------------+ +------------------------+
| | | |
| Kafka | -----------> | Spark Streaming Job |
| | | |
+------------+ +------------------------+
|
| Update
V +---------------+
+--------+------+ | |
| | Push! | Real-time |
| RethinkDB | ---------> | Dashboard |
| | | _.. |
+---------------+ | (../ \) |
+---------------+
- We'll read real-time events from Kafka queue.
- We'll aggregate events in Spark streaming job by some sliding window.
- We'll write aggregated events to RethinkDB.
- RethinkDB will push updates to the real-time dashboard.
- Real-time dashboard will present current status.
- Laptop with at least 4GB of RAM.
- Make sure Virtualization is enabled in BIOS.
- VirtualBox v5.x or greater installed.
- Vagrant v1.8.1 or greater installed.
Please do these steps prior to coming to the Meetup:
- Clone or download this repository.
- Run
vagrant up && vagrant halt
inside thevagrant/
directory.
Please be patient, this process may take some time...
-
Go inside
vagrant/
directory. -
Boot the Vagrant virtual machine:
vagrant up
-
Start the Spark Streaming process:
vagrant ssh -c /projects/aggregator/run.sh
-
Start the Dashboard application in another terminal window:
vagrant ssh -c /projects/dashboard/run.sh
If everything was successful, you should be able to access your Dashboard application here:
After the meetup, you may want to close running virtual machine. To do so, run inside vagrant/
directory:
vagrant suspend
To destroy the virtual machine completely, run:
vagrant destroy