Using Spark & Kafka in Databricks

About

These notes were put together to help my fellow Kubrick Consultants master the syntax of PySpark & Kafka, the Python packages we use to compute in Databricks.

Unfortunately, I was laid off before I had the opportunity to finish this repository as a company resource. Instead, I've opted to share a scrubbed version publicly, under the MIT license, to display my command of Spark and Kafka as tools.

Layout & Use:

This project is split into two sub-folders, one for each package we used with Databricks: Spark and Kafka.

Kafka Practice Streams
Spark Practice Exercises

Each folder demonstrates a set of techniques across a number of exercises: data ingestion for Spark and data streaming for Kafka.
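To give a flavor of the two themes, here is a minimal sketch rather than code lifted from the notebooks. The sample rows, the `orders.csv` file name, the column names, and every function name are hypothetical. The Kafka reader assumes Spark's Structured Streaming Kafka source, which the practice streams may or may not use, and the broker address and topic are left for you to supply.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data and schema, invented for illustration only.
HEADER = ("order_id", "customer", "amount")
SAMPLE_ROWS = [(1, "alice", 34.5), (2, "bob", 12.0), (3, "carol", 99.9)]
SCHEMA_DDL = "order_id INT, customer STRING, amount DOUBLE"


def write_sample_csv(directory):
    """Write the sample rows to orders.csv in `directory` and return its path."""
    path = Path(directory) / "orders.csv"
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(SAMPLE_ROWS)
    return path


def ingest_csv(spark, path):
    """Batch ingestion: read a CSV into a DataFrame with an explicit schema."""
    return spark.read.option("header", True).schema(SCHEMA_DDL).csv(str(path))


def read_kafka_topic(spark, bootstrap_servers, topic):
    """Streaming ingestion: subscribe to a Kafka topic as a streaming DataFrame.

    Kafka delivers keys and values as bytes, so both are cast to strings here.
    """
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )


def run_batch_demo():
    """Run the CSV example on a local two-core session (needs pyspark and Java)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]").appName("ingest-sketch").getOrCreate()
    )
    try:
        with tempfile.TemporaryDirectory() as tmp:
            df = ingest_csv(spark, write_sample_csv(tmp))
            df.printSchema()
            df.show()
    finally:
        spark.stop()
```

In a Databricks notebook a `spark` session already exists, so you can pass it straight to `ingest_csv` or `read_kafka_topic`. Outside Databricks, `run_batch_demo()` exercises the batch path only.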

Setup & Installation:

Run these notebooks in Databricks (preferred method)

To run the Spark dependencies, you'll need an underlying compute cluster. Those are a bit of a pain to get running, so I'm using Jupyter notebooks for now, which also let me keep detailed notes during the learning process. Fortunately, Databricks lets you upload Jupyter notebooks through the GUI and write and export code in .ipynb format. To test the code in the notebooks, start a 2-core Databricks cluster on a free trial and drop the notebooks into your environment through the GUI.
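If you want the same notebook to run both on a Databricks cluster and on a laptop, one option is a small helper like the sketch below. It is not part of the repository. It assumes the `DATABRICKS_RUNTIME_VERSION` environment variable is set on Databricks clusters, and the helper names and the `local[2]` fallback (chosen to mirror the 2-core cluster) are my own.

```python
import os


def running_on_databricks(env=None):
    """Return True when the environment looks like a Databricks cluster.

    Assumption: Databricks sets DATABRICKS_RUNTIME_VERSION on its clusters.
    """
    env = os.environ if env is None else env
    return "DATABRICKS_RUNTIME_VERSION" in env


def get_spark(app_name="practice-notebook"):
    """Reuse the cluster's session on Databricks, else start a local one."""
    # Imported lazily so the environment check above works without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if not running_on_databricks():
        # Two local cores stand in for the small trial cluster.
        builder = builder.master("local[2]")
    return builder.getOrCreate()
```

Calling `spark = get_spark()` in the first cell of a notebook then works in either setting. On Databricks, `getOrCreate()` returns the session the cluster already provides.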

These exercises are adaptations of former client problems solved by senior staff at Kubrick Group. They shared their solutions with us in a classroom context, asking us to close gaps in efficiency, solve problems similar to the common requests they encountered, and generally get comfortable building and using dataframes in a cluster compute context.

Documentation

Apache Spark Documentation

Apache's website hosts the documentation here: https://spark.apache.org/docs/latest/

Installation and update notes can be found here:

https://spark.apache.org/docs/latest/api/python/getting_started/install.html

Kafka: The Definitive Guide

Download the Kafka ebook from Confluent (a provider of a cloud-hosted Kafka platform): https://www.confluent.io/resources/kafka-the-definitive-guide/

Databricks Academy https://partner-academy.databricks.com/learn

tgsaman/Spark-Kafka-Databricks

Folders and files

Latest commit

History

Repository files navigation

Using Spark & Kafka in Databricks

About

Layout & Use:

Setup & Installation:

Documentation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages