This repository provides a demo of my walk-through about running an Apache Beam Pipeline on Azure Databricks.

Stefn93/BeamOnDatabricks

BeamOnDatabricks

This repository is a demo accompanying my walk-through on running an Apache Beam pipeline on Azure Databricks.

  1. Import the Maven project
  2. Set the project's JDK to 1.8
  3. Build the fat (shaded) JAR using the Maven lifecycle
  4. Attach the shaded JAR to a Databricks Job
    a. Select Runtime version 6.4
    b. Add --runner=SparkRunner --usesProvidedSparkContext to the Job's parameters
    c. Configure the other Job parameters as you prefer
  5. Run your Databricks Job!
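
For step 3, a fat JAR is commonly produced with the Maven Shade plugin bound to the `package` phase. The fragment below is a minimal sketch of what such a `pom.xml` section might look like, not necessarily the configuration used in this repository: the plugin version and the signature-file filters are assumptions. The `ServicesResourceTransformer` merges `META-INF/services` entries from all dependencies, which Beam relies on to discover runners and file systems at runtime.

```xml
<!-- Sketch only: the plugin version and filters are assumptions, not taken from this repository. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <!-- Run the shade goal as part of "mvn package" -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <filters>
              <filter>
                <!-- Drop signature files that would invalidate the merged JAR -->
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                </excludes>
              </filter>
            </filters>
            <transformers>
              <!-- Merge META-INF/services files so service-loaded classes are all registered -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a configuration along these lines, running `mvn package` should write the shaded JAR to the project's `target/` directory, ready to be attached to the Job in step 4.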

N.B. The same method should also work on any other Spark cluster.
