Don't Panic. This guide will help you when it feels like the end of the world.

newfront/hitchhikers_guide_to_deltalake_streaming

The Hitchhiker's Guide to Delta Lake Streaming

Data+AI Summit Session

DON'T PANIC

This is a collection (hopefully growing as time goes on) of tips and tricks to ensure that your experience building and maintaining streaming Delta Lake applications (and the tables that power them) is absolutely joyful, even when the shit hits the fan :). Remember, when all else fails, take a deep breath, count to 5, and dig into the following content.

Using the Guide

Take a look at the outline provided in /hitchhikers_guide/README.md to learn how to use the Guide in your Delta Lake streaming adventures.

Getting up and Running

Note: For x86_64 (linux/amd64) or arm64 (linux/arm64), use the common docker-compose.yaml.

Docker Image: This uses the newfrontdocker/delta-docker:3.0.0 image. It will be replaced with the official delta-docker image after delta-io/delta-docs#60 is merged and the image is pushed.

cd hitchhikers_guide && docker compose up

Note: The docker-compose.yaml includes settings that cap the local resources available for data processing (upper limit: 4 CPU cores, 16 GB RAM). If you raise these limits to match your laptop, desktop, or server, leave at least 1 CPU core and at least 1 GB of RAM for the host OS.

deploy:
    resources:
        limits:
            cpus: '4'
            memory: 16G
        reservations:
            cpus: '1'
            memory: 4G
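As a rough sketch of the sizing rule in the note above, the helper below derives limit values that leave headroom for the host. The function name `suggest_limits` and its parameters are hypothetical and not part of this repo; the default of 1 CPU core and 1 GB of RAM held back comes from the note.

```python
def suggest_limits(total_cpus: int, total_mem_gb: int,
                   reserve_cpus: int = 1, reserve_mem_gb: int = 1) -> dict:
    """Return docker-compose style limits that leave headroom for the host OS.

    Hypothetical helper: the reserve defaults mirror the README note
    (at least 1 CPU core and 1 GB of RAM kept back for the host).
    """
    if total_cpus <= reserve_cpus or total_mem_gb <= reserve_mem_gb:
        raise ValueError("host is too small to reserve resources for the OS")
    return {
        # Compose expects the CPU limit as a quoted string, e.g. '4'.
        "cpus": str(total_cpus - reserve_cpus),
        # Memory is written with a unit suffix, e.g. 16G.
        "memory": f"{total_mem_gb - reserve_mem_gb}G",
    }


# Example: a machine with 8 cores and 32 GB of RAM.
print(suggest_limits(8, 32))  # {'cpus': '7', 'memory': '31G'}
```

The returned values would go under `limits:` in the `deploy.resources` block; the `reservations:` values can stay as they are.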

Then head into the Jupyter Lab environment and start exploring.

The main content is located at http://127.0.0.1:8888/lab/tree/hitchhikers_guide

Datasets

  • ECommerce Data - the dataset used in the Hitchhiker's Guide is a subset of the original 5 GB dataset.

Note: The full datasets are not included in this repo. If you'd like to explore large datasets like the ecommerce behavior dataset, then please download it from Kaggle.
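If you download the full dataset and want a smaller slice to experiment with, something like the following could work. This is only a sketch and not necessarily how the subset in this repo was produced; the function name `write_subset` and the file names in the usage comment are hypothetical. It streams the CSV row by row, so the full file never has to fit in memory.

```python
import csv
from itertools import islice


def write_subset(src_path: str, dst_path: str, max_rows: int) -> int:
    """Copy the header plus the first max_rows data rows of a CSV file.

    Returns the number of data rows written (the header is not counted).
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file, nothing to copy
        writer.writerow(header)
        written = 0
        # islice stops after max_rows rows without reading the rest of the file.
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written


# Usage (hypothetical file names):
# write_subset("full_dataset.csv", "subset.csv", max_rows=100_000)
```

Taking the first rows keeps the original ordering of the file; if you need a representative sample rather than a leading slice, a random sampling step would be a better fit.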
