A Data Engineering Zoomcamp covering data modeling, infrastructure setup on Google Cloud, data warehousing, and data lake development.

teenbress/DataEngineeringZoomCamp

Data Engineering Zoomcamp

Overview

Syllabus

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

  • Data Lake
  • Workflow orchestration
  • Workflow orchestration with Mage
  • Homework

  • Reading from APIs
  • Building scalable pipelines
  • Normalising data
  • Incremental loading
  • Homework

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • BigQuery Machine Learning

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with Google Data Studio and Metabase

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL
