This repository contains 6 passed projects from the Data Engineering Program.

Lesson | Project Overview |
---|---|
Data Modeling | Model user activity data for a music streaming app called Sparkify and optimize queries for understanding what songs users are listening to. Project 1: Relational Model with Postgres - Design the schema and define Fact and Dimension tables; - Model the data to help the data team answer queries about the app usage; |
Cloud Data Warehouse | Project 3: Data Warehouse (AWS) - Build an ELT pipeline that extracts Sparkify’s data from S3, Amazon’s popular storage system; |
Data Lakes with Spark | Project 4: Data Lake with Apache Spark - Build an ETL pipeline for a data lake (The data resides in S3, in a directory of JSON logs on user activity on the app, as well as a directory with JSON metadata on the songs in the app); |
Data Pipelines with Airflow | Use Apache Airflow, an up-and-coming tool developed and open-sourced by Airbnb and the Apache Foundation, to continue working on Sparkify’s data infrastructure. Project 5: Data Pipeline with Airflow - Create and automate a set of data pipelines; |
Capstone | Capstone Project - Define the scope of the project and the data you will be working with; |
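
To illustrate the Fact and Dimension table idea from the Data Modeling project, here is a minimal sketch of a star schema and a "which songs are users listening to" query. It is not the schema used in the projects: the table and column names (`songplays`, `users`, `songs`) and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the example runs without a database server.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript(
        """
        -- Dimension tables: descriptive attributes, one row per entity.
        CREATE TABLE IF NOT EXISTS users (
            user_id INTEGER PRIMARY KEY,
            name    TEXT
        );
        CREATE TABLE IF NOT EXISTS songs (
            song_id TEXT PRIMARY KEY,
            title   TEXT
        );
        -- Fact table: one row per song play, pointing at the dimensions.
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
            start_time  TEXT NOT NULL,
            user_id     INTEGER NOT NULL REFERENCES users(user_id),
            song_id     TEXT REFERENCES songs(song_id)
        );
        """
    )


def top_songs(conn, limit=5):
    """Return (title, play_count) pairs, most played first."""
    query = """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN songs AS s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)", [("S1", "Song A"), ("S2", "Song B")]
)
conn.executemany(
    "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
    [
        ("2018-11-01T10:00:00", 1, "S1"),
        ("2018-11-01T10:05:00", 2, "S1"),
        ("2018-11-01T10:10:00", 1, "S2"),
    ],
)
print(top_songs(conn))
```

Keeping play events in a narrow fact table and descriptive attributes in dimension tables means analytic questions like this one reduce to a join plus an aggregation.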