Data-Engineering

This repository contains six passing projects from the Data Engineering Program.

Lesson	Project Overview
Data Modeling	Model user activity data for a music streaming app called Sparkify and optimize queries for understanding what songs users are listening to. Project 1: Relational Model with Postgres - Design the schema and define Fact and Dimension tables; - Insert data into the tables. Project 2: NoSQL data model with Apache Cassandra - Model the data to help the data team answer queries about app usage; - Set up Apache Cassandra database tables to optimize writes of transactional data on user sessions.
Cloud Data Warehouse	Project 3: Data Warehouse (AWS) - Build an ELT pipeline that extracts Sparkify’s data from S3, Amazon’s popular storage system; - Stage the data in Amazon Redshift and transform it into a set of fact and dimension tables so the Sparkify analytics team can continue finding insights into what songs their users are listening to.
Data Lakes with Spark	Project 4: Data Lake with Apache Spark - Build an ETL pipeline for a data lake (the data resides in S3, in a directory of JSON logs on user activity in the app and a directory of JSON metadata on the songs in the app); - Load data from S3, process it into analytics tables using Spark, and load the tables back into S3; - Deploy this Spark process on a cluster using AWS.
Data Pipelines with Airflow	Use Apache Airflow, an up-and-coming tool developed and open-sourced by Airbnb and the Apache Foundation, to continue working on Sparkify’s data infrastructure. Project 5: Data Pipeline with Airflow - Create and automate a set of data pipelines; - Configure and schedule data pipelines with Airflow, setting dependencies, triggers, and quality checks as you would in a production setting.
Capstone	Capstone Project - Define the scope of the project and the data you will be working with; - Gather data from four different sources, then transform, combine, and summarize it; - Create a clean database for others to analyze.
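
As a rough illustration of the fact and dimension tables mentioned for Project 1, here is a minimal sketch. It is not the project's actual schema: the table names, column names, and sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for Postgres so the example runs without a database server.

```python
import sqlite3

# Assumption: in-memory SQLite replaces Postgres so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables (hypothetical columns): descriptive attributes of users and songs.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist_name TEXT)")

# Fact table (hypothetical columns): one row per song play, pointing at the dimensions.
cur.execute("""
    CREATE TABLE songplays (
        songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
        start_time  TEXT NOT NULL,
        user_id     INTEGER NOT NULL REFERENCES users (user_id),
        song_id     TEXT REFERENCES songs (song_id)
    )
""")

# Made-up sample rows, only to make the query below return something.
cur.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "Ada", "free"), (2, "Lin", "paid")])
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                [("S1", "Song One", "Artist A"), ("S2", "Song Two", "Artist B")])
cur.executemany(
    "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
    [
        ("2018-11-01T10:00:00", 1, "S1"),
        ("2018-11-01T10:05:00", 2, "S1"),
        ("2018-11-01T10:09:00", 2, "S2"),
    ],
)
conn.commit()


def top_songs(connection, limit=5):
    """Return (title, play count) pairs by joining the fact table to the songs dimension."""
    rows = connection.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN songs AS s ON s.song_id = sp.song_id
        GROUP BY s.song_id, s.title
        ORDER BY plays DESC, s.title
        LIMIT ?
        """,
        (limit,),
    )
    return rows.fetchall()


print(top_songs(conn))
```

Keeping play events in a narrow fact table and descriptive attributes in dimension tables means a question like "what songs are users listening to" becomes a single join and aggregate.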


Yuexi-Li/Data-Engineering
