This ETL project is a sample of my Python work, using Apache Airflow for pipeline orchestration, Pandas for data manipulation, MySQL for storage, and Apache Superset for data visualization.
This process uses the YTS Movies API, a public API for developers, as its data source.
For more information about this API, click here.
The pipeline runs in five steps:

1. Create the databases and tables.
2. Extract the data from the API and store it in the bronze database in MySQL.
3. Refine the data and store it in the silver database in MySQL.
4. Build the star schema model and load the data into the gold database in MySQL.
5. Run data quality checks, looking for duplicated records and verifying that the tables contain data.
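As a rough sketch of the extraction step, here is one way a YTS `list_movies` JSON payload could be flattened into a Pandas DataFrame before landing in the bronze database. The sample payload shape and the commented table name are illustrative assumptions, not the project's actual code:

```python
import pandas as pd

def movies_to_dataframe(payload: dict) -> pd.DataFrame:
    """Flatten a YTS list_movies-style JSON payload into a flat DataFrame."""
    movies = payload.get("data", {}).get("movies", [])
    return pd.json_normalize(movies)

# Sample payload shaped like a YTS API response (fields are illustrative)
sample = {
    "status": "ok",
    "data": {"movies": [
        {"id": 1, "title": "Movie A", "year": 2020, "rating": 7.1},
        {"id": 2, "title": "Movie B", "year": 2021, "rating": 6.4},
    ]},
}
df = movies_to_dataframe(sample)
# df.to_sql("movies_raw", engine, if_exists="append", index=False)  # hypothetical bronze table
```

The `to_sql` call is left commented because it needs a live SQLAlchemy engine pointed at the bronze database.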
- Incremental data load
- ETL logs
- Star Schema Model
- Python (Pandas, OOP)
- SQL
- Data Quality
- Apache Airflow
- Apache Superset
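A minimal sketch of how the incremental data load can work, assuming each source record carries a monotonically increasing `id` used as a watermark (the column name is an assumption, not necessarily this project's schema):

```python
import pandas as pd

def incremental_filter(batch: pd.DataFrame, last_loaded_id: int) -> pd.DataFrame:
    """Keep only rows newer than the highest id already stored in the target table."""
    return batch[batch["id"] > last_loaded_id]

# In the real pipeline, last_loaded_id would come from a MAX(id) query on MySQL
batch = pd.DataFrame({"id": [1, 2, 3, 4], "title": ["a", "b", "c", "d"]})
new_rows = incremental_filter(batch, last_loaded_id=2)
# Only ids 3 and 4 remain to be appended to the target table
```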
- For this project, the tables are designed with a star schema, which simplifies queries and provides fast data aggregation.
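As an illustration of the star schema step, here is a sketch with made-up columns: a genre dimension gets a surrogate key, and a fact table references it. The column names are hypothetical, not the project's real schema:

```python
import pandas as pd

# Flat movie records as they might arrive from the silver layer
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["A", "B", "C"],
    "genre": ["Action", "Drama", "Action"],
    "rating": [7.0, 6.5, 8.1],
})

# Dimension table: one row per distinct genre, with a surrogate key
dim_genre = movies[["genre"]].drop_duplicates().reset_index(drop=True)
dim_genre["genre_key"] = dim_genre.index + 1

# Fact table: measures plus the foreign key to the dimension
fact_movie = movies.merge(dim_genre, on="genre")[["movie_id", "genre_key", "rating"]]
```

Queries then join the small dimension tables to the fact table, which is what makes aggregations fast.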
- For process control, two tables were created: etl_logging, which records the execution steps of each run, and etl_process, which defines the type of process.
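A sketch of what writing to the logging table could look like; the column names and the commented `to_sql` target are assumptions, since the actual etl_logging schema is not shown here:

```python
from datetime import datetime, timezone
import pandas as pd

def make_log_entry(process_id: int, step: str, status: str) -> dict:
    """Build one etl_logging record (column names are illustrative)."""
    return {
        "process_id": process_id,  # FK to etl_process, identifying the process type
        "step": step,
        "status": status,
        "logged_at": datetime.now(timezone.utc),
    }

entries = [
    make_log_entry(1, "extract_bronze", "success"),
    make_log_entry(1, "refine_silver", "success"),
]
logs_df = pd.DataFrame(entries)
# logs_df.to_sql("etl_logging", engine, if_exists="append", index=False)  # hypothetical
```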
- This is an example of a dashboard template for data visualization, built in Apache Superset.
- This is what the Airflow DAG looks like: