Uses Apache Airflow to build an ETL pipeline.
Data source: Taichung's 2023 real estate transaction records from the actual price registration system.
- Extract the raw data from the Taichung open data platform.
- Transform the raw data:
  - Remove special transactions so they do not distort the interpretation of the data.
  - Add a column that groups house ages into 10-year bins, making later data visualization easier.
- Load the processed data into the database (PostgreSQL).
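The transform step can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code: the field names `note`, `building_age`, and `age_group` are hypothetical, and the sketch assumes that a transaction counts as special when its note field is non-empty. The actual open-data columns and the rule used to flag special transactions may differ.

```python
import math
from typing import Optional

# Hypothetical field names; the real Taichung open-data columns may differ.
NOTE_FIELD = "note"            # assumed free-text remark column
AGE_FIELD = "building_age"     # assumed house age in years
AGE_GROUP_FIELD = "age_group"  # new column added by this step


def is_special(row: dict) -> bool:
    """Assumption: a transaction is special when its note field is non-empty."""
    return bool(str(row.get(NOTE_FIELD) or "").strip())


def age_group(age) -> Optional[str]:
    """Bucket a house age into a 10-year label, e.g. 0-9, 10-19, 20-29.

    Each label covers the half-open interval [lower, lower + 10).
    Missing, non-numeric, negative, or non-finite ages return None.
    """
    try:
        years = float(age)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(years) or years < 0:
        return None
    lower = int(years // 10) * 10
    return f"{lower}-{lower + 9}"


def transform(rows: list[dict]) -> list[dict]:
    """Drop special transactions and add the age-group column.

    Input rows are copied, not mutated.
    """
    result = []
    for row in rows:
        if is_special(row):
            continue
        new_row = dict(row)
        new_row[AGE_GROUP_FIELD] = age_group(row.get(AGE_FIELD))
        result.append(new_row)
    return result


if __name__ == "__main__":
    sample = [
        {"building_age": "3.5", "note": ""},
        {"building_age": "27", "note": ""},
        {"building_age": "15", "note": "hypothetical special remark"},
    ]
    for r in transform(sample):
        print(r["building_age"], r["age_group"])
```

The sample keeps the first two rows (labelled `0-9` and `20-29`) and drops the third. The standard library keeps the sketch self-contained; inside an Airflow task the same logic could equally be written with pandas.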
```shell
$ docker compose up -d
```
Airflow connections for the data source and the database must be set up before the pipeline can run. To create them, run `./airflow_conn_init.sh` in any of the Airflow Docker containers.
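Once the database connection is configured, the load step writes the transformed rows into a table. The sketch below is only an illustration. It uses Python's built-in `sqlite3` module as an in-memory stand-in for PostgreSQL so it runs without a database server, and the table name `transactions` and its columns are hypothetical. Against PostgreSQL, the same pattern would go through a driver such as psycopg2 or Airflow's Postgres provider hook, using the connection created by the init script.

```python
import sqlite3

# Hypothetical table layout; the real schema in PostgreSQL may differ.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS transactions (
    building_age REAL,
    age_group    TEXT
)
"""
# sqlite3 uses "?" placeholders; PostgreSQL drivers such as psycopg2 use "%s".
INSERT_SQL = "INSERT INTO transactions (building_age, age_group) VALUES (?, ?)"


def load(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Create the target table if needed and insert the rows in one transaction."""
    params = [(row.get("building_age"), row.get("age_group")) for row in rows]
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(CREATE_SQL)
        conn.executemany(INSERT_SQL, params)
    return len(params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
    rows = [
        {"building_age": 3.5, "age_group": "0-9"},
        {"building_age": 27.0, "age_group": "20-29"},
    ]
    print(load(conn, rows))
    print(conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0])
    conn.close()
```

Inserting the whole batch inside one transaction means a failed load does not leave a partially written batch in the table.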