Airflow Made Easy | Local Setup Using Docker

This is my Apache Airflow local development setup using docker-compose. It also includes some sample DAGs and workflows.

Recent Updates:

03-Dec-2023

  • Upgraded to Airflow 2.7.3
  • Upgraded Superset to add a secret key
  • Added a Superset database connection image
  • Works on M1 Mac

03-May-2022

  • Added a Dockerfile to extend the Airflow image
  • Added an additional PyPI package (td-client)
  • Upgraded to Airflow 2.3.0

29-Jun-2021

  • Updated image to Airflow 2.1.1
  • Leveraging _PIP_ADDITIONAL_REQUIREMENTS to install additional dependencies
  • Developing and testing operators for Treasure Data
  • Read more at Treasure Data

📝 Table of Contents

🧐 About

Set up Apache Airflow 2.0 locally on Windows 10 (WSL2) via Docker Compose. The original docker-compose.yaml file was taken from the official GitHub repo.

This contains service definitions for

  • airflow-scheduler
  • airflow-webserver
  • airflow-worker
  • airflow-init - initializes the database and creates the user
  • flower
  • redis
  • postgres - This is the backend database for Airflow. I am also creating an additional database, userdata, as a backend for my data flows. This is not recommended; ideally Airflow's metadata and your own data live in separate databases.

I have added an additional command that creates an Airflow database connection as part of the docker-compose setup.

Directories I am mounting:

  • ./dags
  • ./logs
  • ./plugins
  • ./sql - for SQL files. We can leverage Jinja templating in our queries (see the sketch after this list). Refer to the sample DAG.
  • ./test - has unit tests for Airflow DAGs.
  • ./pg-init-scripts - has scripts to create the additional database in Postgres.
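
The ./sql mount is what makes Jinja-templated queries convenient. Below is a minimal sketch (not the repo's actual sample DAG) of how a DAG could render a templated SQL file from that folder; the dag_id, the file name daily_load.sql, and the /opt/airflow/sql container path are illustrative assumptions.

```python
# Minimal sketch, assuming ./sql is mounted at /opt/airflow/sql inside the containers.
# The dag_id and daily_load.sql are made-up names for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="templated_sql_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    template_searchpath=["/opt/airflow/sql"],  # folder searched for .sql templates
) as dag:
    # daily_load.sql can contain Jinja, e.g. a WHERE clause filtering on '{{ ds }}'
    load = PostgresOperator(
        task_id="load_daily_data",
        postgres_conn_id="postgres_new",  # the connection added via docker-compose / CLI
        sql="daily_load.sql",
    )
```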

Data Engineering Projects

Here you will find some personal projects that I have worked on. These projects shed light on some of the Airflow features I have used and on learnings related to other technologies.

Data Visualization

This setup also lets you experiment with Apache Superset. Read more here

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Clone this repo to your machine

docker-compose -f docker-compose.yaml up airflow-init
docker-compose -f docker-compose.yaml up

Prerequisites

What things you need to install the software and how to install them.

You should have Docker and Docker Compose v1.27.0 or later installed on your machine.

  • Install and configure WSL2
  • I also had to reset my Ubuntu installation, and that's when it asked me to create a user.

Installing

A step-by-step series of examples that tells you how to get a development environment running.

Clone the Repo

git clone

Start the Docker build

# To extend the Airflow image
docker-compose build

docker-compose -f docker-compose.yaml up airflow-init

docker-compose -f docker-compose.yaml up

Keep checking the Docker processes to make sure all containers are healthy

docker ps

Once all containers are healthy, add a connection to Postgres via the command line and then access the Airflow UI:

docker exec -it airflow-docker_airflow-worker airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'
http://localhost:8080


🔧 Running the tests

Unit tests for Airflow DAGs are defined in the test folder. This folder is also mounted into the Docker containers via the docker-compose.yaml file. Follow the steps below to execute the unit tests once the containers are running:

./airflow.sh bash
python -m unittest discover -v
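
If you are wondering what these tests can look like, here is a minimal sketch of a DAG-integrity check of the kind that could live in ./test; the file, class, and method names are illustrative and not necessarily what this repo uses.

```python
# Minimal sketch of a DAG-integrity test; names are illustrative only.
import unittest

from airflow.models import DagBag


class TestDagIntegrity(unittest.TestCase):
    def setUp(self):
        # Load DAGs from the mounted dags folder, skipping Airflow's bundled examples
        self.dagbag = DagBag(dag_folder="dags", include_examples=False)

    def test_no_import_errors(self):
        # Any syntax error or failed import in a DAG file shows up in import_errors
        self.assertEqual(self.dagbag.import_errors, {})

    def test_every_dag_has_tasks(self):
        for dag_id, dag in self.dagbag.dags.items():
            self.assertGreater(len(dag.tasks), 0, f"{dag_id} has no tasks")


if __name__ == "__main__":
    unittest.main()
```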

Github Workflow for running tests

I had to create another docker-compose file to be able to execute unit tests whenever I push code to master. Please refer to the GitHub workflow in this repository.

Break down into end-to-end tests

Another #TODO

🎈 Usage

Now you can create new DAGs, place them in the dags folder on your local machine, and see them come live in the web UI. Refer to the sample DAG in the repo.
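
As a quick illustration, a DAG as small as the sketch below can be dropped into ./dags and should appear in the UI once the scheduler picks it up; the dag_id and schedule here are arbitrary choices, not something defined by this repo.

```python
# Minimal sketch of a new DAG to drop into ./dags; the name and schedule are arbitrary.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_local_airflow",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from the local Airflow setup'",
    )
```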

Important:

Edit the postgres_default connection from the UI or through the command line if you want to persist data in Postgres as part of the DAGs you create. Even better, you can always add a new connection.

Update: This is now taken care of in the updated Docker Compose file; the connection and the new database are created automatically. If you still want to do it manually:
./airflow.sh bash

airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'

Connect to Postgres and create a new database named 'userdata':

docker exec -it airflowdocker_postgres_1 psql -U airflow -c 'CREATE DATABASE userdata;'


Turn on the DAG: PostgreOperatorTest_Dag

⛏️ Built Using

✍️ Authors

🎉 Acknowledgements

  • Apache Airflow
  • Inspiration from the Airflow community

Cleanup

docker-compose down --volumes --rmi all