Portfolio of projects demonstrating general usage of Airflow, run locally, with various integrations relevant to the current technology market.
- dags
- Usage: Contains the Directed Acyclic Graphs (DAGs) that define the workflows of tasks (a minimal example DAG is sketched after this file list).
- data
- Usage: Directory for storing datasets and other data files used in the projects.
- deprecated_scripts
- Usage: Contains scripts for data processing, automation, and other utilities that were previously used in older DAGs or setups. May contain old material from Alexey Grigorovich's classes.
- spark-images
- Usage: Directory containing the Spark Dockerfile currently used in conjunction with Airflow.
- .env
- Usage: Mock file storing environment variables that would normally not be kept in a file accessible to repository viewers, for security and privacy reasons.
- .gitignore
- Usage: Git ignore rules that keep the repository free of artefacts generated while using the images.
- config_local.py
- Usage: pgAdmin 4 configuration file. Referenced in docker-compose.yaml to customize the instance and recreate objects for quick setup/recovery.
- docker-compose.yaml
- Usage: File used by Docker Compose to spin up the custom containers and define how they interact and network with each other.
- Dockerfile
- Usage: File used by Docker to build an image from the instructions within. Defines a customized Airflow installation with multiple community provider packages added to extend functionality, using Airflow 2.9.0 from Docker Hub as the base image (a sketch of this pattern follows the file list below).
- Makefile
- Usage: Makefile defining custom commands for easier setup/teardown of the containers/images. Handy given some limitations. The full setup it manages includes the following (a hedged compose excerpt for part of it follows this list):
- Full Airflow setup including initializer, scheduler, triggerer, worker and webserver (with the default user)
- Airflow support services such as a Redis instance (message broker) and a PostgreSQL database (metadata store)
- Independent PostgreSQL database for sandboxing purposes
- pgAdmin instance that depends on the sandbox PostgreSQL database mentioned above
- (Optional) Spark image that can be built locally and connected to the Airflow system
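As a quick illustration of what lives in `dags`, here is a minimal sketch of a DAG; the DAG id, schedule and task are hypothetical placeholders rather than anything taken from this repository:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_hello",           # hypothetical id, for illustration only
    schedule="@daily",                # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,                    # do not backfill past runs
)
def example_hello():
    @task
    def say_hello() -> str:
        # Trivial placeholder task; real DAGs here would talk to Spark, Postgres, etc.
        return "hello from Airflow"

    say_hello()


example_hello()
```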
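The Dockerfile follows the usual pattern of extending the official image with extra provider packages. The snippet below is only a hedged sketch of that pattern; the providers listed are assumptions, not the repository's actual set:

```dockerfile
# Base image: official Airflow 2.9.0 from Docker Hub
FROM apache/airflow:2.9.0

# Add community provider packages to extend functionality
# (example providers only; the real Dockerfile defines its own list)
RUN pip install --no-cache-dir \
    apache-airflow-providers-apache-spark \
    apache-airflow-providers-postgres
```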
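For orientation, the sandbox database and pgAdmin services could be declared roughly as below; service names, ports and credentials are placeholders, and the real values live in docker-compose.yaml and .env:

```yaml
services:
  sandbox-postgres:                 # independent Postgres for experiments
    image: postgres:16
    environment:
      POSTGRES_USER: sandbox        # placeholder; real values come from .env
      POSTGRES_PASSWORD: sandbox
      POSTGRES_DB: sandbox
    ports:
      - "5433:5432"                 # avoid clashing with the Airflow metadata DB

  pgadmin:                          # pgAdmin pointed at the sandbox database
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@example.com
      PGADMIN_DEFAULT_PASSWORD: admin
    ports:
      - "8081:80"
    depends_on:
      - sandbox-postgres
```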
Prerequisites: Docker, Make (optional)
Getting the Docker image:
- Build it from the Dockerfile by downloading both the Dockerfile and docker-compose.yaml locally and running `docker compose build` or `make build-nc`
- Pull it from the image provided here
Running the Docker Compose file:
- `docker compose up` is always a good option, but for this particular setup `make compose-up` is also a good alternative (a sketch of these Makefile targets follows)
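The exact recipes live in the Makefile; a minimal version of the targets mentioned above could look like this (the recipes are an educated guess at their intent, not the repository's actual contents; recipe lines must be tab-indented):

```makefile
# Build the custom Airflow image without using the cache
build-nc:
	docker compose build --no-cache

# Bring the whole stack up in detached mode
compose-up:
	docker compose up -d

# Tear everything down, including named volumes
compose-down:
	docker compose down --volumes
```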
- Due to the dynamic nature of my setup, I sometimes need to jump between developing locally and developing on a more portable Raspberry Pi 4 (personal laptop repairs are quite pricey, it seems). The software used in these images therefore targets the ARM64 architecture, and the repository becomes somewhat chaotic the further down the tree one goes (see the platform override sketch at the end of these notes).
- The repository also contains a custom Spark Docker image for easier deployment of applications.
- Technically, this repository is part of a larger folder structure that uses the software here as one part of a bigger machine. The wider ecosystem contains an Nginx server for reverse proxying and networking study, a Synapse server for streaming/communication study, Kafka for exploring Apache projects, and various SQL/NoSQL databases for study.
- For simplicity, I generally reserve the address xxx.168.0.38 for the device running this stack.
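If the setup needs to run on a different CPU architecture, the target platform can be overridden in the compose file; a hedged excerpt (the service name is a placeholder):

```yaml
services:
  airflow-webserver:
    build: .
    platform: linux/arm64   # switch to linux/amd64 on a typical x86 machine
```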