Commit

 into fix/remove-remaining-airflowdb-references
dervoeti committed Sep 29, 2023
2 parents e87c8d7 + 7a3fb47 commit 37bf87d
Showing 3 changed files with 27 additions and 14 deletions.
14 changes: 9 additions & 5 deletions docs/modules/airflow/pages/getting_started/installation.adoc
@@ -25,14 +25,14 @@ WARNING: Do not use this setup in production! Supported databases and versions a

There are 2 ways to run Stackable Operators

-1. Using xref:stackablectl::index.adoc[]
+1. Using xref:management:stackablectl:index.adoc[]

2. Using Helm

=== stackablectl

stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
-Follow the xref:stackablectl::installation.adoc[installation steps] for your platform.
+Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

After you have installed stackablectl run the following command to install all Operators necessary for Airflow:

@@ -49,7 +49,8 @@ The tool will show
[INFO ] Installing airflow operator
----

-TIP: Consult the xref:stackablectl::quickstart.adoc[] to learn more about how to use stackablectl. For example, you can use the `-k` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
+TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use stackablectl. For
+example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].

=== Helm

@@ -65,8 +66,11 @@ Then install the Stackable Operators:
include::example$getting_started/code/getting_started.sh[tag=helm-install-operators]
----

-Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.
+Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the
+CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.

== What's next

-xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example DAG.
+xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and
+xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example
+DAG.
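Apart from the xref changes, most of the added lines in this file re-wrap unchanged prose. Judging only from the new line lengths, the target seems to be a limit of 120 characters per line; that number is an inference, not a documented rule. A minimal sketch of such a re-wrap with Python's standard `textwrap` module, assuming that 120-column limit (`rewrap_paragraph` is a hypothetical helper):

```python
import textwrap

# Assumed column limit, inferred from the re-wrapped lines in this commit.
MAX_WIDTH = 120


def rewrap_paragraph(paragraph: str, width: int = MAX_WIDTH) -> list[str]:
    """Greedily re-wrap one prose paragraph to lines of at most `width` characters.

    Long tokens such as xref macros are never split, and hyphenated words
    like "open-source" stay whole, so the AsciiDoc markup survives intact.
    """
    return textwrap.wrap(
        paragraph,
        width=width,
        break_long_words=False,  # never cut an xref:...[] macro in half
        break_on_hyphens=False,  # keep "open-source" on one line
    )


if __name__ == "__main__":
    text = (
        "Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs "
        "for the Airflow cluster (as well as the CRDs for the required operators). "
        "You are now ready to deploy Apache Airflow in Kubernetes."
    )
    for line in rewrap_paragraph(text):
        print(line)
```

Turning off `break_long_words` and `break_on_hyphens` keeps xref macros and words such as `open-source` intact; with those settings the sketch reproduces the two-line Helm paragraph in the hunk above.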
23 changes: 18 additions & 5 deletions docs/modules/airflow/pages/index.adoc
@@ -2,12 +2,18 @@
:description: The Stackable Operator for Apache Airflow is a Kubernetes operator that can manage Apache Airflow clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Airflow versions.
:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, engineer, big data, metadata, job pipeline, scheduler, workflow, ETL

+:k8s-crs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
+
The Stackable Operator for Apache Airflow manages https://airflow.apache.org/[Apache Airflow] instances on Kubernetes.
-Apache Airflow is an open-source application for creating, scheduling, and monitoring workflows. Workflows are defined as code, with tasks that can be run on a variety of platforms, including Hadoop, Spark, and Kubernetes itself. Airflow is a popular choice to orchestrate ETL workflows and data pipelines.
+Apache Airflow is an open-source application for creating, scheduling, and monitoring workflows. Workflows are defined
+as code, with tasks that can be run on a variety of platforms, including Hadoop, Spark, and Kubernetes itself. Airflow
+is a popular choice to orchestrate ETL workflows and data pipelines.

== Getting started

-Get started using Airflow with the Stackable Operator by following the xref:getting_started/index.adoc[] guide. It guides you through installing the Operator alongside a PostgreSQL database and Redis instance, connecting to your Airflow instance and running your first workflow.
+Get started using Airflow with the Stackable Operator by following the xref:getting_started/index.adoc[] guide. It
+guides you through installing the Operator alongside a PostgreSQL database and Redis instance, connecting to your
+Airflow instance and running your first workflow.

=== Custom resources

@@ -58,7 +64,11 @@ image::airflow_overview.drawio.svg[A diagram depicting the Kubernetes resources

The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other.

-For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a StatefulSet with the amount of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the main container running Airflow and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The Operator creates a Service per role group as well as a single service for the whole `webserver` role called `<clustername>-webserver`.
+For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a
+StatefulSet with the amount of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the
+main container running Airflow and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The
+Operator creates a Service per role group as well as a single service for the whole `webserver` role called
+`<clustername>-webserver`.

Additionally, a ConfigMap is created for each RoleGroup. These ConfigMaps contain two files, `log_config.py` and `webserver_config.py`, which contain logging and general Airflow configuration respectively.

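The overview hunk above lists the Kubernetes objects the operator manages for each cluster. As a rough illustration, the sketch below enumerates them: a StatefulSet, a Service and a ConfigMap per role group, the Pods each StatefulSet creates, and one extra `<clustername>-webserver` Service. `expected_objects` is a hypothetical helper, and the `<cluster>-<role>-<rolegroup>` naming for per-role-group objects is an assumption; it is consistent with the Pod name `airflow-scheduler-custom-log-config-0` that appears later in this commit, but the docs shown here do not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class K8sObject:
    kind: str
    name: str


def expected_objects(cluster: str, roles: dict[str, dict[str, int]]) -> list[K8sObject]:
    """Enumerate the objects described on the overview page for one Airflow cluster.

    `roles` maps a role name (e.g. "webserver") to {role group name: replicas}.
    Hypothetical helper: the <cluster>-<role>-<rolegroup> naming is an assumption.
    """
    objects: list[K8sObject] = []
    for role, groups in roles.items():
        for group, replicas in groups.items():
            base = f"{cluster}-{role}-{group}"  # assumed naming scheme
            objects.append(K8sObject("StatefulSet", base))
            objects.append(K8sObject("Service", base))
            # Holds log_config.py and webserver_config.py for this role group.
            objects.append(K8sObject("ConfigMap", base))
            # Created by the StatefulSet controller and named <statefulset-name>-<ordinal>;
            # each Pod runs the Airflow container plus a metrics sidecar.
            objects.extend(K8sObject("Pod", f"{base}-{i}") for i in range(replicas))
    if "webserver" in roles:
        # One additional Service for the whole webserver role.
        objects.append(K8sObject("Service", f"{cluster}-webserver"))
    return objects


if __name__ == "__main__":
    for obj in expected_objects(
        "airflow",
        {"webserver": {"default": 1}, "scheduler": {"custom-log-config": 1}},
    ):
        print(f"{obj.kind:<12} {obj.name}")
```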
@@ -70,11 +80,14 @@ NOTE: Redis is only needed if the executors have been set to `spec.celeryExecuto

== Using custom workflows/DAGs

-https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Direct acyclic graphs (DAGs) of tasks] are the core entities you will use in Airflow. Have a look at the page on xref:usage-guide/mounting-dags.adoc[] to learn about the different ways of loading your custom DAGs into Airflow.
+https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Direct acyclic graphs (DAGs) of tasks] are
+the core entities you will use in Airflow. Have a look at the page on xref:usage-guide/mounting-dags.adoc[] to learn
+about the different ways of loading your custom DAGs into Airflow.

== Demo

-You can install the xref:stackablectl::demos/airflow-scheduled-job.adoc[] demo and explore an Airflow installation, as well as how it interacts with xref:spark-k8s:index.adoc[Apache Spark].
+You can install the xref:demos:airflow-scheduled-job.adoc[] demo and explore an Airflow installation, as
+well as how it interacts with xref:spark-k8s:index.adoc[Apache Spark].

== Supported Versions

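Every xref change in this commit follows one of two patterns: `xref:stackablectl::demos/<page>` becomes `xref:demos:<page>`, and any other `xref:stackablectl::<page>` becomes `xref:management:stackablectl:<page>`. A minimal sketch of that rewrite as a regular-expression substitution in Python; the mapping is derived only from the lines in this diff, and `migrate_xrefs` is a hypothetical helper, not a Stackable tool.

```python
import re

# Old-style target: "xref:stackablectl::" followed by the page path,
# which ends at the "[" that opens the link text.
OLD_XREF = re.compile(r"xref:stackablectl::([^\[\s]+)")


def _new_target(match: re.Match) -> str:
    page = match.group(1)
    if page.startswith("demos/"):
        # xref:stackablectl::demos/<page> -> xref:demos:<page>
        return "xref:demos:" + page[len("demos/"):]
    # xref:stackablectl::<page> -> xref:management:stackablectl:<page>
    return "xref:management:stackablectl:" + page


def migrate_xrefs(text: str) -> str:
    """Rewrite every old xref:stackablectl:: target in `text` (hypothetical helper)."""
    return OLD_XREF.sub(_new_target, text)


if __name__ == "__main__":
    print(migrate_xrefs("Follow the xref:stackablectl::installation.adoc[installation steps] for your platform."))
    print(migrate_xrefs("You can install the xref:stackablectl::demos/airflow-scheduled-job.adoc[] demo"))
```

Targets that do not start with `xref:stackablectl::`, such as `xref:spark-k8s:index.adoc[Apache Spark]`, are left untouched.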
@@ -92,10 +92,6 @@ customConfig:
condition: >-
.pod == "airflow-scheduler-custom-log-config-0" &&
.container == "vector"
-filteredAutomaticLogConfigInitDb:
-type: filter
-inputs: [vector]
-condition: .container == "airflow-init-db"
filteredInvalidEvents:
type: filter
inputs: [vector]
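The hunk above deletes the `filteredAutomaticLogConfigInitDb` filter, whose condition selected events from the `airflow-init-db` container. A Vector `filter` transform forwards only the events for which its `condition` evaluates to true. The sketch below mimics that behaviour in Python on plain dicts, purely for illustration: the event shape (`pod` and `container` keys) and the helper names `scheduler_vector_logs`, `init_db_logs` and `apply_filter` are assumptions, and Vector itself evaluates these conditions as VRL, not Python.

```python
from typing import Callable, Iterable

# Simplified event: only the two fields the conditions look at (assumed shape).
Event = dict[str, str]
Condition = Callable[[Event], bool]


def scheduler_vector_logs(event: Event) -> bool:
    # Mirrors: .pod == "airflow-scheduler-custom-log-config-0" && .container == "vector"
    return (
        event.get("pod") == "airflow-scheduler-custom-log-config-0"
        and event.get("container") == "vector"
    )


def init_db_logs(event: Event) -> bool:
    # Mirrors the deleted filter: .container == "airflow-init-db"
    return event.get("container") == "airflow-init-db"


def apply_filter(events: Iterable[Event], condition: Condition) -> list[Event]:
    """Keep only the events for which `condition` is true, like a Vector filter transform."""
    return [event for event in events if condition(event)]


if __name__ == "__main__":
    events = [
        {"pod": "airflow-scheduler-custom-log-config-0", "container": "vector"},
        {"pod": "airflow-scheduler-custom-log-config-0", "container": "airflow"},
    ]
    print(apply_filter(events, scheduler_vector_logs))
    # None of these sample events come from an airflow-init-db container,
    # so this prints an empty list.
    print(apply_filter(events, init_db_logs))
```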