docs: Update references #327

Merged · 3 commits · Sep 21, 2023
14 changes: 9 additions & 5 deletions docs/modules/airflow/pages/getting_started/installation.adoc
@@ -25,14 +25,14 @@ WARNING: Do not use this setup in production! Supported databases and versions a

There are two ways to run Stackable Operators:

-1. Using xref:stackablectl::index.adoc[]
+1. Using xref:management:stackablectl:index.adoc[]

2. Using Helm

=== stackablectl

stackablectl is the command-line tool to interact with Stackable operators and our recommended way to install Operators.
-Follow the xref:stackablectl::installation.adoc[installation steps] for your platform.
+Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

After you have installed stackablectl, run the following command to install all Operators necessary for Airflow:
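
The command itself is pulled in from an example script that this diff does not expand. A minimal sketch of what it
might look like, where the exact operator list is an assumption rather than content of this PR:

[source,bash]
----
# Sketch only: install the operators needed for Airflow with stackablectl.
# The operator names below are assumptions, not taken from this diff.
stackablectl operator install commons secret airflow
----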

@@ -49,7 +49,8 @@ The tool will show
[INFO ] Installing airflow operator
----

-TIP: Consult the xref:stackablectl::quickstart.adoc[] to learn more about how to use stackablectl. For example, you can use the `-k` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
+TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use stackablectl. For
+example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
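
As a sketch of how that flag combines with the install command, assuming a stackablectl version that accepts
`--cluster kind` as the tip above suggests:

[source,bash]
----
# Sketch only: create a local kind cluster and install the operators in one go.
# Assumes kind and a container runtime are available on the machine.
stackablectl operator install commons secret airflow --cluster kind
----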

=== Helm

@@ -65,8 +66,11 @@ Then install the Stackable Operators:
include::example$getting_started/code/getting_started.sh[tag=helm-install-operators]
----
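
The included script is likewise not expanded in this diff. A sketch of the usual Helm flow, where the repository URL
and chart names are assumptions rather than content of this PR:

[source,bash]
----
# Sketch only: add the Stackable chart repository and install the operators.
# Repository URL and chart names are assumptions, not taken from this diff.
helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
helm repo update
helm install commons-operator stackable-stable/commons-operator
helm install secret-operator stackable-stable/secret-operator
helm install airflow-operator stackable-stable/airflow-operator
----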

-Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.
+Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the
+CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.
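
A quick, hedged way to verify that outcome with standard kubectl commands:

[source,bash]
----
# Sketch only: check that the operator Deployments are up and the CRDs exist.
kubectl get deployments
kubectl get crds | grep -i airflow
----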

== What's next

-xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example DAG.
+xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and
+xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example
+DAG.
29 changes: 22 additions & 7 deletions docs/modules/airflow/pages/index.adoc
@@ -2,16 +2,23 @@
:description: The Stackable Operator for Apache Airflow is a Kubernetes operator that can manage Apache Airflow clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Airflow versions.
:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, engineer, big data, metadata, job pipeline, scheduler, workflow, ETL

+:k8s-crs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/

The Stackable Operator for Apache Airflow manages https://airflow.apache.org/[Apache Airflow] instances on Kubernetes.
-Apache Airflow is an open-source application for creating, scheduling, and monitoring workflows. Workflows are defined as code, with tasks that can be run on a variety of platforms, including Hadoop, Spark, and Kubernetes itself. Airflow is a popular choice to orchestrate ETL workflows and data pipelines.
+Apache Airflow is an open-source application for creating, scheduling, and monitoring workflows. Workflows are defined
+as code, with tasks that can be run on a variety of platforms, including Hadoop, Spark, and Kubernetes itself. Airflow
+is a popular choice to orchestrate ETL workflows and data pipelines.

== Getting started

-Get started using Airflow with the Stackable Operator by following the xref:getting_started/index.adoc[] guide. It guides you through installing the Operator alongside a PostgreSQL database and Redis instance, connecting to your Airflow instance and running your first workflow.
+Get started using Airflow with the Stackable Operator by following the xref:getting_started/index.adoc[] guide. It
+guides you through installing the Operator alongside a PostgreSQL database and Redis instance, connecting to your
+Airflow instance and running your first workflow.

== Resources

-The Operator manages three https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/[custom resources]: The _AirflowCluster_ and _AirflowDB_. It creates a number of different Kubernetes resources based on the custom resources.
+The Operator manages two {k8s-crs}[custom resources]: the _AirflowCluster_ and the _AirflowDB_. It creates a number of
+different Kubernetes resources based on the custom resources.
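
For illustration, a minimal AirflowCluster manifest might look roughly as follows; every field name and value in this
sketch is an assumption, not taken from this PR:

[source,bash]
----
# Sketch only: apply a minimal AirflowCluster custom resource.
# All field names and values below are assumptions for illustration.
kubectl apply -f - <<EOF
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: 2.6.1  # assumption: any supported Airflow version
  webservers:
    roleGroups:
      default:
        replicas: 1
EOF
----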

=== Custom resources

@@ -62,9 +69,14 @@ Based on the custom resources you define, the Operator creates ConfigMaps, State

image::airflow_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

-The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The Job created for the AirflowDB is not shown.
+The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The
+Job created for the AirflowDB is not shown.

-For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a StatefulSet with the amount of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the main container running Airflow and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The Operator creates a Service per role group as well as a single service for the whole `webserver` role called `<clustername>-webserver`.
+For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a
+StatefulSet with the number of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the
+main container running Airflow and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The
+Operator creates a Service per role group as well as a single Service for the whole `webserver` role called
+`<clustername>-webserver`.
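
A hedged way to see these per-role-group objects for a cluster named `airflow`; the label and Service name below
follow the naming scheme described above and are assumptions:

[source,bash]
----
# Sketch only: list the StatefulSets and Services created for the cluster.
kubectl get statefulsets,services -l app.kubernetes.io/instance=airflow
# Sketch only: the single Service for the whole webserver role.
kubectl get service airflow-webserver
----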

ConfigMaps are created, one per RoleGroup and also one for the AirflowDB. Both ConfigMaps contain two files: `log_config.py` and `webserver_config.py`, which contain logging and general Airflow configuration respectively.
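
To peek into those files, something along these lines should work, where the ConfigMap name is an assumption based on
the role group naming above:

[source,bash]
----
# Sketch only: print the webserver configuration from a role group ConfigMap.
# The ConfigMap name is an assumption, not taken from this diff.
kubectl get configmap airflow-webserver-default \
  -o jsonpath='{.data.webserver_config\.py}'
----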

@@ -76,11 +88,14 @@ NOTE: Redis is only needed if the executors have been set to `spec.celeryExecuto

== Using custom workflows/DAGs

-https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Direct acyclic graphs (DAGs) of tasks] are the core entities you will use in Airflow. Have a look at the page on xref:usage-guide/mounting-dags.adoc[] to learn about the different ways of loading your custom DAGs into Airflow.
+https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html[Directed acyclic graphs (DAGs) of tasks]
+are the core entities you will use in Airflow. Have a look at the page on xref:usage-guide/mounting-dags.adoc[] to learn
+about the different ways of loading your custom DAGs into Airflow.

== Demo

-You can install the xref:stackablectl::demos/airflow-scheduled-job.adoc[] demo and explore an Airflow installation, as well as how it interacts with xref:spark-k8s:index.adoc[Apache Spark].
+You can install the xref:demos:airflow-scheduled-job.adoc[] demo and explore an Airflow installation, as
+well as how it interacts with xref:spark-k8s:index.adoc[Apache Spark].
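
Installing it should look roughly like this, assuming the demo identifier matches the page name in the xref:

[source,bash]
----
# Sketch only: install the Airflow demo with stackablectl.
# The identifier `airflow-scheduled-job` is taken from the xref above.
stackablectl demo install airflow-scheduled-job
----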

== Supported Versions
