Home
Kamil Breguła edited this page May 8, 2019
This document outlines Google’s and Polidea’s design decisions on the implementation of the Oozie-to-Airflow converter.
In the first three phases of the project (all that are foreseen so far), we assume the following conversion workflow for users:
- Users have existing standalone Oozie workflows and want to run similar workflows on Google Cloud Composer, and only on its latest released beta version.
- Currently supported versions: Composer 1.5.0 and Airflow 1.10.1. We do not support standalone Airflow installations. We can provide generic guidelines on the supported versions and on the authentication configuration required to make standalone Airflow work with the generated DAGs, but without any guarantee.
- Google Cloud Composer has a Google connection ID configured so that it can communicate with Google Dataproc components (Spark, Hive, Pig, HDFS, etc.).
- The Composer service account is configured to access Dataproc components, and the gcloud command is available on the path when a BashOperator is executed. This matters initially because it simplifies authentication if we need to fall back to gcloud commands.
- The default connection ID for Composer is set up with the same (or an equivalent) service account as Composer itself, so that Dataproc operators, where they are used, can authenticate through this connection ID.
- Users can configure the names of the Dataproc cluster(s) to use as parameters of their migration, that is, a mapping from the configuration URLs/names used in Oozie to the corresponding Dataproc clusters.
- The migration is a “one-off” operation that might require manual corrections of the generated DAG. There is no expectation that the original Oozie workflow and the generated Airflow DAG will be maintained in parallel. If the original workflow is updated and the migration is performed again, some manual modifications will likely have to be applied again. In future versions, such parallel maintenance and full automation of the migration might be implemented.
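The cluster-mapping and gcloud fall-back assumptions above can be illustrated with a minimal Python sketch. It is not the converter's actual code: the names `CLUSTER_MAPPING`, `resolve_cluster`, and `gcloud_pig_command`, the example endpoints, and the cluster name are all hypothetical and chosen only for illustration.

```python
import shlex
from typing import Dict

# Hypothetical user-supplied mapping from Oozie endpoints (name node,
# job tracker) to Dataproc cluster names. All values are placeholders.
CLUSTER_MAPPING: Dict[str, str] = {
    "hdfs://namenode.example.com:8020": "example-dataproc-cluster",
    "jobtracker.example.com:8032": "example-dataproc-cluster",
}


def resolve_cluster(oozie_endpoint: str, mapping: Dict[str, str]) -> str:
    """Return the Dataproc cluster configured for an Oozie endpoint."""
    try:
        return mapping[oozie_endpoint]
    except KeyError:
        # An unmapped endpoint is a case the user has to fix manually.
        raise ValueError(
            f"No Dataproc cluster configured for {oozie_endpoint!r}"
        ) from None


def gcloud_pig_command(cluster: str, region: str, script_uri: str) -> str:
    """Build a shell-safe gcloud command that submits a Pig script."""
    args = [
        "gcloud", "dataproc", "jobs", "submit", "pig",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--file={script_uri}",
    ]
    # Quote each argument so the string is safe to run in a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

Under these assumptions, a generated DAG could pass the resulting string as the `bash_command` of a BashOperator, relying on the Composer service account for authentication rather than on any credentials embedded in the command.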