This module handles opinionated Dataflow job configuration and deployments.
The resources/services/activations/deletions that this module will create/trigger are:
- Create a GCS bucket for temporary job data
- Create a Dataflow job
This module is meant for use with Terraform 0.13. If you haven't upgraded and need a Terraform 0.12.x-compatible version of this module, the last released version intended for Terraform 0.12.x is v1.0.0.
Before using this module, one should get familiar with the google_dataflow_job's Notes on "destroy"/"apply", as the behavior is atypical when compared to other resources.
The module is made to be used with the template_gcs_path as the staging location. Hence, one assumption is that, before using this module, you already have working Dataflow job template(s) in GCS staging location(s).
There are examples included in the examples folder but simple usage is as follows:
module "dataflow-job" {
  source  = "terraform-google-modules/dataflow/google"
  version = "0.1.0"

  project_id = "<project_id>"
  name       = "<job_name>"
  on_delete  = "cancel"
  zone       = "us-central1-a"
  max_workers = 1

  template_gcs_path = "gs://<path-to-template>"
  temp_gcs_location = "gs://<gcs_path_temp_data_bucket>"

  # Parameters are passed to the template; the declared type is map(string).
  parameters = {
    bar = "example string"
    foo = 123
  }
}
Then perform the following commands on the root folder:
- terraform init to get the plugins
- terraform plan to see the infrastructure plan
- terraform apply to apply the infrastructure build
- terraform destroy to destroy the built infrastructure
Name | Description | Type | Default | Required |
---|---|---|---|---|
ip_configuration | The configuration for VM IPs. Options are 'WORKER_IP_PUBLIC' or 'WORKER_IP_PRIVATE'. | string | null | no |
machine_type | The machine type to use for the job. | string | "" | no |
max_workers | The number of workers permitted to work on the job. More workers may improve processing speed at additional cost. | number | 1 | no |
name | The name of the dataflow job | string | n/a | yes |
network_self_link | The network self link to which VMs will be assigned. | string | "default" | no |
on_delete | One of drain or cancel. Specifies behavior of deletion during terraform destroy. The default is cancel. | string | "cancel" | no |
parameters | Key/Value pairs to be passed to the Dataflow job (as used in the template). | map(string) | {} | no |
project_id | The project in which the resource belongs. If it is not provided, the provider project is used. | string | n/a | yes |
region | The region in which the created job should run. Also determines the location of the staging bucket if created. | string | "us-central1" | no |
service_account_email | The Service Account email that will be used to identify the VMs in which the jobs are running | string | "" | no |
subnetwork_self_link | The subnetwork self link to which VMs will be assigned. | string | "" | no |
temp_gcs_location | A writeable location on GCS for the Dataflow job to dump its temporary data. | string | n/a | yes |
template_gcs_path | The GCS path to the Dataflow job template. | string | n/a | yes |
zone | The zone in which the created job should run. | string | "us-central1-a" | no |
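
As an illustration of how the optional inputs above combine, the following sketch configures a job whose workers use private IPs on a specific subnetwork and run as a dedicated service account. It is a minimal example under stated assumptions, not a tested configuration: the module label, the version placeholder, the machine type, and every angle-bracket value are hypothetical and must be replaced with values from your own environment.

```hcl
# Sketch only: the module label and all <...> values are placeholders.
module "dataflow-job-private" {
  source  = "terraform-google-modules/dataflow/google"
  version = "<module_version>" # placeholder; pin to the release you actually use

  # Required inputs
  project_id        = "<project_id>"
  name              = "<job_name>"
  template_gcs_path = "gs://<path-to-template>"
  temp_gcs_location = "gs://<gcs_path_temp_data_bucket>"

  # Optional placement inputs (the values shown are the documented defaults)
  region = "us-central1"
  zone   = "us-central1-a"

  # Optional worker settings
  machine_type = "n1-standard-1" # example machine type, chosen for illustration
  max_workers  = 2

  # Optional networking and identity inputs
  ip_configuration      = "WORKER_IP_PRIVATE"
  subnetwork_self_link  = "<subnetwork_self_link>"
  service_account_email = "<worker_service_account_email>"

  # Drain instead of cancel when the job is removed by terraform destroy
  on_delete = "drain"
}
```

Any input left out of the block falls back to the default listed in the table.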
Name | Description |
---|---|
id | The unique Id of the newly created Dataflow job |
name | The name of the dataflow job |
state | The state of the newly created Dataflow job |
temp_gcs_location | The GCS path for the Dataflow job's temporary data. |
template_gcs_path | The GCS path to the Dataflow job template. |
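
The outputs above can be re-exported from a root configuration or passed to other resources. The sketch below assumes the module block is labeled "dataflow-job", as in the usage example; the output names dataflow_job_id and dataflow_job_state are hypothetical.

```hcl
# Assumes a module block labeled "dataflow-job" exists in the same configuration.
output "dataflow_job_id" {
  description = "Unique ID of the Dataflow job created by the module."
  value       = module.dataflow-job.id
}

output "dataflow_job_state" {
  description = "State of the Dataflow job as reported by the module."
  value       = module.dataflow-job.state
}
```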
Before this module can be used on a project, you must ensure that the following pre-requisites are fulfilled:
- Terraform is installed on the machine where Terraform is executed.
- The Service Account you execute the module with has the right permissions.
- The necessary APIs are active on the project.
- A working Dataflow template is uploaded to a GCS bucket.
The project factory can be used to provision projects with the correct APIs active.
- Terraform >= 0.13.0
- terraform-provider-google plugin v2.18.0
In order to execute this module you must have a Service Account with the following project roles:
- roles/dataflow.admin
- roles/iam.serviceAccountUser
- roles/storage.admin
If you want to use the service_account_email input to specify a service account that will identify the VMs in which the jobs are running, the service account will need the following project roles:
- roles/dataflow.worker
- roles/storage.objectAdmin
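One possible way to grant the two worker roles listed above is with the Google provider's google_project_iam_member resource. This is a sketch rather than part of the module: the resource label and the angle-bracket placeholders are assumptions, and your organization may manage IAM differently.

```hcl
# Sketch only: grants the worker roles to an existing service account.
# The resource label and the <...> values are placeholders.
resource "google_project_iam_member" "dataflow_worker_roles" {
  for_each = toset([
    "roles/dataflow.worker",
    "roles/storage.objectAdmin",
  ])

  project = "<project_id>"
  role    = each.value
  member  = "serviceAccount:<worker_service_account_email>"
}
```

google_project_iam_member is non-authoritative: it adds one member to a role and leaves the role's other members untouched.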
In order to launch a Dataflow Job, the following APIs must be enabled on the project:
- Dataflow API: dataflow.googleapis.com
- Compute Engine API: compute.googleapis.com
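If the project is not provisioned through the project factory, the APIs can also be enabled from Terraform with the Google provider's google_project_service resource. The following is a minimal sketch; the resource label and the project placeholder are assumptions.

```hcl
# Sketch only: enables the APIs the Dataflow job depends on.
resource "google_project_service" "dataflow_apis" {
  for_each = toset([
    "dataflow.googleapis.com",
    "compute.googleapis.com",
  ])

  project = "<project_id>"
  service = each.value

  # Leave the APIs enabled when this resource is destroyed, so other
  # workloads in the project are not affected.
  disable_on_destroy = false
}
```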
Be sure you have the correct Terraform version (>= 0.13.0, as listed above).
Testing and documentation generation additionally require the following tools:
- bundler
- gcloud
- terraform-docs 0.6.0
To regenerate the documentation, run make generate_docs
Integration tests are run through test-kitchen, kitchen-terraform, and InSpec.
test-kitchen instances are defined in .kitchen.yml. The test-kitchen instances in test/fixtures/ wrap identically-named examples in the examples/ directory.
1. Configure the test fixtures.
2. Download a Service Account key with the necessary permissions and put it in the module's root directory with the name credentials.json.
3. Build the Docker container for testing: make docker_build_kitchen_terraform
4. Run the testing container in interactive mode: make docker_run. The module root directory will be loaded into the Docker container at /cft/workdir/.
5. Run kitchen-terraform to test the infrastructure:
   - kitchen create creates Terraform state and downloads modules, if applicable.
   - kitchen converge creates the underlying resources. Run kitchen converge <INSTANCE_NAME> to create resources for a specific test case.
   - kitchen verify tests the created infrastructure. Run kitchen verify <INSTANCE_NAME> to run a specific test case.
   - kitchen destroy tears down the underlying resources created by kitchen converge. Run kitchen destroy <INSTANCE_NAME> to tear down resources for a specific test case.
Alternatively, you can simply run make test_integration_docker to run all the test steps non-interactively.
Each test-kitchen instance is configured with a variables.tfvars file in the test fixture directory. For convenience, since all of the variables are project-specific, these files have been symlinked to test/fixtures/shared/terraform.tfvars.
Similarly, each test fixture has a variables.tf to define these variables, and an outputs.tf to provide the information InSpec needs to locate and query the created resources.
Each test-kitchen instance creates necessary fixtures to house resources.
The makefile in this project will lint or sometimes just format any shell, Python, golang, Terraform, or Dockerfiles. The linters will only be run if the makefile finds files with the appropriate file extension.
All of the linter checks are in the default make target, so you just have to run:
make -s
The -s is for 'silent'. Successful output looks like this:
Running shellcheck
Running flake8
Running go fmt and go vet
Running terraform validate
Running hadolint on Dockerfiles
Checking for required files
Testing the validity of the header check
..
----------------------------------------------------------------------
Ran 2 tests in 0.026s
OK
Checking file headers
The following lines have trailing whitespace
The linters are as follows:
- Shell - shellcheck. Can be found in homebrew
- Python - flake8. Can be installed with 'pip install flake8'
- Golang - gofmt. gofmt comes with the standard golang installation. golang is a compiled language so there is no standard linter.
- Terraform - terraform has a built-in linter in the 'terraform validate' command.
- Dockerfiles - hadolint. Can be found in homebrew