Secured Data Warehouse Blueprint

This repository contains Terraform configuration modules that allow Google Cloud customers to quickly deploy a secured BigQuery data warehouse, following the "Secure a BigQuery data warehouse that stores confidential data" guide. The blueprint allows customers to use Google Cloud's core strengths in data analytics, and to overcome typical challenges that include:

- Limited knowledge/experience with best practices for creating, deploying, and operating in Google Cloud.
- Security/risk concerns and restrictions from their internal security, risk, and compliance teams.
- Regulatory and compliance approval from external auditors.

The Terraform configurations in this repository provide customers with an opinionated architecture that incorporates and documents best practices for a performant and scalable design, combined with security by default for control, logging, and evidence generation. Customers can deploy it through a Terraform workflow.

Disclaimer

When using this blueprint, it is important to understand how you manage separation of duties. We recommend you remove all primitive owner roles in the projects used as inputs for the Data Warehouse module. The secured data warehouse itself does not need any primitive owner roles for correct operations.

When you use this blueprint in example mode, or use it to create new projects with the default configuration for the deployment, we automatically remove the owner role because it grants overly broad access.

However, if you choose to use this blueprint with pre-existing projects in your organization, we will not proactively remove any pre-existing owner role assignments, as we won’t know your intent for or dependency on these role assignments in your pre-existing workloads. The pre-existing presence of these roles does expand the attack and risk surface of the resulting deployment. Therefore, we highly recommend you review your use of owner roles in these pre-existing cases and see if you can eliminate them to improve your resulting security posture. Only you can determine the appropriate trade-off to meet your business requirements.

You can check the current situation of your project with either of the following methods:

- Use Security Health Analytics (SHA) and check the KMS vulnerability findings for the detector KMS_PROJECT_HAS_OWNER.
  - You can search for SHA findings with the category KMS_PROJECT_HAS_OWNER in Security Command Center in the Google Cloud Console.
- Use the Cloud Asset Inventory search-all-iam-policies gcloud command with a query by role to search for owners of the project.

See the terraform-example-foundation for additional good practices.

Usage

Basic usage of this module is as follows:

module "secured_data_warehouse" {
  source  = "terraform-google-modules/secured-data-warehouse/google"
  version = "~> 0.1"

  # The uppercase identifiers below are placeholders; replace them with your own values.
  org_id                           = ORG_ID
  data_governance_project_id       = DATA_GOVERNANCE_PROJECT_ID
  confidential_data_project_id     = CONFIDENTIAL_DATA_PROJECT_ID
  non_confidential_data_project_id = NON_CONFIDENTIAL_DATA_PROJECT_ID
  data_ingestion_project_id        = DATA_INGESTION_PROJECT_ID
  sdx_project_number               = EXTERNAL_TEMPLATE_PROJECT_NUMBER
  terraform_service_account        = TERRAFORM_SERVICE_ACCOUNT
  access_context_manager_policy_id = ACCESS_CONTEXT_MANAGER_POLICY_ID
  bucket_name                      = DATA_INGESTION_BUCKET_NAME
  pubsub_resource_location         = PUBSUB_RESOURCE_LOCATION
  location                         = LOCATION
  trusted_locations                = TRUSTED_LOCATIONS
  dataset_id                       = DATASET_ID
  confidential_dataset_id          = CONFIDENTIAL_DATASET_ID
  cmek_keyring_name                = CMEK_KEYRING_NAME
  perimeter_additional_members     = PERIMETER_ADDITIONAL_MEMBERS
  data_engineer_group              = DATA_ENGINEER_GROUP
  data_analyst_group               = DATA_ANALYST_GROUP
  security_analyst_group           = SECURITY_ANALYST_GROUP
  network_administrator_group      = NETWORK_ADMINISTRATOR_GROUP
  security_administrator_group     = SECURITY_ADMINISTRATOR_GROUP
  delete_contents_on_destroy       = false
}

Note: The module has three inputs related to GCP locations:

- pubsub_resource_location: defines the GCP location used to restrict Pub/Sub resource locations. This policy ensures that messages published to a topic are never persisted outside of the Google Cloud regions you specify, regardless of where the publish requests originate. Zones and multi-region locations are not supported.
- location: defines the GCP region used for all other resources created: Cloud Storage buckets, BigQuery datasets, and Cloud KMS key rings. Multi-region locations are supported.
- trusted_locations: a list of locations used to set an Organization Policy that restricts the GCP locations that can be used in the projects of the Secured Data Warehouse. Both pubsub_resource_location and location must respect this restriction.
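
As a rough illustration of how the three location inputs relate, the sketch below uses the default values listed in the Inputs table. The local names are hypothetical and exist only for this example; in practice you would pass these values as arguments to the module block.

```hcl
# Hypothetical locals for illustration; the values are the module defaults listed under Inputs.
locals {
  # Must be a single region: zones and multi-region locations are not supported.
  pubsub_resource_location = "us-east4"

  # Region (or multi-region) for Cloud Storage buckets, BigQuery datasets, and Cloud KMS key rings.
  location = "us-east4"

  # Locations allowed by the Organization Policy; both values above must fall within this list.
  trusted_locations = ["us-locations"]
}
```

If you change one of these values, check the other two so that pubsub_resource_location and location stay inside trusted_locations.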

Inputs

Name	Description	Type	Default	Required
access_context_manager_policy_id	The ID of the default Access Context Manager policy. Can be obtained by running `gcloud access-context-manager policies list --organization YOUR_ORGANIZATION_ID --format="value(name)"`.	`string`	`""`	no
additional_restricted_services	The list of additional Google services to be protected by the VPC-SC service perimeters.	`list(string)`	`[]`	no
bucket_class	The storage class for the bucket being provisioned.	`string`	`"STANDARD"`	no
bucket_lifecycle_rules	List of lifecycle rules to configure. Format is the same as described in provider documentation https://www.terraform.io/docs/providers/google/r/storage_bucket.html#lifecycle_rule except condition.matches_storage_class should be a comma delimited string.	set(object({ action = any condition = any }))	[ { "action": { "type": "Delete" }, "condition": { "age": 30, "matches_storage_class": [ "STANDARD" ], "with_state": "ANY" } } ]	no
bucket_name	The name of the bucket being provisioned.	`string`	n/a	yes
cmek_keyring_name	The Keyring prefix name for the KMS Customer Managed Encryption Keys being provisioned.	`string`	n/a	yes
confidential_data_dataflow_deployer_identities	List of members in the standard GCP form: user:{email}, serviceAccount:{email} that will deploy Dataflow jobs in the Confidential Data project. These identities will be added to the VPC-SC secure data exchange egress rules.	`list(string)`	`[]`	no
confidential_data_egress_policies	A list of all egress policies for the Confidential Data perimeter, each list object has a `from` and `to` value that describes egress_from and egress_to. See also secure data exchange and the VPC-SC module.	list(object({ from = any to = any }))	`[]`	no
confidential_data_perimeter	Existing confidential data perimeter to be used instead of the auto-created perimeter. The service account provided in the variable `terraform_service_account` must be in an access level member list for this perimeter before this perimeter can be used in this module.	`string`	`""`	no
confidential_data_project_id	Project where the confidential datasets and tables are created.	`string`	n/a	yes
confidential_dataset_id	Unique ID for the confidential dataset being provisioned.	`string`	`"secured_dataset"`	no
data_analyst_group	Google Cloud IAM group that analyzes the data in the warehouse.	`string`	n/a	yes
data_engineer_group	Google Cloud IAM group that sets up and maintains the data pipeline and warehouse.	`string`	n/a	yes
data_governance_egress_policies	A list of all egress policies for the Data Governance perimeter, each list object has a `from` and `to` value that describes egress_from and egress_to. See also secure data exchange and the VPC-SC module.	list(object({ from = any to = any }))	`[]`	no
data_governance_perimeter	Existing data governance perimeter to be used instead of the auto-created perimeter. The service account provided in the variable `terraform_service_account` must be in an access level member list for this perimeter before this perimeter can be used in this module.	`string`	`""`	no
data_governance_project_id	The ID of the project in which the data governance resources will be created.	`string`	n/a	yes
data_ingestion_dataflow_deployer_identities	List of members in the standard GCP form: user:{email}, serviceAccount:{email} that will deploy Dataflow jobs in the Data Ingestion project. These identities will be added to the VPC-SC secure data exchange egress rules.	`list(string)`	`[]`	no
data_ingestion_egress_policies	A list of all egress policies for the Data Ingestion perimeter, each list object has a `from` and `to` value that describes egress_from and egress_to. See also secure data exchange and the VPC-SC module.	list(object({ from = any to = any }))	`[]`	no
data_ingestion_perimeter	Existing data ingestion perimeter to be used instead of the auto-created perimeter. The service account provided in the variable `terraform_service_account` must be in an access level member list for this perimeter before this perimeter can be used in this module.	`string`	`""`	no
data_ingestion_project_id	The ID of the project in which the data ingestion resources will be created.	`string`	n/a	yes
dataset_default_table_expiration_ms	Default time to live (TTL) of tables in the dataset, in milliseconds. The default value is null.	`number`	`null`	no
dataset_description	Dataset description.	`string`	`"Data-ingestion dataset"`	no
dataset_id	Unique ID for the dataset being provisioned.	`string`	n/a	yes
dataset_name	Friendly name for the dataset being provisioned.	`string`	`"Data-ingestion dataset"`	no
delete_contents_on_destroy	(Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present.	`bool`	`false`	no
enable_bigquery_read_roles_in_data_ingestion	(Optional) If set to true, grants the Dataflow controller service account created in the data ingestion project the roles necessary to read from a BigQuery table.	`bool`	`false`	no
key_rotation_period_seconds	Rotation period for keys. The default value is 30 days.	`string`	`"2592000s"`	no
kms_key_protection_level	The protection level to use when creating a key. Possible values: ["SOFTWARE", "HSM"]	`string`	`"HSM"`	no
labels	(Optional) Labels attached to Data Warehouse resources.	`map(string)`	`{}`	no
location	The location for the KMS Customer Managed Encryption Keys, Cloud Storage buckets, and BigQuery datasets. This location can be a multi-region.	`string`	`"us-east4"`	no
network_administrator_group	Google Cloud IAM group that reviews network configuration. Typically, this includes members of the networking team.	`string`	n/a	yes
non_confidential_data_project_id	The ID of the project in which the BigQuery dataset will be created.	`string`	n/a	yes
org_id	GCP Organization ID.	`string`	n/a	yes
perimeter_additional_members	The list of additional members to be added to perimeter access. The prefix user: (user:email@email.com) or serviceAccount: (serviceAccount:my-service-account@email.com) is required.	`list(string)`	`[]`	no
pubsub_resource_location	The location in which the messages published to Pub/Sub will be persisted. This location cannot be a multi-region.	`string`	`"us-east4"`	no
remove_owner_role	(Optional) If set to true, removes all owner roles found in any of the projects.	`bool`	`false`	no
sdx_project_number	The Project Number to configure Secure data exchange with egress rule for dataflow templates. Required if using a dataflow job template from a private storage bucket outside of the perimeter.	`string`	`""`	no
security_administrator_group	Google Cloud IAM group that administers security configurations in the organization (org policies, KMS, VPC service perimeter).	`string`	n/a	yes
security_analyst_group	Google Cloud IAM group that monitors and responds to security incidents.	`string`	n/a	yes
terraform_service_account	The email address of the service account that will run the Terraform code.	`string`	n/a	yes
trusted_locations	This is a list of trusted regions where location-based GCP resources can be created.	`list(string)`	[ "us-locations" ]	no
trusted_subnetworks	The URI of the subnetworks where resources are going to be deployed.	`list(string)`	`[]`	no
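
The egress policy inputs above only state that each list object has a `from` and a `to` value. The sketch below shows one plausible shape for such an object. The inner keys (identity_type, identities, resources, operations, methods) are an assumption based on the conventions of the VPC-SC module, and the service account email and project number are hypothetical placeholders, so check the VPC-SC module documentation before relying on it.

```hcl
# Hypothetical local; pass it as data_ingestion_egress_policies in the module block.
locals {
  example_egress_policies = [
    {
      # Assumed structure: who is allowed to send traffic out of the perimeter.
      "from" = {
        "identity_type" = ""
        "identities"    = ["serviceAccount:example-sa@example-project.iam.gserviceaccount.com"]
      },
      # Assumed structure: which external resources and API methods may be reached.
      "to" = {
        "resources" = ["projects/123456789012"]
        "operations" = {
          "storage.googleapis.com" = {
            "methods" = ["google.storage.objects.get"]
          }
        }
      }
    },
  ]
}
```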

Outputs

Name	Description
blueprint_type	Type of blueprint this module represents.
cmek_bigquery_crypto_key	The Customer Managed Crypto Key for the BigQuery service.
cmek_confidential_bigquery_crypto_key	The Customer Managed Crypto Key for the confidential BigQuery service.
cmek_data_ingestion_crypto_key	The Customer Managed Crypto Key for the data ingestion crypto boundary.
cmek_reidentification_crypto_key	The Customer Managed Crypto Key for the Confidential crypto boundary.
confidential_access_level_name	Access context manager access level name.
confidential_data_dataflow_bucket_name	The name of the bucket created for dataflow in the confidential data pipeline.
confidential_dataflow_controller_service_account_email	The confidential Dataflow controller service account email. See https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#specifying_a_user-managed_controller_service_account.
confidential_service_perimeter_name	Access context manager service perimeter name.
data_governance_access_level_name	Access context manager access level name.
data_governance_service_perimeter_name	Access context manager service perimeter name.
data_ingestion_access_level_name	Access context manager access level name.
data_ingestion_bigquery_dataset	The BigQuery dataset created for the data ingestion pipeline.
data_ingestion_bucket_name	The name of the bucket created for the data ingestion pipeline.
data_ingestion_dataflow_bucket_name	The name of the bucket created for dataflow in the data ingestion pipeline.
data_ingestion_service_perimeter_name	Access context manager service perimeter name.
data_ingestion_topic_name	The topic created for the data ingestion pipeline.
dataflow_controller_service_account_email	The Dataflow controller service account email. See https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#specifying_a_user-managed_controller_service_account.
pubsub_writer_service_account_email	The Pub/Sub writer service account email. Should be used to write data to the Pub/Sub topics the data ingestion pipeline reads from.
scheduler_service_account_email	The Cloud Scheduler service account email, no roles granted.
storage_writer_service_account_email	The Storage writer service account email. Should be used to write data to the buckets the data ingestion pipeline reads from.
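
A minimal sketch of how these outputs might be consumed from the calling configuration. It assumes the module block is named secured_data_warehouse, as in the Usage section; the output names on the left are hypothetical.

```hcl
# Re-export selected module outputs from the root configuration.
# Assumes a module block named "secured_data_warehouse" exists, as in the Usage section.
output "ingestion_bucket" {
  description = "Bucket that the data ingestion pipeline reads from."
  value       = module.secured_data_warehouse.data_ingestion_bucket_name
}

output "ingestion_topic" {
  description = "Pub/Sub topic that the data ingestion pipeline reads from."
  value       = module.secured_data_warehouse.data_ingestion_topic_name
}

output "storage_writer_sa" {
  description = "Service account to use when writing data to the ingestion bucket."
  value       = module.secured_data_warehouse.storage_writer_service_account_email
}
```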

Requirements

These sections describe requirements for using this module.

Note: Please see the Disclaimer regarding project owners before creating projects.

Software

Install the following dependencies:

- Google Cloud SDK version 357.0.0 or later
- Terraform version 0.13.7 or later
- Terraform Provider for GCP version 3.77 or later
- Terraform Provider for GCP Beta version 3.77 or later
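
One way to enforce these minimum versions in the calling configuration is a version constraint block like the sketch below. The constraints simply mirror the list above; the module's own versions.tf may pin them differently.

```hcl
# Version constraints mirroring the Software requirements listed above.
terraform {
  required_version = ">= 0.13.7"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.77"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 3.77"
    }
  }
}
```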

Security Groups

Provide the following groups for separation of duties. Each group is granted the roles it needs to perform its tasks. Add users to the appropriate groups as needed.

- Data Engineer group: Google Cloud IAM group that sets up and maintains the data pipeline and warehouse.
- Data Analyst group: Google Cloud IAM group that analyzes the data in the warehouse.
- Security Analyst group: Google Cloud IAM group that monitors and responds to security incidents.
- Network Administrator group: Google Cloud IAM group that reviews network configuration. Typically, this includes members of the networking team.
- Security Administrator group: Google Cloud IAM group that administers security configurations in the organization (org policies, KMS, VPC service perimeter).

Groups can be created in the Google Workspace Admin Console, in the Google Cloud Console, or by using the gcloud identity groups create command.

Service Account

To provision the resources of this module, create a privileged service account for which service account keys cannot be created. In addition, consider using Cloud Monitoring to alert on this service account's activity. Grant the following roles to the service account:

Organization level:
- Access Context Manager Admin: roles/accesscontextmanager.policyAdmin
- Organization Policy Administrator: roles/orgpolicy.policyAdmin
- Organization Administrator: roles/resourcemanager.organizationAdmin
Project level:
- Data ingestion project
  - App Engine Creator: roles/appengine.appCreator
  - Cloud Scheduler Admin: roles/cloudscheduler.admin
  - Compute Network Admin: roles/compute.networkAdmin
  - Compute Security Admin: roles/compute.securityAdmin
  - Dataflow Developer: roles/dataflow.developer
  - DNS Administrator: roles/dns.admin
  - Project IAM Admin: roles/resourcemanager.projectIamAdmin
  - Pub/Sub Admin: roles/pubsub.admin
  - Service Account Admin: roles/iam.serviceAccountAdmin
  - Service Account Token Creator: roles/iam.serviceAccountTokenCreator
  - Service Usage Admin: roles/serviceusage.serviceUsageAdmin
  - Storage Admin: roles/storage.admin
- Data governance project
  - Cloud KMS Admin: roles/cloudkms.admin
  - Cloud KMS CryptoKey Encrypter: roles/cloudkms.cryptoKeyEncrypter
  - DLP De-identify Templates Editor: roles/dlp.deidentifyTemplatesEditor
  - DLP Inspect Templates Editor: roles/dlp.inspectTemplatesEditor
  - DLP User: roles/dlp.user
  - Data Catalog Admin: roles/datacatalog.admin
  - Project IAM Admin: roles/resourcemanager.projectIamAdmin
  - Secret Manager Admin: roles/secretmanager.admin
  - Service Account Admin: roles/iam.serviceAccountAdmin
  - Service Account Token Creator: roles/iam.serviceAccountTokenCreator
  - Service Usage Admin: roles/serviceusage.serviceUsageAdmin
  - Storage Admin: roles/storage.admin
- Non-confidential data project
  - BigQuery Admin: roles/bigquery.admin
  - Project IAM Admin: roles/resourcemanager.projectIamAdmin
  - Service Account Admin: roles/iam.serviceAccountAdmin
  - Service Account Token Creator: roles/iam.serviceAccountTokenCreator
  - Service Usage Admin: roles/serviceusage.serviceUsageAdmin
  - Storage Admin: roles/storage.admin
- Confidential data project
  - BigQuery Admin: roles/bigquery.admin
  - Compute Network Admin: roles/compute.networkAdmin
  - Compute Security Admin: roles/compute.securityAdmin
  - DNS Administrator: roles/dns.admin
  - Dataflow Developer: roles/dataflow.developer
  - Project IAM Admin: roles/resourcemanager.projectIamAdmin
  - Service Account Admin: roles/iam.serviceAccountAdmin
  - Service Account Token Creator: roles/iam.serviceAccountTokenCreator
  - Service Usage Admin: roles/serviceusage.serviceUsageAdmin
  - Storage Admin: roles/storage.admin

You can use the Project Factory module and the IAM module in combination to provision a service account with the necessary roles applied.
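
If you prefer plain resources over those modules, the role grants can be sketched with google_project_iam_member. The example below covers only the non-confidential data project, using the roles listed above for it; the variable, local, and resource names are hypothetical, and the same pattern would be repeated for the other projects.

```hcl
# Hypothetical input variables for this sketch.
variable "non_confidential_data_project_id" {
  type = string
}

variable "terraform_service_account" {
  description = "Email address of the service account that runs Terraform."
  type        = string
}

locals {
  # Roles listed above for the non-confidential data project.
  non_confidential_roles = [
    "roles/bigquery.admin",
    "roles/resourcemanager.projectIamAdmin",
    "roles/iam.serviceAccountAdmin",
    "roles/iam.serviceAccountTokenCreator",
    "roles/serviceusage.serviceUsageAdmin",
    "roles/storage.admin",
  ]
}

# One non-authoritative IAM binding per role; other members on the project are left untouched.
resource "google_project_iam_member" "terraform_sa_non_confidential" {
  for_each = toset(local.non_confidential_roles)

  project = var.non_confidential_data_project_id
  role    = each.value
  member  = "serviceAccount:${var.terraform_service_account}"
}
```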

The user using this service account must have the necessary roles to impersonate the service account.
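
A minimal sketch of impersonation from the provider configuration, assuming your user already holds a role that allows impersonation (for example, Service Account Token Creator on the service account). The variable name is hypothetical.

```hcl
# Hypothetical variable holding the privileged service account's email.
variable "terraform_service_account" {
  type = string
}

# Both providers request short-lived credentials for the service account
# instead of using the caller's own identity or a downloaded key.
provider "google" {
  impersonate_service_account = var.terraform_service_account
}

provider "google-beta" {
  impersonate_service_account = var.terraform_service_account
}
```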

APIs

Create four projects with the following APIs enabled to host the resources of this module:

Data ingestion project

- Access Context Manager API: accesscontextmanager.googleapis.com
- App Engine Admin API: appengine.googleapis.com
- Artifact Registry API: artifactregistry.googleapis.com
- BigQuery API: bigquery.googleapis.com
- Cloud Billing API: cloudbilling.googleapis.com
- Cloud Build API: cloudbuild.googleapis.com
- Cloud Key Management Service (KMS) API: cloudkms.googleapis.com
- Cloud Resource Manager API: cloudresourcemanager.googleapis.com
- Cloud Scheduler API: cloudscheduler.googleapis.com
- Compute Engine API: compute.googleapis.com
- Google Cloud Data Catalog API: datacatalog.googleapis.com
- Dataflow API: dataflow.googleapis.com
- Cloud Data Loss Prevention (DLP) API: dlp.googleapis.com
- Cloud DNS API: dns.googleapis.com
- Identity and Access Management (IAM) API: iam.googleapis.com
- Cloud Pub/Sub API: pubsub.googleapis.com
- Service Usage API: serviceusage.googleapis.com
- Google Cloud Storage JSON API: storage-api.googleapis.com

Data governance project

- Access Context Manager API: accesscontextmanager.googleapis.com
- Cloud Billing API: cloudbilling.googleapis.com
- Cloud Key Management Service (KMS) API: cloudkms.googleapis.com
- Cloud Resource Manager API: cloudresourcemanager.googleapis.com
- Google Cloud Data Catalog API: datacatalog.googleapis.com
- Cloud Data Loss Prevention (DLP) API: dlp.googleapis.com
- Identity and Access Management (IAM) API: iam.googleapis.com
- Service Usage API: serviceusage.googleapis.com
- Google Cloud Storage JSON API: storage-api.googleapis.com
- Secret Manager API: secretmanager.googleapis.com

Non-confidential data project

- Access Context Manager API: accesscontextmanager.googleapis.com
- BigQuery API: bigquery.googleapis.com
- Cloud Billing API: cloudbilling.googleapis.com
- Cloud Key Management Service (KMS) API: cloudkms.googleapis.com
- Cloud Resource Manager API: cloudresourcemanager.googleapis.com
- Identity and Access Management (IAM) API: iam.googleapis.com
- Service Usage API: serviceusage.googleapis.com
- Google Cloud Storage JSON API: storage-api.googleapis.com

Confidential data project

- Access Context Manager API: accesscontextmanager.googleapis.com
- Artifact Registry API: artifactregistry.googleapis.com
- BigQuery API: bigquery.googleapis.com
- Cloud Billing API: cloudbilling.googleapis.com
- Cloud Build API: cloudbuild.googleapis.com
- Cloud Key Management Service (KMS) API: cloudkms.googleapis.com
- Cloud Resource Manager API: cloudresourcemanager.googleapis.com
- Compute Engine API: compute.googleapis.com
- Google Cloud Data Catalog API: datacatalog.googleapis.com
- Dataflow API: dataflow.googleapis.com
- Cloud Data Loss Prevention (DLP) API: dlp.googleapis.com
- Cloud DNS API: dns.googleapis.com
- Identity and Access Management (IAM) API: iam.googleapis.com
- Service Usage API: serviceusage.googleapis.com
- Google Cloud Storage JSON API: storage-api.googleapis.com

The following APIs must be enabled in the project where the service account was created:

- Access Context Manager API: accesscontextmanager.googleapis.com
- App Engine Admin API: appengine.googleapis.com
- Cloud Billing API: cloudbilling.googleapis.com
- Cloud Key Management Service (KMS) API: cloudkms.googleapis.com
- Cloud Pub/Sub API: pubsub.googleapis.com
- Cloud Resource Manager API: cloudresourcemanager.googleapis.com
- Compute Engine API: compute.googleapis.com
- Dataflow API: dataflow.googleapis.com
- Identity and Access Management (IAM) API: iam.googleapis.com

You can use the Project Factory module to provision the projects with the necessary APIs enabled.
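
For an existing project, a rough alternative is to enable the APIs directly with google_project_service. The sketch below covers only the non-confidential data project, using the API list given above for it; the variable and resource names are hypothetical, and the other projects would follow the same pattern with their own lists.

```hcl
# Hypothetical variable for this sketch.
variable "non_confidential_data_project_id" {
  type = string
}

# Enable each API listed above for the non-confidential data project.
resource "google_project_service" "non_confidential_apis" {
  for_each = toset([
    "accesscontextmanager.googleapis.com",
    "bigquery.googleapis.com",
    "cloudbilling.googleapis.com",
    "cloudkms.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "iam.googleapis.com",
    "serviceusage.googleapis.com",
    "storage-api.googleapis.com",
  ])

  project = var.non_confidential_data_project_id
  service = each.value

  # Leave the API enabled if this resource is later destroyed,
  # so other workloads in the project are not disrupted.
  disable_on_destroy = false
}
```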

Security Disclosures

Please see our security disclosure process.

Contributing

Refer to the contribution guidelines for information on contributing to this module.

This is not an officially supported Google product
