Commit 5368868 — Version 0.1.0

CraftVerk committed Jun 7, 2023 (initial commit, 0 parents)
Showing 43 changed files with 1,262 additions and 0 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md

# Change Log
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [0.1.0] - 2023-06-07
### Added
- Initial release of clarity_view_importer.py and BigQuery view definitions
13 changes: 13 additions & 0 deletions LICENSE.MD
Copyright 2023 Nuro

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
34 changes: 34 additions & 0 deletions README.md
# CLARITY

The **CL**oud **A**sset **R**epository and **I**nventory **T**ool for **Y**ou (CLARITY) is a tool that extends Google Cloud Platform's [Cloud Asset Inventory](https://cloud.google.com/asset-inventory/docs/overview) (CAI) for better usability and functionality. It consists of two components:

- **BigQuery Data Views** - a set of helpful views that make it easier to retrieve useful information from the 250+ tables created by the CAI data export process.
- **CLARITY GUI** - a web interface for querying and pivoting among common data types.

## Getting Started

### Enable GCP Cloud Asset Inventory BigQuery Export
Follow [this guide](https://cloud.google.com/asset-inventory/docs/exporting-to-bigquery) to export asset metadata for your GCP environment to a BigQuery table. You may want to begin with a single snapshot and later move to a regularly scheduled export using a Google Cloud Function, such as the approach described in this [Medium.com](https://medium.com/google-cloud/using-gcp-cloud-asset-inventory-export-to-keep-track-of-your-gcp-resources-over-time-20fb6fa63c68) guide.
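As an illustration of a one-off snapshot export (the project, dataset, and table names below are placeholders, not values from this repository):

```shell
# One-off export of resource metadata to BigQuery (placeholder names)
gcloud asset export \
    --project=my-cai-project \
    --content-type=resource \
    --bigquery-table=//bigquery.googleapis.com/projects/my-cai-project/datasets/cai_dataset/tables/cai_export \
    --per-asset-type \
    --output-bigquery-force
```

The `--per-asset-type` flag produces one table per asset type, which is the layout (e.g. `resource_storage_googleapis_com_Bucket`) that the CLARITY views expect.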

### Import the CLARITY Views
These 37 views make it easier to access common data types from your CAI BigQuery tables and are needed for the CLARITY web interface to function.

Run the latest version of `clarity_view_importer.py` from this repository, specifying the GCP project and BigQuery dataset that hold your existing CAI data. By default, the views are created within your CAI dataset, but this can be overridden with the `-P` and `-D` options at runtime.

````
# Create views in the cai_project project and cai_dataset dataset
./clarity_view_importer.py -p cai_project -d cai_dataset
Adding cai_project.cai_dataset.view_nodepool
Adding cai_project.cai_dataset.view_k8s_rbac_authorization_cluster_role
Adding cai_project.cai_dataset.view_bucket
Adding cai_project.cai_dataset.view_k8s_deployments
...
````
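To store the views somewhere other than the CAI dataset, pass the override options (the target project and dataset names below are hypothetical):

```shell
# Store the views in a separate project/dataset (hypothetical names)
./clarity_view_importer.py -p cai_project -d cai_dataset \
    -P clarity_project -D clarity_views
```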

### Install the CLARITY Web Interface
Once the CLARITY Web Interface is available, instructions will be included here.

## License

The software is provided by [Nuro](https://nuro.ai) under the Apache License, Version 2.0.
63 changes: 63 additions & 0 deletions clarity_view_importer/README.md
# CAI View Importer
Script to import BigQuery custom view definitions that streamline querying of Google Cloud Asset Inventory data. These views must be created for the CLARITY web interface to function.

## Requirements:
- Google Cloud Asset Inventory data [exported to BigQuery](https://cloud.google.com/asset-inventory/docs/exporting-to-bigquery)
- Python BigQuery module
- GCP account with `roles/bigquery.jobUser` and `roles/bigquery.dataEditor` predefined IAM roles to create the views.
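A minimal setup sketch, assuming `pip` and the `gcloud` CLI are available locally:

```shell
# Install the Python BigQuery client used by the importer
pip install google-cloud-bigquery

# The BigQuery client picks up Application Default Credentials
gcloud auth application-default login
```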

## Usage:

````
usage: clarity_view_importer.py [-h] -p PROJECT -d DATASET [-P VIEW_PROJECT] [-D VIEW_DATASET] [-v VIEW_DIRECTORY]
Import BigQuery view definitions for Google Cloud Asset Inventory.
optional arguments:
-h, --help show this help message and exit
-p PROJECT, --project PROJECT
GCP project name where the BigQuery Cloud Asset Inventory dataset is hosted
-d DATASET, --dataset DATASET
BigQuery dataset name where the Cloud Asset Inventory data is hosted
-P VIEW_PROJECT, --view_project VIEW_PROJECT
GCP project name where the view will be stored. Default is --project value.
-D VIEW_DATASET, --view_dataset VIEW_DATASET
BigQuery dataset name where the view will be stored. Default is --dataset value.
-v VIEW_DIRECTORY, --view_directory VIEW_DIRECTORY
Directory where view definitions are stored
````

## Example Usage:
````
# Create views in the cai_project project and cai_dataset dataset
./clarity_view_importer.py -p cai_project -d cai_dataset
Adding cai_project.cai_dataset.view_nodepool
Adding cai_project.cai_dataset.view_k8s_rbac_authorization_cluster_role
Adding cai_project.cai_dataset.view_bucket
Adding cai_project.cai_dataset.view_k8s_deployments
...
````

## Example View:
GCS Bucket Inventory
````
SELECT
bucket.name as bucketPath,
bucket.resource.data.kind as bucketKind,
bucket.resource.data.name as bucketName,
bucket.resource.parent as bucketParent,
bucket.resource.data.timeCreated as bucketCreation,
bucket.resource.data.updated as bucketUpdated,
bucket.resource.data.location as bucketLocation,
bucket.resource.data.versioning.enabled as bucketVersioning,
bucket.resource.data.iamConfiguration.publicAccessPrevention as bucketPublicAccessPrevention,
bucket.resource.data.iamConfiguration.bucketPolicyOnly.enabled as bucketIAMPolicy,
bucket.resource.data.iamConfiguration.uniformBucketLevelAccess.enabled as bucketUniformAcess,
bucket.resource.data.locationType as bucketLocationType,
bucket.resource.data.directory.enabled as bucketDirectoryEnabled,
project.projectName as projectName
FROM `$project.$dataset.resource_storage_googleapis_com_Bucket` bucket
JOIN `$project.$dataset.view_project` project ON bucket.resource.parent = project.projectParent
WHERE DATE(bucket.readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
````
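Once imported, a view can be queried like any table. The sketch below only renders the query text (the `bucket_inventory_query` helper and the project/dataset names are illustrative, not part of the repository); actually running it requires the `google-cloud-bigquery` package and GCP credentials.

```python
def bucket_inventory_query(project: str, dataset: str) -> str:
    # Render a query against the imported view_bucket for a given project/dataset
    return (
        "SELECT bucketName, bucketLocation, bucketPublicAccessPrevention "
        f"FROM `{project}.{dataset}.view_bucket` "
        "ORDER BY bucketCreation DESC LIMIT 10"
    )

query = bucket_inventory_query("cai_project", "cai_dataset")
print(query)

# To execute (requires google-cloud-bigquery and GCP credentials):
# from google.cloud import bigquery
# for row in bigquery.Client().query(query).result():
#     print(row.bucketName, row.bucketLocation)
```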
74 changes: 74 additions & 0 deletions clarity_view_importer/clarity_view_importer.py
#! /usr/bin/env python3

# Version 0.1.0, 2023-06-01
# Author: rheins@nuro.ai
# Creates BigQuery views in the specified dataset to facilitate easier Cloud Asset Inventory querying

import argparse
import sys
import glob
import re
from google.cloud import bigquery


def get_file_contents(file):
    with open(file) as f:
        return f.read()


def main():
    # Parse arguments
    parser = argparse.ArgumentParser(
        description='Import BigQuery view definitions for Google Cloud Asset Inventory.')
    parser.add_argument('-p', '--project', type=str,
                        help='GCP project name where the BigQuery Cloud Asset Inventory dataset is hosted', required=True)
    parser.add_argument('-d', '--dataset', type=str,
                        help='BigQuery dataset name where the Cloud Asset Inventory data is hosted', required=True)
    parser.add_argument('-P', '--view_project', type=str,
                        help='GCP project name where the views will be stored. Default is the value of --project.', required=False)
    parser.add_argument('-D', '--view_dataset', type=str,
                        help='BigQuery dataset name where the views will be stored. Default is the value of --dataset.', required=False)
    parser.add_argument('-v', '--view_directory', type=str, default="./views",
                        help='Directory where view definitions are stored', required=False)
    args = parser.parse_args()

    # Store the views in the Cloud Asset Inventory BigQuery project/dataset by default
    if not args.view_project:
        args.view_project = args.project

    if not args.view_dataset:
        args.view_dataset = args.dataset

    views = glob.glob(f"{args.view_directory}/view_*.txt")

    if not views:
        sys.exit(
            f"No view definitions found in '{args.view_directory}', exiting")

    client = bigquery.Client()

    for view_path in views:
        valid_path = re.search(r"/(view_\w+)\.txt$", view_path)
        if valid_path:
            view_name = valid_path.group(1)
            view_id = f"{args.view_project}.{args.view_dataset}.{view_name}"
            print(f"Adding view: {view_id}")

            # Point the view definition at the CAI project/dataset
            view_src = get_file_contents(view_path)
            view_src = view_src.replace("$project", args.project)
            view_src = view_src.replace("$dataset", args.dataset)

            bq_view = bigquery.Table(view_id)
            bq_view.view_query = view_src

            bq_view = client.create_table(bq_view)
        else:
            print(f"Invalid view name {view_path}, skipping")


if __name__ == "__main__":
    main()
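The core transformation the importer performs — validating view filenames and substituting `$project`/`$dataset` placeholders — can be illustrated standalone (the helper names below are for illustration and mirror the script's logic):

```python
import re

def extract_view_name(path):
    # Mirrors the script's filename check: views must be named view_*.txt
    m = re.search(r"/(view_\w+)\.txt$", path)
    return m.group(1) if m else None

def render_view(template, project, dataset):
    # Mirrors the $project/$dataset substitution performed before view creation
    return template.replace("$project", project).replace("$dataset", dataset)

name = extract_view_name("./views/view_bucket.txt")
sql = render_view("SELECT * FROM `$project.$dataset.t`", "cai_project", "cai_dataset")
print(name)  # view_bucket
print(sql)   # SELECT * FROM `cai_project.cai_dataset.t`
```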
1 change: 1 addition & 0 deletions clarity_view_importer/requirements.txt
google-cloud-bigquery
11 changes: 11 additions & 0 deletions clarity_view_importer/views/view_bqdataset.txt
SELECT
bqDataset.name as bqDatasetPath,
bqDataset.resource.data.datasetReference.datasetId as bqDatasetName,
bqDataset.resource.data.datasetReference.projectId as projectName,
bqDataset.resource.data.description.value as bqDatasetDescription,
(SELECT COUNT(*) FROM `$project.$dataset.view_bqtable` bqTable WHERE bqTable.bqDatasetName = bqDataset.resource.data.datasetReference.datasetId ) as bqTableCount,
bqDataset.resource.location as bqDatasetLocation,
bqDataset.updateTime as bqDatasetUpdateTime
FROM `$project.$dataset.resource_bigquery_googleapis_com_Dataset` bqDataset
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
11 changes: 11 additions & 0 deletions clarity_view_importer/views/view_bqtable.txt
SELECT
resource.data.tableReference.tableId as bqTableName,
name as bqTablePath,
resource.parent as bqDatasetPath,
resource.data.tableReference.projectId as projectName,
resource.data.tableReference.datasetId as bqDatasetName,
resource.location as bqTableLocation,
updateTime as bqTableUpdateTime,
FROM `$project.$dataset.resource_bigquery_googleapis_com_Table`
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
23 changes: 23 additions & 0 deletions clarity_view_importer/views/view_bucket.txt
SELECT
bucket.name as bucketPath,
bucket.resource.data.kind as bucketKind,
bucket.resource.data.name as bucketName,
--billing.cost_after_credits as billingLastMonthCostAfterCredits,
--billing.cost_before_credits as billingLastMonthCostBeforeCredits,
bucket.resource.parent as bucketParent,
bucket.resource.data.timeCreated as bucketCreation,
bucket.resource.data.updated as bucketUpdated,
bucket.resource.data.location as bucketLocation,
bucket.resource.data.versioning.enabled as bucketVersioning,
bucket.resource.data.iamConfiguration.publicAccessPrevention as bucketPublicAccessPrevention,
bucket.resource.data.iamConfiguration.bucketPolicyOnly.enabled as bucketIAMPolicy,
bucket.resource.data.iamConfiguration.uniformBucketLevelAccess.enabled as bucketUniformAcess,
bucket.resource.data.locationType as bucketLocationType,
bucket.resource.data.directory.enabled as bucketDirectoryEnabled,
project.projectName as projectName
FROM `$project.$dataset.resource_storage_googleapis_com_Bucket` bucket
JOIN `$project.$dataset.view_project` project ON bucket.resource.parent = project.projectParent
--LEFT JOIN `$project.$dataset.billing_snapshot` billing ON (billing.service = 'Cloud Storage' AND billing.resourceName = bucket.resource.data.name AND billing.month = FORMAT_DATE('%G%m', DATE_SUB(current_date(), INTERVAL 1 MONTH)))

WHERE DATE(bucket.readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
18 changes: 18 additions & 0 deletions clarity_view_importer/views/view_cloudsql.txt
SELECT
resource.data.name as sqlName,
resource.data.project as projectName,
resource.parent as projectParent,
resource.data.databaseVersion as sqlDBVersion,
resource.data.settings.tier as sqlTier,
resource.data.instanceType as sqlInstanceType,
resource.data.serviceAccountEmailAddress as serviceAccount,
resource.data.onPremisesConfiguration.hostPort as sqlHostPort,
resource.data.region as sqlRegion,
resource.data.createTime as sqlCreateTime,
name as sqlPath,
FROM `$project.$dataset.resource_sqladmin_googleapis_com_Instance` cloudsql
JOIN `$project.$dataset.view_project` project ON project.projectParent = cloudsql.resource.parent
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
16 changes: 16 additions & 0 deletions clarity_view_importer/views/view_cluster.txt
SELECT
resource.data.name as clusterName,
name as clusterPath,
resource.location as clusterLocation,
resource.data.endpoint as clusterEndpoint,
resource.data.workloadIdentityConfig as clusterWorkload,
resource.data.nodeConfig.workloadMetadataConfig.mode as clusterWorkloadMode,
resource.data.clusterIpv4Cidr as clusterCIDR,
resource.data.masterAuthorizedNetworksConfig.cidrBlocks as authorizedNetworks,
SPLIT(name, '/')[OFFSET(4)] as projectName,
resource.data.currentMasterVersion as clusterMasterVersion,
resource.data.currentNodeVersion as clusterNodeVersion,
resource.data.workloadIdentityConfig.workloadPool as clusterWorkloadIdentity,
FROM `$project.$dataset.resource_container_googleapis_com_Cluster`
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: kbroughton@nuro.ai
54 changes: 54 additions & 0 deletions clarity_view_importer/views/view_container.txt
-- containerregistry is deprecated in favor of artifactregistry, so we
-- UNION ALL both the containerregistry and artifactregistry tables to get all_containers.
WITH all_containers as (
-- BEGIN CONTAINER REGISTRY
SELECT
SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)] as hostname,
SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)] as project,
SPLIT(SPLIT(resource.data.name, '@')[SAFE_OFFSET(0)], '/')[SAFE_OFFSET(2)] as repository,
ARRAY_REVERSE(SPLIT(SPLIT(REPLACE(resource.data.name, CONCAT(SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(2)], '/'), ''), '@')[SAFE_OFFSET(0)], '/'))[SAFE_OFFSET(0)] as container_name,
-- full_resource_name not supported for containerregistry
NULL as full_resource_name,
resource.data.name as uri,
ARRAY_LENGTH(SPLIT(resource.data.name, '/')) as num_parts,
(
SELECT STRING_AGG(part, '/' ORDER BY index)
FROM UNNEST(SPLIT(SPLIT(resource.data.name, '@')[OFFSET(0)], '/')) part WITH OFFSET index
WHERE index BETWEEN 2 AND ARRAY_LENGTH(SPLIT(resource.data.name, '/'))
) container_path,
SPLIT(name, '@')[SAFE_OFFSET(1)] as sha,
resource.data.timeUploaded as upload_time,
resource.data.tags as tags,
FROM `$project.$dataset.resource_containerregistry_googleapis_com_Image`
WHERE
DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- END CONTAINER REGISTRY

-- We distinguish between containerregistry and artifactregistry based on full_resource_name (NULL for containerregistry)

-- BEGIN ARTIFACT REGISTRY
UNION ALL
SELECT
SPLIT(resource.data.uri, '/')[SAFE_OFFSET(0)] as hostname,
SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)] as project,
SPLIT(SPLIT(resource.data.name, '@')[SAFE_OFFSET(0)], '/')[SAFE_OFFSET(5)] as repository,
ARRAY_REVERSE(SPLIT(SPLIT(REPLACE(resource.data.name, CONCAT(SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(2)], '/'), ''), '@')[SAFE_OFFSET(0)], '/'))[SAFE_OFFSET(0)] as container_name,
CONCAT("//artifactregistry.googleapis.com/", resource.data.uri) as full_resource_name,
resource.data.name as uri,
ARRAY_LENGTH(SPLIT(resource.data.name, '/')) as num_parts,
(
SELECT STRING_AGG(part, '/' ORDER BY index)
FROM UNNEST(SPLIT(SPLIT(resource.data.name, '@')[OFFSET(0)], '/')) part WITH OFFSET index
WHERE index BETWEEN 2 AND ARRAY_LENGTH(SPLIT(resource.data.name, '/'))
) container_path,
SPLIT(name, '@')[SAFE_OFFSET(1)] as sha,
resource.data.uploadTime as upload_time,
resource.data.tags as tags,
FROM `$project.$dataset.resource_artifactregistry_googleapis_com_DockerImage` artifact_containers
WHERE
DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)
-- SELECT DISTINCT(hostname) FROM all_containers
SELECT * FROM all_containers

-- authors: kbroughton@nuro.ai
13 changes: 13 additions & 0 deletions clarity_view_importer/views/view_container_latest.txt
-- Select just the most recent upload_time, i.e., latest container
with latest_upload_times as (
select
view_container.container_path,
max(view_container.upload_time) as latest_upload_time
from `$project.$dataset.view_container` view_container
GROUP BY view_container.container_path
)
SELECT * EXCEPT(container_path) from latest_upload_times
JOIN `$project.$dataset.view_container` as view_container
ON latest_upload_times.latest_upload_time = view_container.upload_time and latest_upload_times.container_path = view_container.container_path
ORDER BY latest_upload_time ASC
-- authors: kbroughton@nuro.ai
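The "latest row per group" pattern this view implements (group by `container_path`, keep the row with the maximum `upload_time`) can be sketched in plain Python with toy data, not real CAI records:

```python
# Keep only the most recently uploaded row per container_path (toy records)
rows = [
    {"container_path": "repo/app", "upload_time": 1, "sha": "aaa"},
    {"container_path": "repo/app", "upload_time": 3, "sha": "ccc"},
    {"container_path": "repo/db", "upload_time": 2, "sha": "bbb"},
]

latest = {}
for row in rows:
    best = latest.get(row["container_path"])
    if best is None or row["upload_time"] > best["upload_time"]:
        latest[row["container_path"]] = row

print(sorted(r["sha"] for r in latest.values()))  # ['bbb', 'ccc']
```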