Commit 5368868 — Version 0.1.0

CraftVerk committed Jun 7, 2023 (initial commit, 0 parents)
Showing 43 changed files with 1,262 additions and 0 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md

# Change Log
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [0.1.0] - 2023-06-07
### Added
- Initial release of clarity_view_importer.py and BigQuery view definitions
13 changes: 13 additions & 0 deletions LICENSE.MD
Copyright 2023 Nuro

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
34 changes: 34 additions & 0 deletions README.md
# CLARITY

The **CL**oud **A**sset **R**epository and **I**nventory **T**ool for **Y**ou (CLARITY) is a tool that extends Google Cloud Platform's [Cloud Asset Inventory](https://cloud.google.com/asset-inventory/docs/overview) (CAI) for better usability and functionality. It consists of two components:

- **BigQuery Data Views** - a set of helpful views that make it easier to retrieve useful information from the 250+ tables created by the CAI data export process.
- **CLARITY GUI** - a web interface for querying and pivoting among common data types.

## Getting Started

### Enable GCP Cloud Asset Inventory BigQuery Export
Follow [this guide](https://cloud.google.com/asset-inventory/docs/exporting-to-bigquery) to export asset metadata for your GCP environment to a BigQuery table. You may want to begin with a single snapshot and later move to a regularly scheduled export using a Google Cloud Function, such as the approach described in this [Medium.com](https://medium.com/google-cloud/using-gcp-cloud-asset-inventory-export-to-keep-track-of-your-gcp-resources-over-time-20fb6fa63c68) guide.
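As an illustration of a one-off snapshot export (the project, dataset, and table names below are placeholders, not values from this repository):

```shell
# One-off export of resource metadata to BigQuery (placeholder names)
gcloud asset export \
    --project=my-cai-project \
    --content-type=resource \
    --bigquery-table=//bigquery.googleapis.com/projects/my-cai-project/datasets/cai_dataset/tables/cai_export \
    --per-asset-type \
    --output-bigquery-force
```

The `--per-asset-type` flag produces one table per asset type, which is the layout (e.g. `resource_storage_googleapis_com_Bucket`) that the CLARITY views expect.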

### Import the CLARITY Views
These 37 views make it easier to access common data types from your CAI BigQuery tables and are needed for the CLARITY web interface to function.

Run the latest version of `clarity_view_importer.py` from this repository, specifying the GCP project and BigQuery dataset that hold your existing CAI data. By default, the views are created within your CAI dataset, but this can be overridden with the `-P` and `-D` options at runtime.

````
# Create views in the cai_project project and cai_dataset dataset
./clarity_view_importer.py -p cai_project -d cai_dataset
Adding cai_project.cai_dataset.view_nodepool
Adding cai_project.cai_dataset.view_k8s_rbac_authorization_cluster_role
Adding cai_project.cai_dataset.view_bucket
Adding cai_project.cai_dataset.view_k8s_deployments
...
````
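To store the views somewhere other than the CAI dataset, pass the override options (the target project and dataset names below are hypothetical):

```shell
# Store the views in a separate project/dataset (hypothetical names)
./clarity_view_importer.py -p cai_project -d cai_dataset \
    -P clarity_project -D clarity_views
```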

### Install the CLARITY Web Interface
Once the CLARITY Web Interface is available, instructions will be included here.

## License

The software is provided by [Nuro](https://nuro.ai) under the Apache License, Version 2.0.
63 changes: 63 additions & 0 deletions clarity_view_importer/README.md
# CAI View Importer
Script to import BigQuery custom view definitions that streamline querying of Google Cloud Asset Inventory data. These views must be created for the CLARITY web interface to function.

## Requirements:
- Google Cloud Asset Inventory data [exported to BigQuery](https://cloud.google.com/asset-inventory/docs/exporting-to-bigquery)
- Python BigQuery module
- GCP account with `roles/bigquery.jobUser` and `roles/bigquery.dataEditor` predefined IAM roles to create the views.
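A minimal setup sketch, assuming `pip` and the `gcloud` CLI are available locally:

```shell
# Install the Python BigQuery client used by the importer
pip install google-cloud-bigquery

# The BigQuery client picks up Application Default Credentials
gcloud auth application-default login
```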

## Usage:

````
usage: clarity_view_importer.py [-h] -p PROJECT -d DATASET [-P VIEW_PROJECT] [-D VIEW_DATASET] [-v VIEW_DIRECTORY]
Import BigQuery view definitions for Google Cloud Asset Inventory.
optional arguments:
-h, --help show this help message and exit
-p PROJECT, --project PROJECT
GCP project name where the BigQuery Cloud Asset Inventory dataset is hosted
-d DATASET, --dataset DATASET
BigQuery dataset name where the Cloud Asset Inventory data is hosted
-P VIEW_PROJECT, --view_project VIEW_PROJECT
GCP project name where the view will be stored. Default is --project value.
-D VIEW_DATASET, --view_dataset VIEW_DATASET
BigQuery dataset name where the view will be stored. Default is --dataset value.
-v VIEW_DIRECTORY, --view_directory VIEW_DIRECTORY
Directory where view definitions are stored
````

## Example Usage:
````
# Create views in the cai_project project and cai_dataset dataset
./clarity_view_importer.py -p cai_project -d cai_dataset
Adding cai_project.cai_dataset.view_nodepool
Adding cai_project.cai_dataset.view_k8s_rbac_authorization_cluster_role
Adding cai_project.cai_dataset.view_bucket
Adding cai_project.cai_dataset.view_k8s_deployments
...
````

## Example View:
GCS Bucket Inventory
````
SELECT
bucket.name as bucketPath,
bucket.resource.data.kind as bucketKind,
bucket.resource.data.name as bucketName,
bucket.resource.parent as bucketParent,
bucket.resource.data.timeCreated as bucketCreation,
bucket.resource.data.updated as bucketUpdated,
bucket.resource.data.location as bucketLocation,
bucket.resource.data.versioning.enabled as bucketVersioning,
bucket.resource.data.iamConfiguration.publicAccessPrevention as bucketPublicAccessPrevention,
bucket.resource.data.iamConfiguration.bucketPolicyOnly.enabled as bucketIAMPolicy,
bucket.resource.data.iamConfiguration.uniformBucketLevelAccess.enabled as bucketUniformAcess,
bucket.resource.data.locationType as bucketLocationType,
bucket.resource.data.directory.enabled as bucketDirectoryEnabled,
project.projectName as projectName
FROM `$project.$dataset.resource_storage_googleapis_com_Bucket` bucket
JOIN `$project.$dataset.view_project` project ON bucket.resource.parent = project.projectParent
WHERE DATE(bucket.readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
````
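Once imported, a view can be queried like any table. The sketch below only renders the query text (the `bucket_inventory_query` helper and the project/dataset names are illustrative, not part of the repository); actually running it requires the `google-cloud-bigquery` package and GCP credentials.

```python
def bucket_inventory_query(project: str, dataset: str) -> str:
    # Render a query against the imported view_bucket for a given project/dataset
    return (
        "SELECT bucketName, bucketLocation, bucketPublicAccessPrevention "
        f"FROM `{project}.{dataset}.view_bucket` "
        "ORDER BY bucketCreation DESC LIMIT 10"
    )

query = bucket_inventory_query("cai_project", "cai_dataset")
print(query)

# To execute (requires google-cloud-bigquery and GCP credentials):
# from google.cloud import bigquery
# for row in bigquery.Client().query(query).result():
#     print(row.bucketName, row.bucketLocation)
```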
74 changes: 74 additions & 0 deletions clarity_view_importer/clarity_view_importer.py
#! /usr/bin/env python3

# Version 0.1.0, 2023-06-01
# Author: rheins@nuro.ai
# Creates BigQuery views in the specified dataset to facilitate easier Cloud Asset Inventory querying

import argparse
import sys
import glob
import re
from google.cloud import bigquery


def get_file_contents(file):
    with open(file) as f:
        return f.read()


def main():
    # Parse arguments
    parser = argparse.ArgumentParser(
        description='Import BigQuery view definitions for Google Cloud Asset Inventory.')
    parser.add_argument('-p', '--project', type=str,
                        help='GCP project name where the BigQuery Cloud Asset Inventory dataset is hosted', required=True)
    parser.add_argument('-d', '--dataset', type=str,
                        help='BigQuery dataset name where the Cloud Asset Inventory data is hosted', required=True)
    parser.add_argument('-P', '--view_project', type=str,
                        help='GCP project name where the views will be stored. Default is the value of --project.', required=False)
    parser.add_argument('-D', '--view_dataset', type=str,
                        help='BigQuery dataset name where the views will be stored. Default is the value of --dataset.', required=False)
    parser.add_argument('-v', '--view_directory', type=str, default="./views",
                        help='Directory where view definitions are stored', required=False)
    args = parser.parse_args()

    # Store the views in the Cloud Asset Inventory BigQuery project/dataset by default
    if not args.view_project:
        args.view_project = args.project

    if not args.view_dataset:
        args.view_dataset = args.dataset

    views = glob.glob(f"{args.view_directory}/view_*.txt")

    if not views:
        sys.exit(
            f"No view definitions found in '{args.view_directory}', exiting")

    client = bigquery.Client()

    for view_path in views:
        valid_path = re.search(r"/(view_\w+)\.txt$", view_path)
        if valid_path:
            view_name = valid_path.group(1)
            view_id = f"{args.view_project}.{args.view_dataset}.{view_name}"
            print(f"Adding view: {view_id}")

            # Point the view definition at the CAI project/dataset
            view_src = get_file_contents(view_path)
            view_src = view_src.replace("$project", args.project)
            view_src = view_src.replace("$dataset", args.dataset)

            bq_view = bigquery.Table(view_id)
            bq_view.view_query = view_src

            bq_view = client.create_table(bq_view)
        else:
            print(f"Invalid view name {view_path}, skipping")


if __name__ == "__main__":
    main()
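The core transformation the importer performs — validating view filenames and substituting `$project`/`$dataset` placeholders — can be illustrated standalone (the helper names below are for illustration and mirror the script's logic):

```python
import re

def extract_view_name(path):
    # Mirrors the script's filename check: views must be named view_*.txt
    m = re.search(r"/(view_\w+)\.txt$", path)
    return m.group(1) if m else None

def render_view(template, project, dataset):
    # Mirrors the $project/$dataset substitution performed before view creation
    return template.replace("$project", project).replace("$dataset", dataset)

name = extract_view_name("./views/view_bucket.txt")
sql = render_view("SELECT * FROM `$project.$dataset.t`", "cai_project", "cai_dataset")
print(name)  # view_bucket
print(sql)   # SELECT * FROM `cai_project.cai_dataset.t`
```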
1 change: 1 addition & 0 deletions clarity_view_importer/requirements.txt
google-cloud-bigquery
11 changes: 11 additions & 0 deletions clarity_view_importer/views/view_bqdataset.txt
SELECT
bqDataset.name as bqDatasetPath,
bqDataset.resource.data.datasetReference.datasetId as bqDatasetName,
bqDataset.resource.data.datasetReference.projectId as projectName,
bqDataset.resource.data.description.value as bqDatasetDescription,
(SELECT COUNT(*) FROM `$project.$dataset.view_bqtable` bqTable WHERE bqTable.bqDatasetName = bqDataset.resource.data.datasetReference.datasetId ) as bqTableCount,
bqDataset.resource.location as bqDatasetLocation,
bqDataset.updateTime as bqDatasetUpdateTime
FROM `$project.$dataset.resource_bigquery_googleapis_com_Dataset` bqDataset
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
11 changes: 11 additions & 0 deletions clarity_view_importer/views/view_bqtable.txt
SELECT
resource.data.tableReference.tableId as bqTableName,
name as bqTablePath,
resource.parent as bqDatasetPath,
resource.data.tableReference.projectId as projectName,
resource.data.tableReference.datasetId as bqDatasetName,
resource.location as bqTableLocation,
updateTime as bqTableUpdateTime,
FROM `$project.$dataset.resource_bigquery_googleapis_com_Table`
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
23 changes: 23 additions & 0 deletions clarity_view_importer/views/view_bucket.txt
SELECT
bucket.name as bucketPath,
bucket.resource.data.kind as bucketKind,
bucket.resource.data.name as bucketName,
--billing.cost_after_credits as billingLastMonthCostAfterCredits,
--billing.cost_before_credits as billingLastMonthCostBeforeCredits,
bucket.resource.parent as bucketParent,
bucket.resource.data.timeCreated as bucketCreation,
bucket.resource.data.updated as bucketUpdated,
bucket.resource.data.location as bucketLocation,
bucket.resource.data.versioning.enabled as bucketVersioning,
bucket.resource.data.iamConfiguration.publicAccessPrevention as bucketPublicAccessPrevention,
bucket.resource.data.iamConfiguration.bucketPolicyOnly.enabled as bucketIAMPolicy,
bucket.resource.data.iamConfiguration.uniformBucketLevelAccess.enabled as bucketUniformAcess,
bucket.resource.data.locationType as bucketLocationType,
bucket.resource.data.directory.enabled as bucketDirectoryEnabled,
project.projectName as projectName
FROM `$project.$dataset.resource_storage_googleapis_com_Bucket` bucket
JOIN `$project.$dataset.view_project` project ON bucket.resource.parent = project.projectParent
--LEFT JOIN `$project.$dataset.billing_snapshot` billing ON (billing.service = 'Cloud Storage' AND billing.resourceName = bucket.resource.data.name AND billing.month = FORMAT_DATE('%G%m', DATE_SUB(current_date(), INTERVAL 1 MONTH)))

WHERE DATE(bucket.readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
18 changes: 18 additions & 0 deletions clarity_view_importer/views/view_cloudsql.txt
SELECT
resource.data.name as sqlName,
resource.data.project as projectName,
resource.parent as projectParent,
resource.data.databaseVersion as sqlDBVersion,
resource.data.settings.tier as sqlTier,
resource.data.instanceType as sqlInstanceType,
resource.data.serviceAccountEmailAddress as serviceAccount,
resource.data.onPremisesConfiguration.hostPort as sqlHostPort,
resource.data.region as sqlRegion,
resource.data.createTime as sqlCreateTime,
name as sqlPath,
FROM `$project.$dataset.resource_sqladmin_googleapis_com_Instance` cloudsql
JOIN `$project.$dataset.view_project` project ON project.projectParent = cloudsql.resource.parent
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: rheins@nuro.ai
16 changes: 16 additions & 0 deletions clarity_view_importer/views/view_cluster.txt
SELECT
resource.data.name as clusterName,
name as clusterPath,
resource.location as clusterLocation,
resource.data.endpoint as clusterEndpoint,
resource.data.workloadIdentityConfig as clusterWorkload,
resource.data.nodeConfig.workloadMetadataConfig.mode as clusterWorkloadMode,
resource.data.clusterIpv4Cidr as clusterCIDR,
resource.data.masterAuthorizedNetworksConfig.cidrBlocks as authorizedNetworks,
SPLIT(name, '/')[OFFSET(4)] as projectName,
resource.data.currentMasterVersion as clusterMasterVersion,
resource.data.currentNodeVersion as clusterNodeVersion,
resource.data.workloadIdentityConfig.workloadPool as clusterWorkloadIdentity,
FROM `$project.$dataset.resource_container_googleapis_com_Cluster`
WHERE DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- authors: kbroughton@nuro.ai
54 changes: 54 additions & 0 deletions clarity_view_importer/views/view_container.txt
-- containerregistry is deprecated in favor of artifactregistry, so we
-- UNION ALL both the containerregistry and artifactregistry tables to get all_containers.
WITH all_containers as (
-- BEGIN CONTAINER REGISTRY
SELECT
SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)] as hostname,
SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)] as project,
SPLIT(SPLIT(resource.data.name, '@')[SAFE_OFFSET(0)], '/')[SAFE_OFFSET(2)] as repository,
ARRAY_REVERSE(SPLIT(SPLIT(REPLACE(resource.data.name, CONCAT(SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(2)], '/'), ''), '@')[SAFE_OFFSET(0)], '/'))[SAFE_OFFSET(0)] as container_name,
-- full_resource_name not supported for containerregistry
NULL as full_resource_name,
resource.data.name as uri,
ARRAY_LENGTH(SPLIT(resource.data.name, '/')) as num_parts,
(
SELECT STRING_AGG(part, '/' ORDER BY index)
FROM UNNEST(SPLIT(SPLIT(resource.data.name, '@')[OFFSET(0)], '/')) part WITH OFFSET index
WHERE index BETWEEN 2 AND ARRAY_LENGTH(SPLIT(resource.data.name, '/'))
) container_path,
SPLIT(name, '@')[SAFE_OFFSET(1)] as sha,
resource.data.timeUploaded as upload_time,
resource.data.tags as tags,
FROM `$project.$dataset.resource_containerregistry_googleapis_com_Image`
WHERE
DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- END CONTAINER REGISTRY

-- We distinguish between containerregistry and artifactregistry based on full_resource_name (NULL for containerregistry)

-- BEGIN ARTIFACT REGISTRY
UNION ALL
SELECT
SPLIT(resource.data.uri, '/')[SAFE_OFFSET(0)] as hostname,
SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)] as project,
SPLIT(SPLIT(resource.data.name, '@')[SAFE_OFFSET(0)], '/')[SAFE_OFFSET(5)] as repository,
ARRAY_REVERSE(SPLIT(SPLIT(REPLACE(resource.data.name, CONCAT(SPLIT(resource.data.name, '/')[SAFE_OFFSET(0)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(1)], '/', SPLIT(resource.data.name, '/')[SAFE_OFFSET(2)], '/'), ''), '@')[SAFE_OFFSET(0)], '/'))[SAFE_OFFSET(0)] as container_name,
CONCAT("//artifactregistry.googleapis.com/", resource.data.uri) as full_resource_name,
resource.data.name as uri,
ARRAY_LENGTH(SPLIT(resource.data.name, '/')) as num_parts,
(
SELECT STRING_AGG(part, '/' ORDER BY index)
FROM UNNEST(SPLIT(SPLIT(resource.data.name, '@')[OFFSET(0)], '/')) part WITH OFFSET index
WHERE index BETWEEN 2 AND ARRAY_LENGTH(SPLIT(resource.data.name, '/'))
) container_path,
SPLIT(name, '@')[SAFE_OFFSET(1)] as sha,
resource.data.uploadTime as upload_time,
resource.data.tags as tags,
FROM `$project.$dataset.resource_artifactregistry_googleapis_com_DockerImage` artifact_containers
WHERE
DATE(readTime) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)
-- SELECT DISTINCT(hostname) FROM all_containers
SELECT * FROM all_containers

-- authors: kbroughton@nuro.ai
13 changes: 13 additions & 0 deletions clarity_view_importer/views/view_container_latest.txt
-- Select just the most recent upload_time, i.e., latest container
with latest_upload_times as (
select
view_container.container_path,
max(view_container.upload_time) as latest_upload_time
from `$project.$dataset.view_container` view_container
GROUP BY view_container.container_path
)
SELECT * EXCEPT(container_path) from latest_upload_times
JOIN `$project.$dataset.view_container` as view_container
ON latest_upload_times.latest_upload_time = view_container.upload_time and latest_upload_times.container_path = view_container.container_path
ORDER BY latest_upload_time ASC
-- authors: kbroughton@nuro.ai
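The "latest row per group" pattern this view implements (group by `container_path`, keep the row with the maximum `upload_time`) can be sketched in plain Python with toy data, not real CAI records:

```python
# Keep only the most recently uploaded row per container_path (toy records)
rows = [
    {"container_path": "repo/app", "upload_time": 1, "sha": "aaa"},
    {"container_path": "repo/app", "upload_time": 3, "sha": "ccc"},
    {"container_path": "repo/db", "upload_time": 2, "sha": "bbb"},
]

latest = {}
for row in rows:
    best = latest.get(row["container_path"])
    if best is None or row["upload_time"] > best["upload_time"]:
        latest[row["container_path"]] = row

print(sorted(r["sha"] for r in latest.values()))  # ['bbb', 'ccc']
```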