bq-metadata-cloud-run

Cloud Run job to pull metadata manifests from Synapse and update tables in the Google BigQuery dataset htan-dcc.combined_assays. This dataset contains clinical, biospecimen, and assay metadata tables combined across HTAN centers.
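For a quick sanity check of the target dataset, the tables can be listed with the bq CLI. This is a minimal sketch assuming the Google Cloud SDK is installed and you have BigQuery read access to htan-dcc:

# List the metadata tables in the combined_assays dataset
bq ls htan-dcc:combined_assays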

Scheduled to run daily at 0200 ET.
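For reference, a daily 0200 ET trigger corresponds to the cron expression 0 2 * * * with the America/New_York time zone. The scheduler in this repo is managed by Terraform, but a roughly equivalent Cloud Scheduler job could be created by hand as sketched below; the job name, region, and invoker service account are hypothetical placeholders:

# Hypothetical manual equivalent of the Terraform-managed trigger:
# run the Cloud Run job every day at 02:00 America/New_York
gcloud scheduler jobs create http bq-metadata-trigger \
  --location=us-central1 \
  --schedule="0 2 * * *" \
  --time-zone="America/New_York" \
  --http-method=POST \
  --uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/htan-dcc/jobs/<job-name>:run" \
  --oauth-service-account-email=<invoker-sa>@htan-dcc.iam.gserviceaccount.com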

Requirements

Requires access to deploy resources in the HTAN Google Cloud Project, htan-dcc. Please contact an owner of htan-dcc to request access (owners as of 2024: Clarisse Lau, Vesteinn Thorsson, William Longabaugh; ISB).

Prerequisites

  • Create a Synapse Auth Token secret in Secret Manager (see the sketch after this list). This requires download access to all individual HTAN-center Synapse projects; the job currently uses the synapse-service-HTAN-lambda service account.

  • Install Terraform >= 1.7.0
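
A minimal sketch of the first prerequisite using the gcloud CLI. The secret name synapse-auth-token is an assumption; match it to whatever name the Terraform configuration expects:

# Create the secret and add the Synapse personal access token as its first version
# (secret name is illustrative)
gcloud secrets create synapse-auth-token --replication-policy=automatic
echo -n "$SYNAPSE_AUTH_TOKEN" | \
  gcloud secrets versions add synapse-auth-token --data-file=-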

Docker Image

Before creating the job, build and push a Docker image to Google Artifact Registry (recommended).

cd src
docker build . -t us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
docker push us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
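
If the push is rejected with an authentication error, Docker likely needs gcloud configured as a credential helper for Artifact Registry first:

# One-time setup: let gcloud supply credentials for us-docker.pkg.dev pushes
gcloud auth configure-docker us-docker.pkg.dev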

Deploy Cloud Resources

Define variables in terraform.tfvars. Variable descriptions can be found in variables.tf.
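
A minimal terraform.tfvars sketch, written via a shell heredoc to match the other snippets here. All variable names and values below are illustrative assumptions; the authoritative list lives in variables.tf:

# Hypothetical variable values; check variables.tf for the real names and descriptions
cat > terraform.tfvars <<'EOF'
project = "htan-dcc"
region  = "us-central1"
image   = "us-docker.pkg.dev/htan-dcc/gcr.io/<image-name>"
EOF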

terraform init
terraform plan
terraform apply
