ETL (extract, transform and load) tools for ingesting Band Protocol blockchain data to Google BigQuery and Pub/Sub

blockchain-etl/band-etl

Band ETL

Overview

Band ETL lets you set up an ETL pipeline on Google Cloud Platform that ingests Band blockchain data into BigQuery and Pub/Sub. It also includes CLI tools for exporting Band data into newline-delimited JSON files partitioned by day.

Data is available for you to query right away in Google BigQuery.
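
To illustrate the day-partitioned, newline-delimited JSON layout mentioned above, here is a minimal sketch. It is not the Band ETL CLI itself: the field names (`height`, `timestamp`), the `blocks/<day>.json` naming, and the use of UTC Unix timestamps are assumptions made for the example.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(records, timestamp_field="timestamp"):
    """Group records by the UTC calendar day of their Unix timestamp."""
    partitions = defaultdict(list)
    for record in records:
        ts = datetime.fromtimestamp(record[timestamp_field], tz=timezone.utc)
        partitions[ts.strftime("%Y-%m-%d")].append(record)
    return dict(partitions)


def to_ndjson(records):
    """Serialize records as newline-delimited JSON: one object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


# Hypothetical block records; real Band blocks carry many more fields.
blocks = [
    {"height": 1, "timestamp": 1600000000},
    {"height": 2, "timestamp": 1600086400},
]

# Map each day to the contents of its (assumed) output file.
files = {
    f"blocks/{day}.json": to_ndjson(rows)
    for day, rows in partition_by_day(blocks).items()
}
```

One file per day keeps each partition independently loadable, which is what makes a daily export-and-load cycle practical.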

Architecture

band_etl_architecture.svg

Google Slides version

  1. The nodes are run in a Kubernetes cluster. Refer to Band Node in Kubernetes for deployment instructions.

  2. Airflow DAGs export and load Band data to BigQuery daily. Refer to Band ETL Airflow for deployment instructions.

  3. Band data is polled periodically from the nodes and pushed to Google Pub/Sub. Refer to Band ETL Streaming for deployment instructions.

  4. Band data is pulled from Pub/Sub, transformed and streamed to BigQuery. Refer to Band ETL Dataflow for deployment instructions.

  5. Band data is pulled from Pub/Sub and BigQuery for metrics calculation and anomaly detection. Refer to Band Dataflow Sample Applications for deployment instructions.
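
The polling step (item 3) can be pictured as a checkpointed loop. The sketch below is an assumption about the general shape of such a loop, not the Band ETL Streaming implementation: `get_latest_height`, `fetch_block`, and `publish` are hypothetical callables standing in for a node client and a Pub/Sub publisher.

```python
def sync_once(get_latest_height, fetch_block, publish, last_synced):
    """Publish every block after `last_synced`; return the new checkpoint."""
    latest = get_latest_height()
    for height in range(last_synced + 1, latest + 1):
        publish(fetch_block(height))
        # Advance the checkpoint only after the block has been published,
        # so a failure part-way through resumes from the right place.
        last_synced = height
    return last_synced


# In-memory stand-ins for the node and for Pub/Sub (illustrative only).
chain = {h: {"height": h} for h in range(1, 6)}
published = []

checkpoint = sync_once(
    get_latest_height=lambda: max(chain),
    fetch_block=chain.__getitem__,
    publish=published.append,
    last_synced=2,
)
```

Calling `sync_once` on a timer, and persisting the returned checkpoint between calls, gives the "polled periodically" behavior described above.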

Setting Up

  1. Follow the instructions in Band Node in Kubernetes to deploy a Band node in GKE. Wait until it is fully synced. Make a note of the Load Balancer IP from the node deployment; it will be used in the Airflow and Streamer components below.

  2. Follow the instructions in Band ETL Airflow to deploy a Cloud Composer cluster for exporting and loading historical Band data. It may take several hours for the export DAG to catch up. During this time, the "load" and "verify_streaming" DAGs will fail.

  3. Follow the instructions in Band ETL Streaming to deploy the Streamer component. For the value in last_synced_block.txt, specify the last block number of the previous day. You can query it in BigQuery: SELECT height FROM mainnet.blocks ORDER BY height DESC LIMIT 1.

  4. Follow the instructions in Band ETL Dataflow to deploy the Dataflow component. Monitor the "verify_streaming" DAG in the Airflow console. Once the Dataflow job catches up to the latest block, the DAG will succeed.

  5. Follow the instructions in Band Dataflow Sample Applications to deploy the Dataflow Sample Applications.
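
For step 3, the query and the file write can be scripted. The sketch below is one possible way to do it, under these assumptions: `client` is any object with the same `query(sql).result()` interface as `google.cloud.bigquery.Client`, and last_synced_block.txt holds a single integer followed by a newline (the exact format the Streamer expects is not stated here). The helper name `write_last_synced_block` is hypothetical.

```python
from pathlib import Path

# Query taken from step 3 above.
LAST_BLOCK_SQL = "SELECT height FROM mainnet.blocks ORDER BY height DESC LIMIT 1"


def write_last_synced_block(client, path="last_synced_block.txt"):
    """Look up the highest loaded block height and write it to `path`."""
    rows = list(client.query(LAST_BLOCK_SQL).result())
    if not rows:
        raise RuntimeError("mainnet.blocks returned no rows; is the table loaded?")
    height = int(rows[0]["height"])
    # Assumed file format: the block number on a single line.
    Path(path).write_text(f"{height}\n")
    return height
```

With the `google-cloud-bigquery` package installed and credentials configured, this could be called as `write_last_synced_block(bigquery.Client())` after `from google.cloud import bigquery`.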
