diff --git a/docs/2.develop/lake/structures/chunk.mdx b/docs/2.develop/lake/structures/chunk.mdx
index c569d4fbb0f..35a2cc712ef 100644
--- a/docs/2.develop/lake/structures/chunk.mdx
+++ b/docs/2.develop/lake/structures/chunk.mdx
@@ -10,11 +10,11 @@ import TabItem from '@theme/TabItem';
 
 ## Definition
 
-`Chunk` of a [`Block`](./block.mdx) is a part of a [`Block`](./block.mdx) from a [Shard](./shard.mdx). The collection of Chunks of the Block forms the NEAR Protocol [`Block`](./block.mdx)
+`Chunk` of a [`Block`](block.mdx) is a part of a [`Block`](block.mdx) from a [Shard](shard.mdx). The collection of Chunks of the Block forms the NEAR Protocol [`Block`](block.mdx)
 
 Chunk contains all the structures that make the Block:
-- [Transactions](./transaction.mdx)
-- [Receipts](./receipt.mdx)
+- [Transactions](transaction.mdx)
+- [Receipts](receipt.mdx)
 - [ChunkHeader](#chunkheaderview)
 
 ## `IndexerChunkView`
diff --git a/docs/2.develop/lake/structures/execution_outcome.mdx b/docs/2.develop/lake/structures/execution_outcome.mdx
index 8af9b9b1a6f..c43a2878c9b 100644
--- a/docs/2.develop/lake/structures/execution_outcome.mdx
+++ b/docs/2.develop/lake/structures/execution_outcome.mdx
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
 
 ## Definition
 
-ExecutionOutcome is the result of execution of [Transaction](./transaction.mdx) or [Receipt](./receipt.mdx)
+ExecutionOutcome is the result of execution of [Transaction](transaction.mdx) or [Receipt](receipt.mdx)
 
 :::info Transaction's ExecutionOutcome
 
diff --git a/docs/2.develop/lake/structures/shard.mdx b/docs/2.develop/lake/structures/shard.mdx
index 4b6223854b1..fe169faed5e 100644
--- a/docs/2.develop/lake/structures/shard.mdx
+++ b/docs/2.develop/lake/structures/shard.mdx
@@ -13,9 +13,9 @@ import TabItem from '@theme/TabItem';
 
 `IndexerShard` struct is ephemeral structure, there is no such entity in `nearcore`. We've introduces it as a container in [`near-indexer-primitives`](https://crates.io/crates/near-indexer-primitives). This container includes:
 - shard ID
-- [Chunk](./chunk.mdx) that might be absent
-- [ExecutionOutcomes](./execution_outcome.mdx) for [Receipts](./receipt.mdx) (these belong to a Shard not to a [Chunk](./chunk.mdx) or a [Block](./block.mdx))
-- [StateChanges](./state_change.mdx) for a Shard
+- [Chunk](chunk.mdx) that might be absent
+- [ExecutionOutcomes](execution_outcome.mdx) for [Receipts](receipt.mdx) (these belong to a Shard not to a [Chunk](chunk.mdx) or a [Block](block.mdx))
+- [StateChanges](state_change.mdx) for a Shard
 
 ## `IndexerShard`
 
diff --git a/docs/2.develop/lake/structures/transaction.mdx b/docs/2.develop/lake/structures/transaction.mdx
index 5c0b8b72261..093fec57137 100644
--- a/docs/2.develop/lake/structures/transaction.mdx
+++ b/docs/2.develop/lake/structures/transaction.mdx
@@ -55,7 +55,7 @@ export type Transaction = {
 
 ## `ActionView`
 
-`ActionView` is an Enum with possible actions along with parameters. This structure is used in Transactions and in [Receipts](./receipt.mdx)
+`ActionView` is an Enum with possible actions along with parameters. This structure is used in Transactions and in [Receipts](receipt.mdx)
diff --git a/docs/bos/queryapi/big-query.md b/docs/bos/queryapi/big-query.md
new file mode 100644
index 00000000000..8d4b3deb8ba
--- /dev/null
+++ b/docs/bos/queryapi/big-query.md
@@ -0,0 +1,104 @@
+---
+id: big-query
+title: BigQuery Public Dataset
+sidebar_label: BigQuery
+---
+
+Blockchain data indexing in NEAR Public Lakehouse is for anyone wanting to understand blockchain data. This includes:
+
+- **Users**: create queries to track NEAR assets, monitor transactions, or analyze on-chain events at a massive scale.
+- **Researchers**: use indexed data for data science tasks, such as analyzing on-chain activity, identifying trends, or feeding AI/ML pipelines for predictive analysis.
+- **Startups**: use NEAR's indexed data for deep insights into user engagement, smart contract utilization, or token and NFT adoption.
+
+Benefits:
+
+- **NEAR instant insights**: historical on-chain data queried at scale.
+- **Cost-effective**: eliminate the need to store and process bulk NEAR protocol data; query as little or as much data as preferred.
+- **Easy to use**: no prior experience with blockchain technology is required; bring a general knowledge of SQL to unlock insights.
+
+
+## Getting started
+
+1. Log in to your [Google Cloud Account](https://console.cloud.google.com/).
+2. Open the [NEAR Protocol BigQuery Public Dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/crypto-near-mainnet).
+3. Click the [VIEW DATASET](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=crypto_near_mainnet_us&page=dataset) button.
+4. Click the + button to open a new tab, write your query, click the RUN button, and check the `Query results` below the query.
+5. Done :)
+
+:::info
+
+The [NEAR Public Lakehouse repository](https://github.com/near/near-public-lakehouse) contains the source code for ingesting NEAR Protocol data stored as JSON files in AWS S3 by [NEAR Lake Indexer](https://github.com/near/near-lake-indexer).
+
+:::
+
+### Example Queries
+
+- _How many unique users do I have for my smart contract per day?_
+
+```sql
+SELECT
+  r.block_date AS collected_for_day,
+  COUNT(DISTINCT r.transaction_signer_account_id) AS unique_users
+FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions` ra
+  INNER JOIN `bigquery-public-data.crypto_near_mainnet_us.receipts` r ON r.receipt_id = ra.receipt_id
+WHERE ra.action_kind = 'FUNCTION_CALL'
+  AND ra.receipt_receiver_account_id = 'near.social' -- change to your contract
+GROUP BY 1
+ORDER BY 1 DESC;
+```
+
+## How much does it cost?
+
+- NEAR pays for the storage and doesn't charge you to use the public dataset.
+  > To learn more about BigQuery public datasets, [check this page](https://cloud.google.com/bigquery/public-data).
+- Google Cloud charges for the queries that you run on the data. For example, as of September 1, 2023, on-demand query pricing is $6.25 per TB, and the first 1 TB per month is free.
+  > Check [Google's pricing page](https://cloud.google.com/bigquery/pricing#analysis_pricing_models) for detailed pricing info, options, and best practices.
+
+:::tip
+You can check how much data a query will process before running it in the BigQuery console UI. Because BigQuery uses a columnar data structure and partitions, select only the columns and partitions (`block_date`) you need to avoid unnecessary query costs.
+:::
+
+![Query Costs](/docs/BQ_Query_Cost.png "BQ Query Costs")
+
+## Architecture
+
+The data is loaded in a streaming fashion using [Databricks Autoloader](https://docs.gcp.databricks.com/ingestion/auto-loader/index.html) into raw/bronze tables, and transformed with [Databricks Delta Live Tables](https://www.databricks.com/product/delta-live-tables) streaming jobs into cleaned/enriched/silver tables.
+
+The silver tables are also copied into the [GCP BigQuery Public Dataset](https://cloud.google.com/bigquery/public-data).
+
+![Architecture](/docs/Architecture.png "Architecture")
+
+:::info
+
+For background on the bronze and silver table layers, see the [Databricks Medallion Architecture](https://www.databricks.com/glossary/medallion-architecture).
+
+:::
+
+## Available Data
+
+The data NEAR currently provides was inspired by [NEAR Indexer for Explorer](https://github.com/near/near-indexer-for-explorer/).
+
+:::info
+NEAR plans to make the data in the NEAR Public Lakehouse easier to consume by denormalizing some tables.
+:::
+
+The tables available in the NEAR Public Lakehouse are:
+
+- **blocks**: A structure that represents an entire block in the NEAR blockchain. `Block` is the main entity in the NEAR Protocol blockchain. Blocks are produced in NEAR Protocol roughly every second.
+- **chunks**: A structure that represents a chunk in the NEAR blockchain. `Chunk` of a `Block` is a part of a `Block` from a `Shard`. The collection of `Chunks` of the `Block` forms the NEAR Protocol Block. `Chunk` contains all the structures that make the `Block`: `Transactions`, [`Receipts`](https://nomicon.io/RuntimeSpec/Receipts), and `Chunk Header`.
+- **transactions**: [`Transaction`](../../2.develop/lake/structures/transaction.mdx#definition) is the main way of interaction between a user and a blockchain. A Transaction contains: Signer account ID, Receiver account ID, and Actions.
+- **execution_outcomes**: Execution outcome is the result of execution of a `Transaction` or `Receipt`. Executing a Transaction always produces a Receipt.
+- **receipt_details**: All cross-contract communication in NEAR happens through Receipts (we assume that each account lives in its own shard). Receipts are stateful in the sense that they serve not only as messages between accounts but can also be stored in the account storage to await `DataReceipts`. Each receipt has a `predecessor_id` (the account that sent it) and a `receiver_id` (the current account).
+- **receipt_origin**: Tracks the transaction that originated the receipt.
+- **receipt_actions**: Action Receipt represents a request to apply actions on the `receiver_id` side. It can be produced by a `Transaction` execution or by processing another `ACTION` Receipt. The action kind can be one of: `ADD_KEY`, `CREATE_ACCOUNT`, `DELEGATE_ACTION`, `DELETE_ACCOUNT`, `DELETE_KEY`, `DEPLOY_CONTRACT`, `FUNCTION_CALL`, `STAKE`, `TRANSFER`.
+- **receipts (view)**: This view joins the receipt details, the transaction that originated the receipt, and the receipt execution outcome. Select only the columns and partitions (`block_date`) you need to avoid unnecessary query costs.
+- **account_changes**: Each account has an associated state where it stores its metadata and all the contract-related data (contract's code + storage).
+
+:::note References
+
+- [Protocol documentation](../../1.concepts/welcome.md)
+- [Near Data flow](../../1.concepts/data-flow/near-data-flow.md)
+- [Lake Data structures](../../2.develop/lake/structures/toc.mdx)
+- [Protocol specification](https://nomicon.io/)
+
+:::
diff --git a/website/sidebars.json b/website/sidebars.json
index b1fa0c100da..7f79f9eb6de 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -197,6 +197,7 @@
     },
     {
       "Data Analytics": [
+        "bos/queryapi/big-query",
         "tools/indexer-for-explorer"
       ]
     },
diff --git a/website/static/docs/Architecture.png b/website/static/docs/Architecture.png
new file mode 100644
index 00000000000..00c9493209c
Binary files /dev/null and b/website/static/docs/Architecture.png differ
diff --git a/website/static/docs/BQ_Query_Cost.png b/website/static/docs/BQ_Query_Cost.png
new file mode 100644
index 00000000000..42cc5325c70
Binary files /dev/null and b/website/static/docs/BQ_Query_Cost.png differ