Merge branch 'master' into master
gagdiez authored Sep 5, 2023
2 parents d94086d + cfed94d commit 1ce6541
Showing 8 changed files with 113 additions and 8 deletions.
6 changes: 3 additions & 3 deletions docs/2.develop/lake/structures/chunk.mdx
@@ -10,11 +10,11 @@ import TabItem from '@theme/TabItem';

## Definition

`Chunk` of a [`Block`](./block.mdx) is a part of a [`Block`](./block.mdx) from a [Shard](./shard.mdx). The collection of Chunks of the Block forms the NEAR Protocol [`Block`](./block.mdx)
`Chunk` of a [`Block`](block.mdx) is a part of a [`Block`](block.mdx) from a [Shard](shard.mdx). The collection of Chunks of the Block forms the NEAR Protocol [`Block`](block.mdx)

Chunk contains all the structures that make the Block:
- [Transactions](./transaction.mdx)
- [Receipts](./receipt.mdx)
- [Transactions](transaction.mdx)
- [Receipts](receipt.mdx)
- [ChunkHeader](#chunkheaderview)

## `IndexerChunkView`
2 changes: 1 addition & 1 deletion docs/2.develop/lake/structures/execution_outcome.mdx
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';

## Definition

ExecutionOutcome is the result of execution of [Transaction](./transaction.mdx) or [Receipt](./receipt.mdx)
ExecutionOutcome is the result of execution of [Transaction](transaction.mdx) or [Receipt](receipt.mdx)

:::info Transaction's ExecutionOutcome

6 changes: 3 additions & 3 deletions docs/2.develop/lake/structures/shard.mdx
@@ -13,9 +13,9 @@ import TabItem from '@theme/TabItem';
The `IndexerShard` struct is an ephemeral structure; there is no such entity in `nearcore`. We've introduced it as a container in [`near-indexer-primitives`](https://crates.io/crates/near-indexer-primitives). This container includes:

- shard ID
- [Chunk](./chunk.mdx) that might be absent
- [ExecutionOutcomes](./execution_outcome.mdx) for [Receipts](./receipt.mdx) (these belong to a Shard not to a [Chunk](./chunk.mdx) or a [Block](./block.mdx))
- [StateChanges](./state_change.mdx) for a Shard
- [Chunk](chunk.mdx) that might be absent
- [ExecutionOutcomes](execution_outcome.mdx) for [Receipts](receipt.mdx) (these belong to a Shard not to a [Chunk](chunk.mdx) or a [Block](block.mdx))
- [StateChanges](state_change.mdx) for a Shard

## `IndexerShard`

2 changes: 1 addition & 1 deletion docs/2.develop/lake/structures/transaction.mdx
@@ -55,7 +55,7 @@ export type Transaction = {

## `ActionView`

`ActionView` is an Enum with possible actions along with parameters. This structure is used in Transactions and in [Receipts](./receipt.mdx)
`ActionView` is an Enum with possible actions along with parameters. This structure is used in Transactions and in [Receipts](receipt.mdx)

<Tabs>
<TabItem value="rust" label="Rust">
104 changes: 104 additions & 0 deletions docs/bos/queryapi/big-query.md
@@ -0,0 +1,104 @@
---
id: big-query
title: BigQuery Public Dataset
sidebar_label: BigQuery
---

Blockchain data indexing in the NEAR Public Lakehouse is for anyone who wants to make sense of blockchain data. This includes:

- **Users**: create queries to track NEAR assets, monitor transactions, or analyze on-chain events at a massive scale.
- **Researchers**: use indexed data for data science tasks such as analyzing on-chain activity, identifying trends, or feeding AI/ML pipelines for predictive analysis.
- **Startups**: use NEAR's indexed data to gain deep insights into user engagement, smart contract utilization, or token and NFT adoption.

Benefits:

- **NEAR instant insights**: query historical on-chain data at scale.
- **Cost-effective**: eliminate the need to store and process bulk NEAR protocol data; query as little or as much data as preferred.
- **Easy to use**: no prior experience with blockchain technology is required; bring a general knowledge of SQL to unlock insights.


## Getting started

1. Log in to your [Google Cloud Account](https://console.cloud.google.com/).
2. Open the [NEAR Protocol BigQuery Public Dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/crypto-near-mainnet).
3. Click the <kbd>[VIEW DATASET](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=crypto_near_mainnet_us&page=dataset)</kbd> button.
4. Click the <kbd>+</kbd> button to open a new tab, write your query, click the <kbd>RUN</kbd> button, and check the `Query results` pane below the query.
5. Done :)

:::info

The [NEAR Public Lakehouse repository](https://github.com/near/near-public-lakehouse) contains the source code for ingesting NEAR Protocol data stored as JSON files in AWS S3 by [NEAR Lake Indexer](https://github.com/near/near-lake-indexer).

:::

### Example Queries

- _How many unique users do I have for my smart contract per day?_

```sql
SELECT
  r.block_date AS collected_for_day,
  COUNT(DISTINCT r.transaction_signer_account_id) AS unique_users
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions` ra
INNER JOIN `bigquery-public-data.crypto_near_mainnet_us.receipts` r ON r.receipt_id = ra.receipt_id
WHERE ra.action_kind = 'FUNCTION_CALL'
  AND ra.receipt_receiver_account_id = 'near.social' -- change to your contract
GROUP BY 1
ORDER BY 1 DESC;
```
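
The example query above scans every available day. As a minimal sketch, the same query can be restricted to a date range on `block_date` and have the contract passed in as a named query parameter (`@contract_id`, `@start_date`, `@end_date`). The helper name `daily_unique_users_params` is hypothetical, and whether the `block_date` filter on the `receipts` view actually reduces the bytes scanned is an assumption worth checking with the console's size estimate.

```python
from datetime import date

# Dataset name taken from the example query above.
DATASET = "bigquery-public-data.crypto_near_mainnet_us"

# Same query as the example, plus a date-range filter on `block_date`.
# The @names are BigQuery named query parameters, filled in at run time.
DAILY_UNIQUE_USERS_SQL = f"""
SELECT
  r.block_date AS collected_for_day,
  COUNT(DISTINCT r.transaction_signer_account_id) AS unique_users
FROM `{DATASET}.receipt_actions` ra
INNER JOIN `{DATASET}.receipts` r ON r.receipt_id = ra.receipt_id
WHERE ra.action_kind = 'FUNCTION_CALL'
  AND ra.receipt_receiver_account_id = @contract_id
  AND r.block_date BETWEEN @start_date AND @end_date
GROUP BY 1
ORDER BY 1 DESC
""".strip()


def daily_unique_users_params(contract_id: str, start: date, end: date) -> dict:
    """Return values for the named parameters in DAILY_UNIQUE_USERS_SQL.

    Hypothetical helper: it only validates the inputs and formats the dates
    as ISO strings (YYYY-MM-DD); it does not talk to BigQuery.
    """
    if not contract_id:
        raise ValueError("contract_id must not be empty")
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "contract_id": contract_id,
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }


if __name__ == "__main__":
    print(DAILY_UNIQUE_USERS_SQL)
    print(daily_unique_users_params("near.social", date(2023, 8, 1), date(2023, 8, 31)))
```

The SQL text and the parameter values can then be handed to whichever BigQuery client you use; keeping the contract name out of the query string avoids editing the SQL for each contract.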

## How much does it cost?

- NEAR pays for the storage and doesn't charge you to use the public dataset.
> To learn more about BigQuery public datasets, [check this page](https://cloud.google.com/bigquery/public-data).
- Google Cloud charges for the queries that you run on the data. For example, as of September 1, 2023, on-demand query pricing is $6.25 per TB, and the first 1 TB per month is free.
> Check [Google's pricing page](https://cloud.google.com/bigquery/pricing#analysis_pricing_models) for detailed pricing info, options, and best practices.

:::tip
You can check how much data a query will process before running it in the BigQuery console UI. Because BigQuery uses a columnar data structure and partitions, it's recommended to select only the columns and partitions (`block_date`) you need, to avoid unnecessary query costs.
:::
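
To make the pricing arithmetic concrete, here is a rough sketch that turns the byte estimate shown in the console into an approximate on-demand cost. It assumes the figures quoted above ($6.25 per TB, first 1 TB per month free) and counts 1 TB as 2^40 bytes; the function name is hypothetical, and the sketch ignores minimum-billing rules and any other pricing details, so treat Google's pricing page as the authority.

```python
# Assumed values, taken from the pricing quoted above (as of Sep 1st, 2023).
PRICE_PER_TB_USD = 6.25
FREE_TB_PER_MONTH = 1.0
# Assumption: 1 TB is counted as 2**40 bytes.
BYTES_PER_TB = 2**40


def estimate_query_cost_usd(bytes_processed: int, tb_used_this_month: float = 0.0) -> float:
    """Estimate the on-demand cost of one query (hypothetical helper).

    bytes_processed: the size estimate shown before running the query.
    tb_used_this_month: TB already queried this month, which eats into the free tier.
    """
    if bytes_processed < 0 or tb_used_this_month < 0:
        raise ValueError("inputs must not be negative")
    query_tb = bytes_processed / BYTES_PER_TB
    # Whatever is left of the monthly free tier is applied to this query first.
    free_tb_left = max(FREE_TB_PER_MONTH - tb_used_this_month, 0.0)
    billable_tb = max(query_tb - free_tb_left, 0.0)
    return billable_tb * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A 2 TB scan with an untouched free tier: 1 TB free, 1 TB billed.
    print(estimate_query_cost_usd(2 * BYTES_PER_TB))
```

Under these assumptions, a query that fits in the remaining free tier costs nothing, which is one more reason to narrow the selected columns and `block_date` range.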

![Query Costs](/docs/BQ_Query_Cost.png "BQ Query Costs")

## Architecture

The data is loaded in a streaming fashion using [Databricks Autoloader](https://docs.gcp.databricks.com/ingestion/auto-loader/index.html) into raw/bronze tables, and transformed with [Databricks Delta Live Tables](https://www.databricks.com/product/delta-live-tables) streaming jobs into cleaned/enriched/silver tables.

The silver tables are also copied into the [GCP BigQuery Public Dataset](https://cloud.google.com/bigquery/public-data).

![Architecture](/docs/Architecture.png "Architecture")

:::info

[Databricks Medallion Architecture](https://www.databricks.com/glossary/medallion-architecture).

:::

## Available Data

The current data that NEAR is providing was inspired by [NEAR Indexer for Explorer](https://github.com/near/near-indexer-for-explorer/).

:::info
NEAR plans to improve the data available in the NEAR Public Lakehouse, making it easier to consume by denormalizing some tables.
:::

The tables available in the NEAR Public Lakehouse are:

- **blocks**: A structure that represents an entire block in the NEAR blockchain. `Block` is the main entity in the NEAR Protocol blockchain. Blocks are produced in NEAR Protocol approximately every second.
- **chunks**: A structure that represents a chunk in the NEAR blockchain. A `Chunk` of a `Block` is a part of a `Block` from a `Shard`. The collection of `Chunks` of the `Block` forms the NEAR Protocol Block. A `Chunk` contains all the structures that make up the `Block`: `Transactions`, [`Receipts`](https://nomicon.io/RuntimeSpec/Receipts), and `Chunk Header`.
- **transactions**: A [`Transaction`](../../2.develop/lake/structures/transaction.mdx#definition) is the main way of interaction between a user and a blockchain. A Transaction contains: Signer account ID, Receiver account ID, and Actions.
- **execution_outcomes**: An execution outcome is the result of the execution of a `Transaction` or `Receipt`. The result of a Transaction execution is always a Receipt.
- **receipt_details**: All cross-contract communication in NEAR happens through Receipts (we assume that each account lives in its own shard). Receipts are stateful in the sense that they serve not only as messages between accounts but can also be stored in the account storage to await `DataReceipts`. Each receipt has a `predecessor_id` (who sent it) and a `receiver_id` (the current account).
- **receipt_origin**: Tracks the transaction that originated the receipt.
- **receipt_actions**: An Action Receipt represents a request to apply actions on the `receiver_id` side. It can be derived as a result of a `Transaction` execution or of another `ACTION` Receipt's processing. The action kind can be: `ADD_KEY`, `CREATE_ACCOUNT`, `DELEGATE_ACTION`, `DELETE_ACCOUNT`, `DELETE_KEY`, `DEPLOY_CONTRACT`, `FUNCTION_CALL`, `STAKE`, `TRANSFER`.
- **receipts (view)**: This view joins the receipt details, the transaction that originated the receipt, and the receipt execution outcome. It's recommended to select only the columns and partitions (`block_date`) needed to avoid unnecessary query costs.
- **account_changes**: Each account has an associated state where it stores its metadata and all the contract-related data (contract's code + storage).
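
As a small sketch of how the list above maps onto query text, the helper below builds a fully qualified, backtick-quoted table reference for the `crypto_near_mainnet_us` dataset used in the example query. The table names are copied from the list above and should be checked against the dataset in the console; `table_ref` is a hypothetical helper, not part of any library.

```python
# Dataset name taken from the example query in this page.
DATASET = "bigquery-public-data.crypto_near_mainnet_us"

# Table and view names as listed above (assumed to match the dataset exactly).
TABLES = (
    "blocks",
    "chunks",
    "transactions",
    "execution_outcomes",
    "receipt_details",
    "receipt_origin",
    "receipt_actions",
    "receipts",  # a view, not a table
    "account_changes",
)


def table_ref(name: str) -> str:
    """Return a backtick-quoted reference such as `project.dataset.table`."""
    if name not in TABLES:
        raise ValueError(f"unknown table: {name!r}")
    return f"`{DATASET}.{name}`"


if __name__ == "__main__":
    print(f"SELECT COUNT(*) FROM {table_ref('blocks')}")
```

Rejecting unknown names up front catches typos before a query is sent, which matters when every run is billed by the data it scans.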

:::note References

- [Protocol documentation](../../1.concepts/welcome.md)
- [Near Data flow](../../1.concepts/data-flow/near-data-flow.md)
- [Lake Data structures](../../2.develop/lake/structures/toc.mdx)
- [Protocol specification](https://nomicon.io/)

:::
1 change: 1 addition & 0 deletions website/sidebars.json
@@ -197,6 +197,7 @@
},
{
"Data Analytics": [
"bos/queryapi/big-query",
"tools/indexer-for-explorer"
]
},
Binary file added website/static/docs/Architecture.png
Binary file added website/static/docs/BQ_Query_Cost.png