Merge branch 'main' into flow-user-guide
nicecui committed May 20, 2024
2 parents bb3fdac + f7b806b commit 909cf97
Showing 189 changed files with 1,015 additions and 319 deletions.
10 changes: 0 additions & 10 deletions .github/pr-title-checker-config.json

This file was deleted.

21 changes: 21 additions & 0 deletions .github/workflows/linter.yml
@@ -0,0 +1,21 @@
name: Linters

on:
push:
branches:
- main
pull_request:

jobs:
linter:
name: Run Linters
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Markdown lint
uses: nosborn/github-action-markdown-cli@v3.2.0
with:
files: "./**/*.md"
config_file: .markdownlint.yaml
- name: Check typos
uses: crate-ci/typos@master
18 changes: 0 additions & 18 deletions .github/workflows/mdlint.yml

This file was deleted.

@@ -17,13 +17,13 @@ jobs:
# See supported Node.js release schedule at https://nodejs.org/en/about/releases/

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install pnpm
uses: pnpm/action-setup@v2
uses: pnpm/action-setup@v4
with:
version: 8.6.0
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: 'pnpm'
@@ -1,4 +1,5 @@
name: "PR Title Checker"
name: Semantic Pull Request

on:
pull_request_target:
types:
@@ -10,11 +11,10 @@ on:

jobs:
check:
name: Check pull request title
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: thehanimo/pr-title-checker@v1.3.4
with:
- uses: amannn/action-semantic-pull-request@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
pass_on_octokit_error: false
configuration_path: ".github/pr-title-checker-config.json"
3 changes: 3 additions & 0 deletions .gitignore
@@ -15,6 +15,9 @@ pids
*.seed
*.pid.lock

# tmp files
*.svg.bkp

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

1 change: 0 additions & 1 deletion docs/auto-imports.d.ts
@@ -1,7 +1,6 @@
/* eslint-disable */
/* prettier-ignore */
// @ts-nocheck
// noinspection JSUnusedGlobalSymbols
// Generated by unplugin-auto-import
export {}
declare global {
15 changes: 15 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/arrangement.md
@@ -0,0 +1,15 @@
# Arrangement

Arrangement stores the state in the dataflow's process. It stores the streams of update flows for further querying and updating.

The arrangement essentially stores key-value pairs with timestamps to mark their change time.

Internally, the arrangement receives tuples like
`((Key Row, Value Row), timestamp, diff)` and stores them in memory. One can query key-value pairs at a certain time using the `get(now: Timestamp, key: Row)` method.
The arrangement also assumes that everything older than a certain time (also known as the low watermark) has already been ingested to the sink tables and does not keep a history for them.

:::tip NOTE

The arrangement allows for the removal of keys by setting the `diff` to -1 in incoming tuples. Moreover, if a row has been previously added to the arrangement and the same key is inserted with a different value, the original value is overwritten with the new value.

:::
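
To make the `((Key Row, Value Row), timestamp, diff)` model above concrete, here is a minimal Rust sketch of an arrangement-like store. It is an illustration only: the type aliases, struct layout, and method bodies are assumptions made for this example and do not mirror the actual `flow` implementation.

```rust
use std::collections::BTreeMap;

type Timestamp = u64;
type Row = Vec<String>; // stand-in for the real Row type

/// Illustrative arrangement: per key, a timestamp-ordered list of (timestamp, value, diff).
#[derive(Default)]
struct Arrangement {
    updates: BTreeMap<Row, Vec<(Timestamp, Row, i64)>>,
}

impl Arrangement {
    /// Ingest a ((key, value), timestamp, diff) tuple.
    fn apply(&mut self, key: Row, value: Row, ts: Timestamp, diff: i64) {
        let history = self.updates.entry(key).or_default();
        history.push((ts, value, diff));
        history.sort_by_key(|(ts, _, _)| *ts);
    }

    /// Query the value visible for `key` at time `now`:
    /// a positive diff (re)sets the value, a negative diff removes the key.
    fn get(&self, now: Timestamp, key: &Row) -> Option<Row> {
        let history = self.updates.get(key)?;
        let mut current = None;
        for (ts, value, diff) in history {
            if *ts > now {
                break;
            }
            current = if *diff > 0 { Some(value.clone()) } else { None };
        }
        current
    }

    /// Drop history at or below the low watermark; it is assumed to be in the sink already.
    fn compact(&mut self, low_watermark: Timestamp) {
        for history in self.updates.values_mut() {
            history.retain(|(ts, _, _)| *ts > low_watermark);
        }
    }
}

fn main() {
    let mut arr = Arrangement::default();
    let key: Row = vec!["host1".to_string()];
    arr.apply(key.clone(), vec!["10".to_string()], 1, 1);
    arr.apply(key.clone(), vec!["42".to_string()], 5, 1); // same key, new value: overwrites
    assert_eq!(arr.get(3, &key), Some(vec!["10".to_string()]));
    assert_eq!(arr.get(6, &key), Some(vec!["42".to_string()]));
    arr.apply(key.clone(), vec!["42".to_string()], 7, -1); // diff = -1 removes the key
    assert_eq!(arr.get(8, &key), None);
    arr.compact(5); // everything at or before ts = 5 is assumed ingested by the sink
}
```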
12 changes: 12 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/dataflow.md
@@ -0,0 +1,12 @@
# Dataflow

The `dataflow` module (see `flow::compute` module) is the core computing module of `flow`.
It takes a SQL query and transforms it into flow's internal execution plan.
This execution plan is then rendered into an actual dataflow, which is essentially a directed acyclic graph (DAG) of functions with input and output ports.
The dataflow is triggered to run when needed.

Currently, this dataflow only supports `map` and `reduce` operations. Support for `join` operations will be added in the future.

Internally, the dataflow handles data in row format, using a tuple `(row, time, diff)`. Here, `row` represents the actual data being passed, which may contain multiple `Value` objects.
`time` is the system time which tracks the progress of the dataflow, and `diff` typically represents the insertion or deletion of the row (+1 or -1).
Therefore, the tuple represents the insert/delete operation of the `row` at a given system `time`.
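
The `(row, time, diff)` tuple format can be illustrated with a small, self-contained Rust sketch of a toy `map` stage feeding a toy `reduce` stage. The function names and the integer row layout are simplifying assumptions for this example, not the actual `flow::compute` API.

```rust
use std::collections::HashMap;

type Row = Vec<i64>;
/// One change in the stream: the row, the system time it happened at, and +1 (insert) / -1 (delete).
type Change = (Row, u64, i64);

/// Toy `map` stage: transform each row, keeping time and diff untouched.
fn map_stage(input: &[Change], f: impl Fn(&Row) -> Row) -> Vec<Change> {
    input.iter().map(|(row, t, d)| (f(row), *t, *d)).collect()
}

/// Toy `reduce` stage: maintain a per-key sum of the second column by applying diffs,
/// so a deletion (diff = -1) subtracts the value it previously added.
fn reduce_stage(input: &[Change], state: &mut HashMap<i64, i64>) {
    for (row, _time, diff) in input {
        *state.entry(row[0]).or_insert(0) += row[1] * *diff;
    }
}

fn main() {
    let changes: Vec<Change> = vec![
        (vec![1, 10], 100, 1),  // insert (key 1, value 10)
        (vec![1, 5], 101, 1),   // insert (key 1, value 5)
        (vec![1, 10], 102, -1), // delete the first row again
        (vec![2, 7], 102, 1),   // insert (key 2, value 7)
    ];
    // map: double the measured value before aggregating.
    let mapped = map_stage(&changes, |row| vec![row[0], row[1] * 2]);
    let mut sums = HashMap::new();
    reduce_stage(&mapped, &mut sums);
    assert_eq!(sums[&1], 10); // (10 + 5 - 10) * 2
    assert_eq!(sums[&2], 14);
}
```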
17 changes: 17 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/overview.md
@@ -0,0 +1,17 @@
# Overview

## Introduction


`Flownode` provides a simple streaming processing capability (known as `flow`) to the database.
`Flownode` manages `flows`, which are tasks that receive data from the `source` and send data to the `sink`.

In the current version, `Flownode` only supports standalone mode. Distributed mode will be supported in the future.

## Components

A `Flownode` contains all the components needed for the streaming process of a flow. Here we list the vital parts:

- A `FlownodeManager` for receiving inserts forwarded from the `Frontend` and sending back results for the flow's sink table.
- A certain number of `FlowWorker` instances, each running in a separate thread. Currently for standalone mode, there is only one flow worker, but this may change in the future.
- A `Flow` is a task that actively receives data from the `source` and sends data to the `sink`. It is managed by the `FlownodeManager` and run by a `FlowWorker`.
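
As a rough illustration of how these components could fit together, below is a hypothetical Rust sketch of a manager routing forwarded inserts to a single worker thread. All names and signatures here are invented for illustration and are not the real `Flownode` types.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical flow descriptor; the real `Flow` holds source/sink handles and a plan.
struct Flow {
    id: u64,
}

/// Hypothetical worker handle: inserts are sent to its thread over a channel.
struct FlowWorker {
    tx: Sender<(u64, Vec<u8>)>, // (flow id, encoded rows)
}

struct FlownodeManager {
    workers: Vec<FlowWorker>,
    flows: Vec<Flow>,
}

impl FlownodeManager {
    /// The Frontend forwards an insert; the manager routes it to the worker running that flow.
    fn handle_insert(&self, flow_id: u64, rows: Vec<u8>) {
        // Standalone mode has a single worker, so routing is trivial.
        let worker = &self.workers[flow_id as usize % self.workers.len()];
        worker.tx.send((flow_id, rows)).expect("worker thread alive");
    }
}

fn spawn_worker() -> FlowWorker {
    let (tx, rx) = channel::<(u64, Vec<u8>)>();
    thread::spawn(move || {
        for (flow_id, rows) in rx {
            // Here the worker would run the dataflow and write results to the flow's sink table.
            println!("flow {flow_id}: processing {} bytes", rows.len());
        }
    });
    FlowWorker { tx }
}

fn main() {
    let manager = FlownodeManager {
        workers: vec![spawn_worker()],
        flows: vec![Flow { id: 1 }],
    };
    println!("managing flow {}", manager.flows[0].id);
    manager.handle_insert(1, b"encoded rows".to_vec());
    thread::sleep(Duration::from_millis(50)); // give the worker thread time to drain (demo only)
}
```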
@@ -15,7 +15,7 @@ Planner will traverse the input logical plan, and split it into multiple stages
rule](https://github.com/GreptimeTeam/greptimedb/blob/main/docs/rfcs/2023-05-09-distributed-planner.md)".

This rule is under heavy development. At present it will consider things like:
- whether the operator ifself is commutative
- whether the operator itself is commutative
- how the partition rule is configured
- etc...

2 changes: 1 addition & 1 deletion docs/nightly/en/db-cloud-shared/quick-start/influxdb.md
@@ -1,2 +1,2 @@

To quickly get started with InfluxDB line protocol, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB. The source code is avaliable on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-influxdb-line-protocol).
To quickly get started with InfluxDB line protocol, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB. The source code is available on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-influxdb-line-protocol).
2 changes: 1 addition & 1 deletion docs/nightly/en/db-cloud-shared/quick-start/mysql.md
@@ -1,3 +1,3 @@

To quickly get started with MySQL, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB via MySQL CLI. The source code is avaliable on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-mysql).
To quickly get started with MySQL, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB via MySQL CLI. The source code is available on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-mysql).

@@ -11,7 +11,7 @@ go get go.opentelemetry.io/otel@v1.16.0 \
```

Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB in `app.go`.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```go
auth := base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", *username, *password)))
@@ -21,7 +21,7 @@ dependencies {
}
```

Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB. For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB. For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```java
String endpoint = String.format("https://%s/v1/otlp/v1/metrics", dbHost);
@@ -25,7 +25,7 @@ npm install @opentelemetry/api@1.4.1 \
```

Once the required packages are installed,create a new file named `app.ts` and write the code to create a metric export object that sends metrics to GreptimeDB.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```ts
const exporter = new OTLPMetricExporter({
@@ -53,7 +53,7 @@ remote_write:
password: <password>
```
The configuration file above configures Prometheus to scrape metrics from the node exporter and send them to GreptimeDB. For the configration about `<host>`, `<dbname>`, `<username>`, and `<password>`, please refer to the Prometheus documentation in [GreptimeDB](/user-guide/clients/prometheus.md) or [GreptimeCloud](/greptimecloud/integrations/prometheus/quick-setup.md).
The configuration file above configures Prometheus to scrape metrics from the node exporter and send them to GreptimeDB. For the configuration about `<host>`, `<dbname>`, `<username>`, and `<password>`, please refer to the Prometheus documentation in [GreptimeDB](/user-guide/clients/prometheus.md) or [GreptimeCloud](/greptimecloud/integrations/prometheus/quick-setup.md).

Finally, start the containers:

@@ -25,7 +25,7 @@ pip install -r requirements.txt
```

Once the required packages are installed,create a new file named `main.py` and write the code to create a metric export object that sends metrics to GreptimeDB.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```python
from opentelemetry import metrics
@@ -2,7 +2,7 @@

Visualization plays a crucial role in effectively utilizing time series data. To help users leverage the various features of GreptimeDB, Greptime offers a simple [dashboard](https://github.com/GreptimeTeam/dashboard).

The Dashboard is embedded into GreptimeDB's binary since GreptimeDB v0.2.0. After starting [GreptimeDB Standalone](greptimedb-standalone.md) or [GreptimeDB Cluster](greptimedb-cluster.md), the dashboard can be accessed via the HTTP endpoint `http://localhost:4000/dashboard`. The dashboard supports mutiple query languages, including [SQL queries](/user-guide/query-data/sql.md), [Python Scripts](/user-guide/python-scripts/overview.md), and [PromQL queries](/user-guide/query-data/promql.md).
The Dashboard is embedded into GreptimeDB's binary since GreptimeDB v0.2.0. After starting [GreptimeDB Standalone](greptimedb-standalone.md) or [GreptimeDB Cluster](greptimedb-cluster.md), the dashboard can be accessed via the HTTP endpoint `http://localhost:4000/dashboard`. The dashboard supports multiple query languages, including [SQL queries](/user-guide/query-data/sql.md), [Python Scripts](/user-guide/python-scripts/overview.md), and [PromQL queries](/user-guide/query-data/promql.md).

We offer various chart types to choose from based on different scenarios. The charts become more informative when you have sufficient data.

6 changes: 3 additions & 3 deletions docs/nightly/en/reference/command-lines.md
@@ -39,7 +39,7 @@ Starts GreptimeDB in standalone mode with customized configurations:
greptime --log-dir=/tmp/greptimedb/logs --log-level=info standalone start -c config/standalone.example.toml
```

The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuraiton files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/operations/configuration.md).
The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuration files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/operations/configuration.md).

To start GreptimeDB in distributed mode, you need to start each component separately. The following commands show how to start each component with customized configurations or command line arguments.

@@ -58,7 +58,7 @@ greptime datanode start -c config/datanode.example.toml
Starts a datanode instance with command line arguments specifying the gRPC service address, the MySQL service address, the address of the metasrv, and the node id of the instance:

```sh
greptime datanode start --rpc-addr=0.0.0.0:4001 --mysql-addr=0.0.0.0:4002 --metasrv-addr=0.0.0.0:3002 --node-id=1
greptime datanode start --rpc-addr=0.0.0.0:4001 --mysql-addr=0.0.0.0:4002 --metasrv-addrs=0.0.0.0:3002 --node-id=1
```

Starts a frontend instance with customized configurations:
@@ -70,5 +70,5 @@ greptime frontend start -c config/frontend.example.toml
Starts a frontend instance with command line arguments specifying the address of the metasrv:

```sh
greptime frontend start --metasrv-addr=0.0.0.0:3002
greptime frontend start --metasrv-addrs=0.0.0.0:3002
```
8 changes: 6 additions & 2 deletions docs/nightly/en/reference/sql/copy.md
@@ -44,6 +44,7 @@ FROM { '<path>/[<filename>]' }
[ PATTERN = '<regex_pattern>' ]
)
]
[LIMIT NUM]
```

The command starts with the keyword `COPY`, followed by the name of the table you want to import data into.
@@ -70,7 +71,6 @@ COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
|---|---|---|
| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
| `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
| `MAX_INSERT_ROWS` | Maximum number of rows for insertion. e.g., `1000` | Optional |

#### `CONNECTION` Option

@@ -119,6 +119,10 @@ You can set the following **CONNECTION** options:
| `ENABLE_VIRTUAL_HOST_STYLE` | If you use virtual hosting to address the bucket, set it to "true".| Optional |
| `SESSION_TOKEN` | Your temporary credential for connecting the AWS S3 service. | Optional |

#### LIMIT

You can use `LIMIT` to restrict the maximum number of rows inserted at once.
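
For instance, combining the earlier Parquet example with the new clause (the table name and file path are the illustrative ones used above), a sketch might look like:

```sql
-- Import at most 1000 rows from the file into `tbl`.
COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet') LIMIT 1000;
```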

## COPY DATABASE

Beside copying specific table to/from some path, `COPY` statement can also be used to copy whole database to/from some path. The syntax for copying databases is:
@@ -145,7 +149,7 @@ COPY DATABASE <db_name>
| `FORMAT` | Export file format, available options: JSON, CSV, Parquet | **Required** |
| `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |

> - When copying databses, `<PATH>` must end with `/`.
> - When copying databases, `<PATH>` must end with `/`.
> - `CONNECTION` parameters can also be used to copying databases to/from object storage services like AWS S3.
### Examples
9 changes: 4 additions & 5 deletions docs/nightly/en/reference/sql/create.md
@@ -50,7 +50,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
) ENGINE = engine WITH([TTL | storage | ...] = expr, ...)
[
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
@@ -84,7 +84,6 @@ Users can add table options by using `WITH`. The valid options contain the follo
| Option | Description | Value |
| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc. |
| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |
| `compaction.type` | Compaction strategy of the table | String value. Only `twcs` is allowed. |
| `compaction.twcs.max_active_window_files` | Max num of files that can be kept in active writing time window | String value, such as '8'. Only available when `compaction.type` is `twcs`. You can refer to this [document](https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html) to learn more about the `twcs` compaction strategy. |
@@ -93,13 +92,13 @@ Users can add table options by using `WITH`. The valid options contain the follo
| `memtable.type` | Type of the memtable. | String value, supports `time_series`, `partition_tree`. |
| `append_mode` | Whether the table is append-only | String value. Default is 'false', which removes duplicate rows by primary keys and timestamps. Setting it to 'true' to enable append mode and create an append-only table which keeps duplicate rows. |

For example, to create a table with the storage data TTL(Time-To-Live) is seven days and region number is 10:
For example, to create a table whose data has a storage TTL (Time-To-Live) of seven days:

```sql
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```

Create a table that stores the data in Google Cloud Storage:
@@ -108,7 +107,7 @@
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10, storage="Gcs");
) engine=mito with(ttl='7d', storage="Gcs");
```

Create a table with custom compaction options. The table will attempt to partition data into 1-day time window based on the timestamps of the data.
