Merge branch 'main' into flow-user-guide
nicecui committed May 20, 2024
2 parents bb3fdac + f7b806b commit 909cf97
Showing 189 changed files with 1,015 additions and 319 deletions.
10 changes: 0 additions & 10 deletions .github/pr-title-checker-config.json

This file was deleted.

21 changes: 21 additions & 0 deletions .github/workflows/linter.yml
@@ -0,0 +1,21 @@
name: Linters

on:
push:
branches:
- main
pull_request:

jobs:
linter:
name: Run Linters
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Markdown lint
uses: nosborn/github-action-markdown-cli@v3.2.0
with:
files: "./**/*.md"
config_file: .markdownlint.yaml
- name: Check typos
uses: crate-ci/typos@master
18 changes: 0 additions & 18 deletions .github/workflows/mdlint.yml

This file was deleted.

@@ -17,13 +17,13 @@ jobs:
# See supported Node.js release schedule at https://nodejs.org/en/about/releases/

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install pnpm
uses: pnpm/action-setup@v2
uses: pnpm/action-setup@v4
with:
version: 8.6.0
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: 'pnpm'
@@ -1,4 +1,5 @@
name: "PR Title Checker"
name: Semantic Pull Request

on:
pull_request_target:
types:
@@ -10,11 +11,10 @@ on:

jobs:
check:
name: Check pull request title
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: thehanimo/pr-title-checker@v1.3.4
with:
- uses: amannn/action-semantic-pull-request@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
pass_on_octokit_error: false
configuration_path: ".github/pr-title-checker-config.json"
3 changes: 3 additions & 0 deletions .gitignore
@@ -15,6 +15,9 @@ pids
*.seed
*.pid.lock

# tmp files
*.svg.bkp

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

1 change: 0 additions & 1 deletion docs/auto-imports.d.ts
@@ -1,7 +1,6 @@
/* eslint-disable */
/* prettier-ignore */
// @ts-nocheck
// noinspection JSUnusedGlobalSymbols
// Generated by unplugin-auto-import
export {}
declare global {
15 changes: 15 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/arrangement.md
@@ -0,0 +1,15 @@
# Arrangement

Arrangement stores the state in the dataflow's process. It stores the streams of update flows for further querying and updating.

The arrangement essentially stores key-value pairs with timestamps to mark their change time.

Internally, the arrangement receives tuples like
`((Key Row, Value Row), timestamp, diff)` and stores them in memory. One can query key-value pairs at a certain time using the `get(now: Timestamp, key: Row)` method.
The arrangement also assumes that everything older than a certain time (also known as the low watermark) has already been ingested to the sink tables and does not keep a history for them.

:::tip NOTE

The arrangement allows for the removal of keys by setting the `diff` to -1 in incoming tuples. Moreover, if a row has been previously added to the arrangement and the same key is inserted with a different value, the original value is overwritten with the new value.

:::
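
To make the `((Key Row, Value Row), timestamp, diff)` model above concrete, here is a minimal Rust sketch of an arrangement-like store. It is an illustration only: the type aliases, struct layout, and method bodies are assumptions made for this example and do not mirror the actual `flow` implementation.

```rust
use std::collections::BTreeMap;

type Timestamp = u64;
type Row = Vec<String>; // stand-in for the real Row type

/// Illustrative arrangement: per key, a timestamp-ordered list of (timestamp, value, diff).
#[derive(Default)]
struct Arrangement {
    updates: BTreeMap<Row, Vec<(Timestamp, Row, i64)>>,
}

impl Arrangement {
    /// Ingest a ((key, value), timestamp, diff) tuple.
    fn apply(&mut self, key: Row, value: Row, ts: Timestamp, diff: i64) {
        let history = self.updates.entry(key).or_default();
        history.push((ts, value, diff));
        history.sort_by_key(|(ts, _, _)| *ts);
    }

    /// Query the value visible for `key` at time `now`:
    /// a positive diff (re)sets the value, a negative diff removes the key.
    fn get(&self, now: Timestamp, key: &Row) -> Option<Row> {
        let history = self.updates.get(key)?;
        let mut current = None;
        for (ts, value, diff) in history {
            if *ts > now {
                break;
            }
            current = if *diff > 0 { Some(value.clone()) } else { None };
        }
        current
    }

    /// Drop history at or below the low watermark; it is assumed to be in the sink already.
    fn compact(&mut self, low_watermark: Timestamp) {
        for history in self.updates.values_mut() {
            history.retain(|(ts, _, _)| *ts > low_watermark);
        }
    }
}

fn main() {
    let mut arr = Arrangement::default();
    let key: Row = vec!["host1".to_string()];
    arr.apply(key.clone(), vec!["10".to_string()], 1, 1);
    arr.apply(key.clone(), vec!["42".to_string()], 5, 1); // same key, new value: overwrites
    assert_eq!(arr.get(3, &key), Some(vec!["10".to_string()]));
    assert_eq!(arr.get(6, &key), Some(vec!["42".to_string()]));
    arr.apply(key.clone(), vec!["42".to_string()], 7, -1); // diff = -1 removes the key
    assert_eq!(arr.get(8, &key), None);
    arr.compact(5); // everything at or before ts = 5 is assumed ingested by the sink
}
```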
12 changes: 12 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/dataflow.md
@@ -0,0 +1,12 @@
# Dataflow

The `dataflow` module (see `flow::compute` module) is the core computing module of `flow`.
It takes a SQL query and transforms it into flow's internal execution plan.
This execution plan is then rendered into an actual dataflow, which is essentially a directed acyclic graph (DAG) of functions with input and output ports.
The dataflow is triggered to run when needed.

Currently, this dataflow only supports `map` and `reduce` operations. Support for `join` operations will be added in the future.

Internally, the dataflow handles data in row format, using a tuple `(row, time, diff)`. Here, `row` represents the actual data being passed, which may contain multiple `Value` objects.
`time` is the system time which tracks the progress of the dataflow, and `diff` typically represents the insertion or deletion of the row (+1 or -1).
Therefore, the tuple represents the insert/delete operation of the `row` at a given system `time`.
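
The `(row, time, diff)` tuple format can be illustrated with a small, self-contained Rust sketch of a toy `map` stage feeding a toy `reduce` stage. The function names and the integer row layout are simplifying assumptions for this example, not the actual `flow::compute` API.

```rust
use std::collections::HashMap;

type Row = Vec<i64>;
/// One change in the stream: the row, the system time it happened at, and +1 (insert) / -1 (delete).
type Change = (Row, u64, i64);

/// Toy `map` stage: transform each row, keeping time and diff untouched.
fn map_stage(input: &[Change], f: impl Fn(&Row) -> Row) -> Vec<Change> {
    input.iter().map(|(row, t, d)| (f(row), *t, *d)).collect()
}

/// Toy `reduce` stage: maintain a per-key sum of the second column by applying diffs,
/// so a deletion (diff = -1) subtracts the value it previously added.
fn reduce_stage(input: &[Change], state: &mut HashMap<i64, i64>) {
    for (row, _time, diff) in input {
        *state.entry(row[0]).or_insert(0) += row[1] * *diff;
    }
}

fn main() {
    let changes: Vec<Change> = vec![
        (vec![1, 10], 100, 1),  // insert (key 1, value 10)
        (vec![1, 5], 101, 1),   // insert (key 1, value 5)
        (vec![1, 10], 102, -1), // delete the first row again
        (vec![2, 7], 102, 1),   // insert (key 2, value 7)
    ];
    // map: double the measured value before aggregating.
    let mapped = map_stage(&changes, |row| vec![row[0], row[1] * 2]);
    let mut sums = HashMap::new();
    reduce_stage(&mapped, &mut sums);
    assert_eq!(sums[&1], 10); // (10 + 5 - 10) * 2
    assert_eq!(sums[&2], 14);
}
```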
17 changes: 17 additions & 0 deletions docs/nightly/en/contributor-guide/flownode/overview.md
@@ -0,0 +1,17 @@
# Overview

## Introduction


`Flownode` provides a simple streaming processing capability (known as `flow`) to the database.
`Flownode` manages `flows`, which are tasks that receive data from the `source` and send data to the `sink`.

In the current version, `Flownode` only supports standalone mode. Distributed mode will be supported in the future.

## Components

A `Flownode` contains all the components needed for the streaming process of a flow. Here we list the vital parts:

- A `FlownodeManager` for receiving inserts forwarded from the `Frontend` and sending back results for the flow's sink table.
- A certain number of `FlowWorker` instances, each running in a separate thread. Currently for standalone mode, there is only one flow worker, but this may change in the future.
- A `Flow` is a task that actively receives data from the `source` and sends data to the `sink`. It is managed by the `FlownodeManager` and run by a `FlowWorker`.
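
As a rough illustration of how these components could fit together, below is a hypothetical Rust sketch of a manager routing forwarded inserts to a single worker thread. All names and signatures here are invented for illustration and are not the real `Flownode` types.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical flow descriptor; the real `Flow` holds source/sink handles and a plan.
struct Flow {
    id: u64,
}

/// Hypothetical worker handle: inserts are sent to its thread over a channel.
struct FlowWorker {
    tx: Sender<(u64, Vec<u8>)>, // (flow id, encoded rows)
}

struct FlownodeManager {
    workers: Vec<FlowWorker>,
    flows: Vec<Flow>,
}

impl FlownodeManager {
    /// The Frontend forwards an insert; the manager routes it to the worker running that flow.
    fn handle_insert(&self, flow_id: u64, rows: Vec<u8>) {
        // Standalone mode has a single worker, so routing is trivial.
        let worker = &self.workers[flow_id as usize % self.workers.len()];
        worker.tx.send((flow_id, rows)).expect("worker thread alive");
    }
}

fn spawn_worker() -> FlowWorker {
    let (tx, rx) = channel::<(u64, Vec<u8>)>();
    thread::spawn(move || {
        for (flow_id, rows) in rx {
            // Here the worker would run the dataflow and write results to the flow's sink table.
            println!("flow {flow_id}: processing {} bytes", rows.len());
        }
    });
    FlowWorker { tx }
}

fn main() {
    let manager = FlownodeManager {
        workers: vec![spawn_worker()],
        flows: vec![Flow { id: 1 }],
    };
    println!("managing flow {}", manager.flows[0].id);
    manager.handle_insert(1, b"encoded rows".to_vec());
    thread::sleep(Duration::from_millis(50)); // give the worker thread time to drain (demo only)
}
```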
@@ -15,7 +15,7 @@ Planner will traverse the input logical plan, and split it into multiple stages
rule](https://github.com/GreptimeTeam/greptimedb/blob/main/docs/rfcs/2023-05-09-distributed-planner.md)".

This rule is under heavy development. At present it will consider things like:
- whether the operator ifself is commutative
- whether the operator itself is commutative
- how the partition rule is configured
- etc...

2 changes: 1 addition & 1 deletion docs/nightly/en/db-cloud-shared/quick-start/influxdb.md
@@ -1,2 +1,2 @@

To quickly get started with InfluxDB line protocol, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB. The source code is avaliable on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-influxdb-line-protocol).
To quickly get started with InfluxDB line protocol, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB. The source code is available on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-influxdb-line-protocol).
2 changes: 1 addition & 1 deletion docs/nightly/en/db-cloud-shared/quick-start/mysql.md
@@ -1,3 +1,3 @@

To quickly get started with MySQL, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB via MySQL CLI. The source code is avaliable on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-mysql).
To quickly get started with MySQL, we can use Bash to collect system metrics, such as CPU and memory usage, and send it to GreptimeDB via MySQL CLI. The source code is available on [GitHub](https://github.com/GreptimeCloudStarters/quick-start-mysql).

@@ -11,7 +11,7 @@ go get go.opentelemetry.io/otel@v1.16.0 \
```

Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB in `app.go`.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```go
auth := base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%s:%s", *username, *password)))
@@ -21,7 +21,7 @@ dependencies {
}
```

Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB. For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
Once the required packages are installed, write the code to create a metric export object that sends metrics to GreptimeDB. For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```java
String endpoint = String.format("https://%s/v1/otlp/v1/metrics", dbHost);
@@ -25,7 +25,7 @@ npm install @opentelemetry/api@1.4.1 \
```

Once the required packages are installed,create a new file named `app.ts` and write the code to create a metric export object that sends metrics to GreptimeDB.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```ts
const exporter = new OTLPMetricExporter({
@@ -53,7 +53,7 @@ remote_write:
password: <password>
```
The configuration file above configures Prometheus to scrape metrics from the node exporter and send them to GreptimeDB. For the configration about `<host>`, `<dbname>`, `<username>`, and `<password>`, please refer to the Prometheus documentation in [GreptimeDB](/user-guide/clients/prometheus.md) or [GreptimeCloud](/greptimecloud/integrations/prometheus/quick-setup.md).
The configuration file above configures Prometheus to scrape metrics from the node exporter and send them to GreptimeDB. For the configuration about `<host>`, `<dbname>`, `<username>`, and `<password>`, please refer to the Prometheus documentation in [GreptimeDB](/user-guide/clients/prometheus.md) or [GreptimeCloud](/greptimecloud/integrations/prometheus/quick-setup.md).

Finally, start the containers:

@@ -25,7 +25,7 @@ pip install -r requirements.txt
```

Once the required packages are installed,create a new file named `main.py` and write the code to create a metric export object that sends metrics to GreptimeDB.
For the configration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).
For the configuration about the exporter, please refer to OTLP integration documentation in [GreptimeDB](/user-guide/clients/otlp.md) or [GreptimeCloud](/greptimecloud/integrations/otlp.md).

```python
from opentelemetry import metrics
@@ -2,7 +2,7 @@

Visualization plays a crucial role in effectively utilizing time series data. To help users leverage the various features of GreptimeDB, Greptime offers a simple [dashboard](https://github.com/GreptimeTeam/dashboard).

The Dashboard is embedded into GreptimeDB's binary since GreptimeDB v0.2.0. After starting [GreptimeDB Standalone](greptimedb-standalone.md) or [GreptimeDB Cluster](greptimedb-cluster.md), the dashboard can be accessed via the HTTP endpoint `http://localhost:4000/dashboard`. The dashboard supports mutiple query languages, including [SQL queries](/user-guide/query-data/sql.md), [Python Scripts](/user-guide/python-scripts/overview.md), and [PromQL queries](/user-guide/query-data/promql.md).
The Dashboard is embedded into GreptimeDB's binary since GreptimeDB v0.2.0. After starting [GreptimeDB Standalone](greptimedb-standalone.md) or [GreptimeDB Cluster](greptimedb-cluster.md), the dashboard can be accessed via the HTTP endpoint `http://localhost:4000/dashboard`. The dashboard supports multiple query languages, including [SQL queries](/user-guide/query-data/sql.md), [Python Scripts](/user-guide/python-scripts/overview.md), and [PromQL queries](/user-guide/query-data/promql.md).

We offer various chart types to choose from based on different scenarios. The charts become more informative when you have sufficient data.

6 changes: 3 additions & 3 deletions docs/nightly/en/reference/command-lines.md
@@ -39,7 +39,7 @@ Starts GreptimeDB in standalone mode with customized configurations:
greptime --log-dir=/tmp/greptimedb/logs --log-level=info standalone start -c config/standalone.example.toml
```

The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuraiton files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/operations/configuration.md).
The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuration files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/operations/configuration.md).

To start GreptimeDB in distributed mode, you need to start each component separately. The following commands show how to start each component with customized configurations or command line arguments.

@@ -58,7 +58,7 @@ greptime datanode start -c config/datanode.example.toml
Starts a datanode instance with command line arguments specifying the gRPC service address, the MySQL service address, the address of the metasrv, and the node id of the instance:

```sh
greptime datanode start --rpc-addr=0.0.0.0:4001 --mysql-addr=0.0.0.0:4002 --metasrv-addr=0.0.0.0:3002 --node-id=1
greptime datanode start --rpc-addr=0.0.0.0:4001 --mysql-addr=0.0.0.0:4002 --metasrv-addrs=0.0.0.0:3002 --node-id=1
```

Starts a frontend instance with customized configurations:
@@ -70,5 +70,5 @@ greptime frontend start -c config/frontend.example.toml
Starts a frontend instance with command line arguments specifying the address of the metasrv:

```sh
greptime frontend start --metasrv-addr=0.0.0.0:3002
greptime frontend start --metasrv-addrs=0.0.0.0:3002
```
8 changes: 6 additions & 2 deletions docs/nightly/en/reference/sql/copy.md
@@ -44,6 +44,7 @@ FROM { '<path>/[<filename>]' }
[ PATTERN = '<regex_pattern>' ]
)
]
[LIMIT NUM]
```

The command starts with the keyword `COPY`, followed by the name of the table you want to import data into.
@@ -70,7 +71,6 @@ COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
|---|---|---|
| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
| `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
| `MAX_INSERT_ROWS` | Maximum number of rows for insertion. e.g., `1000` | Optional |

#### `CONNECTION` Option

@@ -119,6 +119,10 @@ You can set the following **CONNECTION** options:
| `ENABLE_VIRTUAL_HOST_STYLE` | If you use virtual hosting to address the bucket, set it to "true".| Optional |
| `SESSION_TOKEN` | Your temporary credential for connecting the AWS S3 service. | Optional |

#### LIMIT

You can use `LIMIT` to restrict the maximum number of rows inserted at once.
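
For instance, combining the earlier Parquet example with the new clause (the table name and file path are the illustrative ones used above), a sketch might look like:

```sql
-- Import at most 1000 rows from the file into `tbl`.
COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet') LIMIT 1000;
```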

## COPY DATABASE

Beside copying specific table to/from some path, `COPY` statement can also be used to copy whole database to/from some path. The syntax for copying databases is:
@@ -145,7 +149,7 @@ COPY DATABASE <db_name>
| `FORMAT` | Export file format, available options: JSON, CSV, Parquet | **Required** |
| `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |

> - When copying databses, `<PATH>` must end with `/`.
> - When copying databases, `<PATH>` must end with `/`.
> - `CONNECTION` parameters can also be used to copying databases to/from object storage services like AWS S3.
### Examples
9 changes: 4 additions & 5 deletions docs/nightly/en/reference/sql/create.md
@@ -50,7 +50,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
) ENGINE = engine WITH([TTL | storage | ...] = expr, ...)
[
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
@@ -84,7 +84,6 @@ Users can add table options by using `WITH`. The valid options contain the follo
| Option | Description | Value |
| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc. |
| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |
| `compaction.type` | Compaction strategy of the table | String value. Only `twcs` is allowed. |
| `compaction.twcs.max_active_window_files` | Max num of files that can be kept in active writing time window | String value, such as '8'. Only available when `compaction.type` is `twcs`. You can refer to this [document](https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html) to learn more about the `twcs` compaction strategy. |
@@ -93,13 +92,13 @@ Users can add table options by using `WITH`. The valid options contain the follo
| `memtable.type` | Type of the memtable. | String value, supports `time_series`, `partition_tree`. |
| `append_mode` | Whether the table is append-only | String value. Default is 'false', which removes duplicate rows by primary keys and timestamps. Setting it to 'true' to enable append mode and create an append-only table which keeps duplicate rows. |

For example, to create a table with the storage data TTL(Time-To-Live) is seven days and region number is 10:
For example, to create a table whose data has a storage TTL (Time-To-Live) of seven days:

```sql
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```

Create a table that stores the data in Google Cloud Storage:
@@ -108,7 +107,7 @@
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10, storage="Gcs");
) engine=mito with(ttl='7d', storage="Gcs");
```

Create a table with custom compaction options. The table will attempt to partition data into 1-day time window based on the timestamps of the data.
