docs: admin documents (#964)
killme2008 committed May 18, 2024
1 parent b1f426c commit 8077b6e
Showing 16 changed files with 245 additions and 16 deletions.
8 changes: 4 additions & 4 deletions docs/nightly/en/reference/sql/create.md
@@ -50,7 +50,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
) ENGINE = engine WITH([TTL | storage | ...] = expr, ...)
[
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
@@ -92,13 +92,13 @@ Users can add table options by using `WITH`. The valid options include the following:
| `memtable.type` | Type of the memtable. | String value, supports `time_series`, `partition_tree`. |
| `append_mode` | Whether the table is append-only | String value. Default is 'false', which removes duplicate rows by primary keys and timestamps. Set it to 'true' to enable append mode and create an append-only table that keeps duplicate rows. |

For example, to create a table with a storage data TTL (Time-To-Live) of seven days and a region number of 10:
For example, to create a table with a storage data TTL (Time-To-Live) of seven days:

```sql
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```
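
The other options in the table above are set in the same way. For instance, a minimal sketch of an append-only table (the `http_logs` table and its columns are illustrative, not from this document):

```sql
-- append_mode keeps duplicate rows instead of deduplicating them
CREATE TABLE IF NOT EXISTS http_logs(
  ts TIMESTAMP TIME INDEX,
  message STRING,
) engine=mito with(append_mode='true');
```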

Create a table that stores the data in Google Cloud Storage:
@@ -107,7 +107,7 @@ Create a table that stores the data in Google Cloud Storage:
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10, storage="Gcs");
) engine=mito with(ttl='7d', storage="Gcs");
```

Create a table with custom compaction options. The table will attempt to partition data into 1-day time windows based on the timestamps of the data.
21 changes: 21 additions & 0 deletions docs/nightly/en/reference/sql/functions.md
@@ -37,6 +37,27 @@ Where the `datatype` can be any valid Arrow data type in this [list](https://arr

Please refer to the [API documentation](https://greptimedb.rs/script/python/rspython/builtins/greptime_builtin/index.html#functions)

### Admin Functions

GreptimeDB provides some administration functions to manage the database and data:

* `flush_table(table_name)` to flush a table's memtables into SST files by table name.
* `flush_region(region_id)` to flush a region's memtables into SST files by region id. Find the region id through the [REGION_PEERS](./information-schema/region-peers.md) table.
* `compact_table(table_name)` to schedule a compaction task for a table by table name.
* `compact_region(region_id)` to schedule a compaction task for a region by region id.
* `migrate_region(region_id, from_peer, to_peer, [timeout])` to migrate regions between datanodes; please read [Region Migration](/user-guide/operations/region-migration).
* `procedure_state(procedure_id)` to query a procedure state by its id.

For example:
```sql
-- Flush the table test --
select flush_table("test");

-- Schedule a compaction for table test --
select compact_table("test");
```
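
The region-level and procedure functions follow the same pattern. A sketch, assuming the region id was looked up in `REGION_PEERS` and the procedure id was returned by an earlier call (both values here are illustrative):

```sql
-- Flush the region with the given id --
select flush_region(4398046511104);

-- Schedule a compaction for the region --
select compact_region(4398046511104);

-- Query the state of a procedure by its id --
select procedure_state('538b7476-9f79-4e50-aa9c-b1de90710839');
```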


## Time and Date

### `date_trunc`
4 changes: 2 additions & 2 deletions docs/nightly/en/summary.yml
@@ -66,8 +66,9 @@
- api
- cluster
- Operations:
# - overview
- admin
- configuration
- back-up-&-restore-data
- kubernetes
- gtctl
- run-on-android
@@ -81,7 +82,6 @@
# - alert
# - import-data
# - export-data
# - back-up-&-restore-data
# - capacity-planning
- upgrade
- GreptimeCloud:
@@ -16,7 +16,7 @@ Of course, you can set TTL for every table when creating it:
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```

The TTL of `temperatures` is set to seven days.
30 changes: 30 additions & 0 deletions docs/nightly/en/user-guide/operations/admin.md
@@ -0,0 +1,30 @@
# Administration

This document addresses strategies and practices used in the operation of GreptimeDB systems and deployments.

## Database/Cluster management

* [Installation](/getting-started/installation/overview.md) for GreptimeDB and the [g-t-control](./gtctl.md) command line tool.
* For database configuration, please read the [Configuration](./configuration.md) reference.
* [Monitoring](./monitoring.md) and [Tracing](./tracing.md) for GreptimeDB.
* GreptimeDB [Backup & Restore methods](./back-up-\&-restore-data.md).

### Runtime information

* Find the topology information of the cluster through the [CLUSTER_INFO](/reference/sql/information-schema/cluster-info.md) table.
* Find the region distribution of tables through the [REGION_PEERS](/reference/sql/information-schema/region-peers.md) table.

The `INFORMATION_SCHEMA` database provides access to system metadata, such as the name of a database or table, the data type of a column, etc. Please read the [reference](/reference/sql/information-schema/overview.md).
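
For example, a quick look at the runtime information (a minimal sketch; the exact output columns depend on your deployment):

```sql
-- Cluster topology --
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_INFO;

-- Region distribution of tables --
SELECT * FROM INFORMATION_SCHEMA.REGION_PEERS;
```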

## Data management

* [The Storage Location](/user-guide/concepts/storage-location.md).
* Cluster Failover for GreptimeDB by [Setting Remote WAL](./remote-wal/quick-start.md).
* [Flush and Compaction for Table & Region](/reference/sql/functions#admin-functions).
* Partition the table into regions; read the [Table Sharding](/contributor-guide/frontend/table-sharding.md) reference.
* [Migrate the Region](./region-migration.md) for load balancing.
* [Expire Data by Setting TTL](/user-guide/concepts/features-that-you-concern#can-i-set-ttl-or-retention-policy-for-different-tables-or-measurements).

## Best Practices

TODO
46 changes: 46 additions & 0 deletions docs/nightly/en/user-guide/operations/back-up-&-restore-data.md
@@ -1 +1,47 @@
# Back up & restore data

Use the [`COPY` command](/reference/sql/copy.md) to back up and restore data.

## Backup Table

Back up the table `monitor` in `parquet` format to the file `/home/backup/monitor/monitor.parquet`:

```sql
COPY monitor TO '/home/backup/monitor/monitor.parquet' WITH (FORMAT = 'parquet');
```

Back up the data within a time range:

```sql
COPY monitor TO '/home/backup/monitor/monitor_20240518.parquet' WITH (FORMAT = 'parquet', START_TIME='2024-05-18 00:00:00', END_TIME='2024-05-19 00:00:00');
```

The above command exports the data of `2024-05-18`. Run such commands periodically to achieve incremental backups, as the sketch below shows.
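
For instance, the next day's increment could look like this (a sketch that assumes the same naming scheme):

```sql
COPY monitor TO '/home/backup/monitor/monitor_20240519.parquet' WITH (FORMAT = 'parquet', START_TIME='2024-05-19 00:00:00', END_TIME='2024-05-20 00:00:00');
```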

## Restore Table

Restore the `monitor` table:

```sql
COPY monitor FROM '/home/backup/monitor/monitor.parquet' WITH (FORMAT = 'parquet');
```

If the data is exported incrementally and all the files are under the same folder with different file names, you can restore them with the `PATTERN` option:

```sql
COPY monitor FROM '/home/backup/monitor/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet');
```

## Backup & Restore Database

It's almost the same as for a table:

```sql
-- Back up the database public --
COPY DATABASE public TO '/home/backup/public/' WITH (FORMAT='parquet');
-- Restore the database public --
COPY DATABASE public FROM '/home/backup/public/' WITH (FORMAT='parquet');
```

Look at the folder `/home/backup/public/`; the command exports each table as a separate file.
17 changes: 17 additions & 0 deletions docs/nightly/en/user-guide/operations/region-migration.md
@@ -60,3 +60,20 @@ select migrate_region(region_id, from_peer_id, to_peer_id, replay_timeout);
| `from_peer_id` | The peer id of the migration source (Datanode). | **Required** | |
| `to_peer_id` | The peer id of the migration destination (Datanode). | **Required** | |
| `replay_timeout` | The timeout (in seconds) for replaying data. If the new Region fails to replay the data within the specified timeout, the migration fails; however, the data in the old Region will not be lost. | Optional | |

## Query the migration state

The `migrate_region` function returns the id of the procedure that executes the migration; query the procedure state by this id:

```sql
select procedure_state('538b7476-9f79-4e50-aa9c-b1de90710839');
```

When it's done, the state is output in JSON:

```json
{"status":"Done"}
```

Of course, you can confirm the region distribution by querying the `region_peers` and `partitions` tables in `information_schema`, as in the sketch below.
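
For example (a minimal sketch; the rows returned depend on your cluster):

```sql
select * from information_schema.region_peers;
```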

2 changes: 1 addition & 1 deletion docs/nightly/en/user-guide/table-management.md
Expand Up @@ -220,7 +220,7 @@ Using the following code to create a table through POST method:
curl -X POST \
-H 'authorization: Basic {{authorization if exists}}' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'sql=CREATE TABLE monitor (host STRING, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP(), cpu FLOAT64 DEFAULT 0, memory FLOAT64, TIME INDEX (ts), PRIMARY KEY(host)) ENGINE=mito WITH(regions=1)' \
-d 'sql=CREATE TABLE monitor (host STRING, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP(), cpu FLOAT64 DEFAULT 0, memory FLOAT64, TIME INDEX (ts), PRIMARY KEY(host)) ENGINE=mito' \
http://localhost:4000/v1/sql?db=public
```

8 changes: 4 additions & 4 deletions docs/nightly/zh/reference/sql/create.md
@@ -50,7 +50,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
) ENGINE = engine WITH([TTL | storage | ...] = expr, ...)
[
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
@@ -93,13 +93,13 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
| `memtable.type` | Type of the memtable. | String value, supports `time_series` and `partition_tree`. |
| `append_mode` | Whether the table is append-only | String value. Default is 'false', which removes duplicate rows by primary keys and timestamps. Set it to 'true' to enable append mode and create an append-only table that keeps duplicate rows. |

For example, to create a table with a storage data TTL (Time-To-Live) of seven days and a region number of 10:
For example, to create a table with a storage data TTL (Time-To-Live) of seven days:

```sql
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```

Or create a table that stores its data separately on Google Cloud Storage:
@@ -108,7 +108,7 @@ CREATE TABLE IF NOT EXISTS temperatures(
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10, storage="Gcs");
) engine=mito with(ttl='7d', storage="Gcs");
```

Create a table with custom twcs compaction options. The table will attempt to partition data into 1-day time windows based on the timestamps of the data.
20 changes: 20 additions & 0 deletions docs/nightly/zh/reference/sql/functions.md
@@ -36,6 +36,26 @@ arrow_cast(expression, datatype)

Please refer to the [API documentation](https://greptimedb.rs/script/python/rspython/builtins/greptime_builtin/index.html#functions)

### Admin Functions

GreptimeDB provides some administration functions to manage the database and data:

* `flush_table(table_name)` to flush a table's memtables into SST files by table name.
* `flush_region(region_id)` to flush a region's memtables into SST files by region id. Find the region id through the [REGION_PEERS](./information-schema/region-peers.md) table.
* `compact_table(table_name)` to schedule a compaction task for a table by table name.
* `compact_region(region_id)` to schedule a compaction task for a region by region id.
* `migrate_region(region_id, from_peer, to_peer, [timeout])` to migrate regions between datanodes; please read [Region Migration](/user-guide/operations/region-migration).
* `procedure_state(procedure_id)` to query a procedure state by its id.

For example:
```sql
-- Flush the table test --
select flush_table("test");

-- Schedule a compaction task for the table test --
select compact_table("test");
```
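
A concrete `migrate_region` call might look like the following sketch, where the region id and the peer ids are illustrative and the last argument is the replay timeout in seconds:

```sql
-- Migrate the region from datanode 1 to datanode 2 with a 60-second replay timeout --
select migrate_region(4398046511104, 1, 2, 60);
```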

## Time and Date

### `date_trunc`
Expand Down
3 changes: 3 additions & 0 deletions docs/nightly/zh/summary-i18n.yml
Expand Up @@ -25,5 +25,8 @@ Frontend: Frontend
Datanode: Datanode
Metasrv: Metasrv
Reference: Reference
Admin: 管理
Administration: 管理
back-up-&-restore-data: 备份和恢复
SDK: SDK
SQL: SQL
@@ -16,7 +16,7 @@
CREATE TABLE IF NOT EXISTS temperatures(
ts TIMESTAMP TIME INDEX,
temperature DOUBLE DEFAULT 10,
) engine=mito with(ttl='7d', regions=10);
) engine=mito with(ttl='7d');
```

In the above SQL, the TTL of the `temperatures` table is set to seven days.
30 changes: 30 additions & 0 deletions docs/nightly/zh/user-guide/operations/admin.md
@@ -0,0 +1,30 @@
# Administration

This document describes strategies and practices used in the operation of GreptimeDB systems and deployments.

## Database/Cluster management

* [Installation](/getting-started/installation/overview.md) for GreptimeDB and the [g-t-control](./gtctl.md) command line tool.
* For database configuration, please read the [Configuration](./configuration.md) reference.
* [Monitoring](./monitoring.md) and [Tracing](./tracing.md) for GreptimeDB.
* GreptimeDB [Backup & Restore methods](./back-up-\&-restore-data.md).

### Runtime information

* Find the topology information of the cluster through the [CLUSTER_INFO](/reference/sql/information-schema/cluster-info.md) table.
* Find the region distribution of tables through the [REGION_PEERS](/reference/sql/information-schema/region-peers.md) table.

The `INFORMATION_SCHEMA` database provides access to system metadata, such as the name of a database or table, the data type of a column, etc. Please read the [reference](/reference/sql/information-schema/overview.md).

## Data management

* [The Storage Location](/user-guide/concepts/storage-location.md).
* Cluster failover for GreptimeDB by [setting Remote WAL](./remote-wal/quick-start.md).
* [Flush and Compaction for Table & Region](/reference/sql/functions#admin-functions).
* Partition the table into regions; read the [Table Sharding](/contributor-guide/frontend/table-sharding.md) reference.
* [Migrate the Region](./region-migration.md) for load balancing.
* [Expire data by setting TTL](/user-guide/concepts/features-that-you-concern#can-i-set-ttl-or-retention-policy-for-different-tables-or-measurements).

## Best Practices

TODO
48 changes: 47 additions & 1 deletion docs/nightly/zh/user-guide/operations/back-up-&-restore-data.md
@@ -1 +1,47 @@
TODO
# Back up & restore data

Use the [`COPY` command](/reference/sql/copy.md) to back up and restore data.

## Backup Table

Back up the table `monitor` in `parquet` format to the file `/home/backup/monitor/monitor.parquet`:

```sql
COPY monitor TO '/home/backup/monitor/monitor.parquet' WITH (FORMAT = 'parquet');
```

Back up the data within a time range:

```sql
COPY monitor TO '/home/backup/monitor/monitor_20240518.parquet' WITH (FORMAT = 'parquet', START_TIME='2024-05-18 00:00:00', END_TIME='2024-05-19 00:00:00');
```

The above command exports the data of `2024-05-18`. Run such commands periodically to achieve incremental backups.

## Restore Table

Restore the `monitor` table:

```sql
COPY monitor FROM '/home/backup/monitor/monitor.parquet' WITH (FORMAT = 'parquet');
```

If the data is exported incrementally and all the files are under the same folder with different file names, you can restore them with the `PATTERN` option:

```sql
COPY monitor FROM '/home/backup/monitor/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet');
```

## Backup & Restore Database

The commands are similar to those for a table:

```sql
-- Back up the database public --
COPY DATABASE public TO '/home/backup/public/' WITH (FORMAT='parquet');

-- Restore the database public --
COPY DATABASE public FROM '/home/backup/public/' WITH (FORMAT='parquet');
```

After exporting, look at the folder `/home/backup/public/`; the command exports each table as a separate file.
16 changes: 16 additions & 0 deletions docs/nightly/zh/user-guide/operations/region-migration.md
@@ -59,3 +59,19 @@ select migrate_region(region_id, from_peer_id, to_peer_id, replay_timeout);
| `from_peer_id` | The peer id of the migration source (Datanode). | **Required** | |
| `to_peer_id` | The peer id of the migration destination (Datanode). | **Required** | |
| `replay_timeout` | The timeout (in seconds) for replaying data during migration. If the new Region fails to replay the data within the specified time, the migration fails; however, the data in the old Region will not be lost. | Optional | |

## Query the migration state

The `migrate_region` function returns the id of the procedure that executes the migration; query the procedure state by this id:

```sql
select procedure_state('538b7476-9f79-4e50-aa9c-b1de90710839');
```

When it completes successfully, the state is output in JSON:

```json
{"status":"Done"}
```

Of course, you can finally confirm whether the region distribution is as expected by querying `region_peers` and `partitions` in `information_schema`.
4 changes: 2 additions & 2 deletions docs/nightly/zh/user-guide/table-management.md
@@ -81,7 +81,7 @@ CREATE TABLE monitor (
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
cpu FLOAT64 DEFAULT 0,
memory FLOAT64,
PRIMARY KEY(host)) ENGINE=mito WITH(regions=1);
PRIMARY KEY(host)) ENGINE=mito;
```

```sql
@@ -219,7 +219,7 @@ Query OK, 1 row affected (0.01 sec)
curl -X POST \
-H 'authorization: Basic {{authorization if exists}}' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'sql=CREATE TABLE monitor (host STRING, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP(), cpu FLOAT64 DEFAULT 0, memory FLOAT64, TIME INDEX (ts), PRIMARY KEY(host)) ENGINE=mito WITH(regions=1)' \
-d 'sql=CREATE TABLE monitor (host STRING, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP(), cpu FLOAT64 DEFAULT 0, memory FLOAT64, TIME INDEX (ts), PRIMARY KEY(host)) ENGINE=mito' \
http://localhost:4000/v1/sql?db=public
```
