fix some problems and add upgrade for ccr (#1685)
## Versions 

- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [ ] Checked by AI
- [ ] Test Cases Built

---------

Co-authored-by: Yongqiang YANG <yangyogqiang@selectdb.com>
dataroaring and Yongqiang YANG authored Jan 2, 2025
1 parent f48fa0a commit 8193e83
Showing 12 changed files with 342 additions and 120 deletions.
70 changes: 54 additions & 16 deletions docs/admin-manual/data-admin/ccr/manual.md
@@ -29,7 +29,6 @@ under the License.
### Network Requirements

- Syncer needs to be able to communicate with both upstream and downstream FE and BE.

- The downstream BE must have direct access to the IP used by the Doris BE process (as seen in `show frontends/backends`).

### Permission Requirements
@@ -45,7 +44,9 @@ When Syncer synchronizes, the user needs to provide accounts for both upstream a

### Version Requirements

Minimum version requirement: v2.0.15
- Syncer Version >= Downstream Doris Version >= Upstream Doris Version. Therefore, upgrade Syncer first, then upgrade downstream Doris, and finally upgrade upstream Doris.
- The minimum version for Doris 2.0 is 2.0.15, and the minimum version for Doris 2.1 is 2.1.6.
- Starting from Syncer versions 2.1.8 and 3.0.4, Syncer no longer supports Doris 2.0.
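
The ordering rule above can be checked before an upgrade with a small script. This is only a sketch: `version_ge` and `check_ccr_versions` are hypothetical helper names, the version numbers are example values, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: checks Syncer >= downstream Doris >= upstream Doris.
check_ccr_versions() {
    local syncer="$1" downstream="$2" upstream="$3"
    version_ge "$syncer" "$downstream" && version_ge "$downstream" "$upstream"
}

# Example values only; substitute the versions actually deployed.
if check_ccr_versions "3.0.4" "3.0.4" "2.1.8"; then
    echo "version order OK"
else
    echo "version order violated: upgrade Syncer first, then downstream, then upstream"
fi
```

The script only compares version strings; it does not query Syncer or Doris for the versions they are running.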

### Configuration and Property Requirements

@@ -56,15 +57,6 @@ Minimum version requirement: v2.0.15
- `restore_reset_index_id`: If the table to be synchronized has an inverted index, it must be configured as `false` on the target cluster.
- `ignore_backup_tmp_partitions`: If the upstream creates tmp partitions, Doris will prohibit backup, causing Syncer synchronization to be interrupted; setting `ignore_backup_tmp_partitions=true` in FE can avoid this issue.

:::caution
**Starting from versions 2.1.8/3.0.4, the minimum Doris version supported by ccr syncer is 2.1, and version 2.0 will no longer be supported.**
:::

#### Versions Not Recommended for Use

Doris Versions
- 2.1.5/2.0.14: If upgraded from previous versions to these two versions and the user has drop partition operations, they may encounter NPE during upgrade or restart due to a new field introduced in this version that older versions do not have, resulting in a default value of null. This issue is fixed in versions 2.1.6/2.0.15.

## Enable binlog for all tables in the database

```shell
@@ -73,18 +65,18 @@ bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db

## Start Syncer

You can start Syncer using `bin/start_syncer.sh`.
Assuming the environment variable `${SYNCER_HOME}` is set to Syncer's working directory, you can start Syncer using `bin/start_syncer.sh`.

| **Option** | **Description** | **Command Example** | **Default Value** |
|------------|-----------------|---------------------|--------------------|
| `--daemon` | Run Syncer in the background | `bin/start_syncer.sh --daemon` | `false` |
| `--db_type` | Syncer can use two types of databases to store metadata: `sqlite3` (local storage) and `mysql` (local or remote storage). When using `mysql` to store metadata, Syncer will create a database named `ccr` using `CREATE IF NOT EXISTS`, and the metadata table will be stored there. | `bin/start_syncer.sh --db_type mysql` | `sqlite3` |
| `--db_dir` | **Effective only when using `sqlite3`**; specifies the filename and path of the SQLite3 generated database file. | `bin/start_syncer.sh --db_dir /path/to/ccr.db` | `SYNCER_OUTPUT_DIR/db/ccr.db` |
| `--db_dir` | **Effective only when using `sqlite3`**; specifies the filename and path of the SQLite3 generated database file. | `bin/start_syncer.sh --db_dir /path/to/ccr.db` | `SYNCER_HOME/db/ccr.db` |
| `--db_host`<br>`--db_port`<br>`--db_user`<br>`--db_password` | **Effective only when using `mysql`**; used to set the host, port, user, and password for MySQL. | `bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456"` | `db_host` and `db_port` default to example values; `db_user` and `db_password` default to empty. |
| `--log_dir` | Specify the log output path | `bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log` | `SYNCER_OUTPUT_DIR/log/ccr_syncer.log` |
| `--log_dir` | Specify the log output path | `bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log` | `SYNCER_HOME/log/ccr_syncer.log` |
| `--log_level` | Specify the log output level; the log format is as follows: `time level msg hooks`. The default value is `info` when running in the background; when running in the foreground, the default value is `trace`, and logs are saved to `log_dir` using `tee`. | `bin/start_syncer.sh --log_level info` | `info` (background)<br>`trace` (foreground) |
| `--host`<br>`--port` | Specify the `host` and `port` for Syncer. The `host` is used to distinguish instances of Syncer in the cluster and can be understood as the name of Syncer; the naming format for Syncer in the cluster is `host:port`. | `bin/start_syncer.sh --host 127.0.0.1 --port 9190` | `host` defaults to `127.0.0.1`<br>`port` defaults to `9190` |
| `--pid_dir` | Specify the path to save the PID file. The PID file is the credential for the `stop_syncer.sh` script to stop Syncer, saving the corresponding Syncer's process number. For ease of cluster management, you can customize the path. | `bin/start_syncer.sh --pid_dir /path/to/pids` | `SYNCER_OUTPUT_DIR/bin` |
| `--pid_dir` | Specify the path to save the PID file. The PID file is the credential for the `stop_syncer.sh` script to stop Syncer, saving the corresponding Syncer's process number. For ease of cluster management, you can customize the path. | `bin/start_syncer.sh --pid_dir /path/to/pids` | `SYNCER_HOME/bin` |

## Stop Syncer

@@ -100,7 +92,7 @@ Options for Method 3:

| **Option** | **Description** | **Command Example** | **Default Value** |
|------------|-----------------|---------------------|--------------------|
| `--pid_dir` | Specify the directory where the PID files are located; all three stopping methods depend on this option to execute. | `bash bin/stop_syncer.sh --pid_dir /path/to/pids` | `SYNCER_OUTPUT_DIR/bin` |
| `--pid_dir` | Specify the directory where the PID files are located; all three stopping methods depend on this option to execute. | `bash bin/stop_syncer.sh --pid_dir /path/to/pids` | `SYNCER_HOME/bin` |
| `--host`<br>`--port` | Stop the Syncer corresponding to `host:port` in the `pid_dir` path. If only `host` is specified, it degrades to **Method 3**; if both `host` and `port` are not empty, it will be effective as **Method 1**. | `bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190` | `host`: 127.0.0.1<br>`port`: empty |
| `--files` | Stop the Syncers corresponding to the specified PID file names in the `pid_dir` path, separated by spaces and enclosed in `"` quotes. | `bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid"` | None |

@@ -246,6 +238,52 @@ curl http://ccr_syncer_host:ccr_syncer_port/list_jobs

Syncer high availability relies on MySQL. If MySQL is used as backend storage, Syncer can discover other Syncers; if one crashes, others will take over its jobs.

## Upgrade

### 1. Upgrade Syncer
Assuming the following environment variables are set:
- ${SYNCER_HOME}: Syncer's working directory.
- ${SYNCER_PACKAGE_DIR}: Directory containing the new Syncer.

Upgrade every Syncer by following these steps.

1.1. Save start commands

Save the output of the following command to a file.
```shell
ps -elf | grep ccr_syncer
```

1.2. Stop the current Syncer

```shell
sh bin/stop_syncer.sh --pid_dir ${SYNCER_HOME}/bin
```

1.3. Back up the existing Syncer binaries

```shell
mv ${SYNCER_HOME}/bin bin_backup_$(date +%Y%m%d_%H%M%S)
```

1.4. Deploy the new package

```shell
cp -r ${SYNCER_PACKAGE_DIR}/bin ${SYNCER_HOME}/bin
```

1.5. Start the new Syncer

Start the new Syncer using the command saved in 1.1.

### 2. Upgrade downstream Doris (If Necessary)

Upgrade the downstream system by following the instructions in the [Upgrade Doris](../../../admin-manual/cluster-management/upgrade.md) guide.

### 3. Upgrade upstream Doris (If Necessary)

Upgrade the upstream system by following the instructions in the [Upgrade Doris](../../../admin-manual/cluster-management/upgrade.md) guide.

## Usage Notes

:::caution
2 changes: 1 addition & 1 deletion docs/admin-manual/data-admin/ccr/performance.md
@@ -42,7 +42,7 @@ The performance data in this document is based on the default configuration. If

### Test Steps

1. Create the library and table information for TPC-H 1T in the upstream cluster.
1. Create the database and tables for TPC-H 1T in the upstream cluster.
2. Create a synchronization job for the TPC-H 1T database.
3. Wait for the TPC-H 1T data import to complete and record the completion time.
4. Wait for the downstream data synchronization to complete and record the completion time.
@@ -29,7 +29,6 @@ under the License.
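### Network Requirements

- Syncer must be able to communicate with both the upstream and downstream FE and BE.

- The downstream BE must be able to reach the upstream BE directly through the IP used by the Doris BE process (as seen in `show frontends/backends`).

### Permission Requirements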
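@@ -45,7 +44,9 @@ When synchronizing, Syncer requires the user to provide upstream and downstream accounts, and the accounts must have the

### Version Requirements

Minimum version requirement: v2.0.15
- Syncer version >= downstream Doris version >= upstream Doris version. Therefore, upgrade Syncer first, then downstream Doris, and finally upstream Doris.
- The minimum version for Doris 2.0 is 2.0.15, and the minimum version for Doris 2.1 is 2.1.6.
- Starting from Syncer versions 2.1.8 and 3.0.4, Syncer no longer supports Doris 2.0.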

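### Configuration and Property Requirements

@@ -56,15 +57,6 @@ When synchronizing, Syncer requires the user to provide upstream and downstream accounts, and the accounts must have the
- `restore_reset_index_id`: If the table to be synchronized has an inverted index, this must be set to `false` on the target cluster.
- `ignore_backup_tmp_partitions`: If the upstream creates tmp partitions, Doris prohibits backup, which interrupts Syncer synchronization; setting `ignore_backup_tmp_partitions=true` in FE avoids this issue.

:::caution
**Starting from versions 2.1.8/3.0.4, the minimum Doris version supported by ccr syncer is 2.1; version 2.0 is no longer supported.**
:::

#### Versions Not Recommended for Use

Doris versions
- 2.1.5/2.0.14: If you upgrade from an earlier version to either of these versions and have performed drop partition operations, you may hit an NPE during upgrade or restart. These versions introduce a new field that older versions lack, so its default value is null. This issue is fixed in 2.1.6/2.0.15.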


## Enable binlog for all tables in the database

```shell
@@ -74,18 +65,18 @@ bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db

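## Start Syncer

You can start Syncer using `bin/start_syncer.sh`.
Assuming the environment variable `${SYNCER_HOME}` is set to Syncer's working directory, you can start Syncer using `bin/start_syncer.sh`.

| **Option** | **Description** | **Command Example** | **Default Value** |
|----------|----------|--------------|------------|
| `--daemon` | Run Syncer in the background | `bin/start_syncer.sh --daemon` | `false` |
| `--db_type` | Syncer can use two types of databases to store metadata: `sqlite3` (local storage) and `mysql` (local or remote storage). When using `mysql` to store metadata, Syncer uses `CREATE IF NOT EXISTS` to create a database named `ccr`, and the metadata tables are stored there. | `bin/start_syncer.sh --db_type mysql` | `sqlite3` |
| `--db_dir` | **Effective only when using `sqlite3`**; specifies the file name and path of the database file generated by SQLite3. | `bin/start_syncer.sh --db_dir /path/to/ccr.db` | `SYNCER_OUTPUT_DIR/db/ccr.db` |
| `--db_dir` | **Effective only when using `sqlite3`**; specifies the file name and path of the database file generated by SQLite3. | `bin/start_syncer.sh --db_dir /path/to/ccr.db` | `SYNCER_HOME/db/ccr.db` |
| `--db_host`<br>`--db_port`<br>`--db_user`<br>`--db_password` | **Effective only when using `mysql`**; sets the host, port, user, and password for MySQL. | `bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456"` | `db_host` and `db_port` default to the example values; `db_user` and `db_password` default to empty. |
| `--log_dir` | Specify the log output path | `bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log` | `SYNCER_OUTPUT_DIR/log/ccr_syncer.log` |
| `--log_dir` | Specify the log output path | `bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log` | `SYNCER_HOME/log/ccr_syncer.log` |
| `--log_level` | Specify the log output level; the log format is `time level msg hooks`. With `--daemon`, the default value is `info`; when running in the foreground, the default value is `trace`, and logs are saved to `log_dir` via `tee`. | `bin/start_syncer.sh --log_level info` | `info` (background)<br>`trace` (foreground) |
| `--host`<br>`--port` | Specify the `host` and `port` for Syncer. The `host` distinguishes Syncer instances in the cluster and can be understood as the name of the Syncer; the naming format for a Syncer in the cluster is `host:port`. | `bin/start_syncer.sh --host 127.0.0.1 --port 9190` | `host` defaults to `127.0.0.1`<br>`port` defaults to `9190` |
| `--pid_dir` | Specify the path for saving the PID file. The PID file is the credential the `stop_syncer.sh` script uses to stop Syncer, and it stores the process number of the corresponding Syncer. For easier cluster management, you can customize the path. | `bin/start_syncer.sh --pid_dir /path/to/pids` | `SYNCER_OUTPUT_DIR/bin` |
| `--pid_dir` | Specify the path for saving the PID file. The PID file is the credential the `stop_syncer.sh` script uses to stop Syncer, and it stores the process number of the corresponding Syncer. For easier cluster management, you can customize the path. | `bin/start_syncer.sh --pid_dir /path/to/pids` | `SYNCER_HOME/bin` |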


## Stop Syncer
@@ -102,7 +93,7 @@ bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db

| **Option** | **Description** | **Command Example** | **Default Value** |
|----------|----------|--------------|------------|
| `--pid_dir` | Specify the directory where the PID files are located; all three stopping methods above depend on this option. | `bash bin/stop_syncer.sh --pid_dir /path/to/pids` | `SYNCER_OUTPUT_DIR/bin` |
| `--pid_dir` | Specify the directory where the PID files are located; all three stopping methods above depend on this option. | `bash bin/stop_syncer.sh --pid_dir /path/to/pids` | `SYNCER_HOME/bin` |
| `--host`<br>`--port` | Stop the Syncer corresponding to `host:port` in the `pid_dir` path. If only `host` is specified, it degrades to **Method 3**; if both `host` and `port` are non-empty, it takes effect as **Method 1**. | `bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190` | `host`: 127.0.0.1<br>`port`: empty |
| `--files` | Stop the Syncers corresponding to the specified PID file names in the `pid_dir` path; separate the file names with spaces and enclose the whole list in `"` quotes. | `bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid"` | None |

@@ -249,6 +240,51 @@ curl http://ccr_syncer_host:ccr_syncer_port/list_jobs

Syncer high availability relies on MySQL. If MySQL is used as the backend storage, Syncers can discover one another; if one crashes, the others take over its jobs.
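
A high-availability deployment therefore means starting several Syncers against the same MySQL instance, each with its own `host:port` name. The sketch below only prints the start command for each node, using options from the start table in this manual; the addresses, the MySQL account, and the helper name `syncer_start_cmd` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical MySQL endpoint shared by every Syncer instance.
DB_HOST="10.0.0.10"
DB_PORT="3306"
DB_USER="ccr"
DB_PASSWORD="change-me"

# Hypothetical helper: prints the start command for the Syncer named host:port.
syncer_start_cmd() {
    local host="$1" port="$2"
    printf 'bin/start_syncer.sh --daemon --db_type mysql --db_host %s --db_port %s --db_user %s --db_password "%s" --host %s --port %s\n' \
        "$DB_HOST" "$DB_PORT" "$DB_USER" "$DB_PASSWORD" "$host" "$port"
}

# Run each printed command on the corresponding node.
syncer_start_cmd 10.0.0.1 9190
syncer_start_cmd 10.0.0.2 9190
```

Because `host` distinguishes Syncer instances in the cluster, each node gets a different `--host` value while sharing the same `--db_*` settings.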

## Upgrade

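### 1. Upgrade Syncer
Assuming the following environment variables are set:
- ${SYNCER_HOME}: Syncer's working directory.
- ${SYNCER_PACKAGE_DIR}: Directory containing the new Syncer.

Upgrade each Syncer by following these steps.

1.1. Save the start commands

Save the output of the following command to a file.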
```
ps -elf | grep ccr_syncer
```
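
One way to capture that output is to redirect it into a timestamped file. This is a sketch: the file name is hypothetical, and the fallback to the current directory when `${SYNCER_HOME}` is unset exists only so the snippet runs standalone.

```shell
#!/usr/bin/env bash
# Hypothetical output file, kept outside ${SYNCER_HOME}/bin because that
# directory is moved aside in a later step.
START_CMD_FILE="${SYNCER_HOME:-.}/syncer_start_cmd_$(date +%Y%m%d_%H%M%S).txt"

# The [c] bracket keeps grep from matching its own process entry.
# `|| true` keeps the script going when no Syncer is currently running.
ps -elf | grep '[c]cr_syncer' > "${START_CMD_FILE}" || true

echo "saved to ${START_CMD_FILE}"
```

Check that the file is not empty before stopping Syncer; an empty file means no running Syncer process was found.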

1.2. Stop the current Syncer

```shell
sh bin/stop_syncer.sh --pid_dir ${SYNCER_HOME}/bin
```

1.3. Back up the existing Syncer binaries

```shell
mv ${SYNCER_HOME}/bin bin_backup_$(date +%Y%m%d_%H%M%S)
```

1.4. Deploy the new package

```shell
cp -r ${SYNCER_PACKAGE_DIR}/bin ${SYNCER_HOME}/bin
```

1.5. Start the new Syncer

Start the new Syncer using the command saved in 1.1.
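
Steps 1.2 through 1.4 can also be combined into one script. The following is a minimal sketch, not part of the Syncer distribution: the `DRY_RUN` switch and the `run` helper are hypothetical, the fallback paths are placeholders, and the backup is placed under `${SYNCER_HOME}` rather than the current directory as a design choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Both variables are the ones assumed at the top of this section; the
# fallback paths are placeholders for illustration only.
SYNCER_HOME="${SYNCER_HOME:-/path/to/syncer}"
SYNCER_PACKAGE_DIR="${SYNCER_PACKAGE_DIR:-/path/to/new_syncer}"
# Hypothetical switch: with DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: prints the command in dry-run mode, otherwise runs it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

BACKUP_DIR="${SYNCER_HOME}/bin_backup_$(date +%Y%m%d_%H%M%S)"

# 1.2 Stop the current Syncer.
run sh "${SYNCER_HOME}/bin/stop_syncer.sh" --pid_dir "${SYNCER_HOME}/bin"
# 1.3 Back up the existing binaries.
run mv "${SYNCER_HOME}/bin" "${BACKUP_DIR}"
# 1.4 Deploy the new package (-r because bin is a directory).
run cp -r "${SYNCER_PACKAGE_DIR}/bin" "${SYNCER_HOME}/bin"
```

Run it once with the default `DRY_RUN=1` to review the printed commands, then rerun with `DRY_RUN=0` to apply them. Saving the start command (1.1) and restarting Syncer (1.5) remain manual steps.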

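### 2. Upgrade downstream Doris (If Necessary)

Upgrade the downstream system by following the instructions in the [Upgrade Doris](../../../admin-manual/cluster-management/upgrade.md) guide.

### 3. Upgrade upstream Doris (If Necessary)

Upgrade the upstream system by following the instructions in the [Upgrade Doris](../../../admin-manual/cluster-management/upgrade.md) guide.

## Usage Notes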

@@ -48,7 +48,7 @@ under the License.
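4. Wait for the downstream data synchronization to complete and record the completion time.

### Test Conclusion
Incremental synchronization time difference: 33 seconds`
Incremental synchronization time difference: 33 seconds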

---

@@ -60,7 +60,7 @@ under the License.
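3. Wait for the downstream data synchronization to complete and record the completion time.

### Test Conclusion
Full synchronization time difference: 6 minutes 1 second
Full synchronization time difference: 6 minutes 1 second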

---
