diff --git a/docs/admin-manual/data-admin/backup.md b/docs/admin-manual/data-admin/backup.md deleted file mode 100644 index c6c5698eef01b..0000000000000 --- a/docs/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,197 +0,0 @@ ---- -{ - "title": "Data Backup", - "language": "en" -} ---- - - - -# Data Backup - -Doris supports backing up the current data in the form of files to the remote storage system like S3 and HDFS. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. - -This feature requires Doris version 0.8.2+ - -## A brief explanation of the principle - -The backup operation is to upload the data of the specified table or partition directly to the remote warehouse for storage in the form of files stored by Doris. When a user submits a Backup request, the system will perform the following operations: - -1. Snapshot and snapshot upload - - The snapshot phase takes a snapshot of the specified table or partition data file. After that, backups are all operations on snapshots. After the snapshot, changes, imports, etc. to the table no longer affect the results of the backup. Snapshots only generate a hard link to the current data file, which takes very little time. After the snapshot is completed, the snapshot files will be uploaded one by one. Snapshot uploads are done concurrently by each Backend. - -2. Metadata preparation and upload - - After the data file snapshot upload is complete, Frontend will first write the corresponding metadata to a local file, and then upload the local metadata file to the remote warehouse through the broker. Completing the final backup job - -3. Dynamic Partition Table Description - - If the table is a dynamic partition table, the dynamic partition attribute will be automatically disabled after backup. 
When restoring, you need to manually enable the dynamic partition attribute of the table. The command is as follows: - -```sql -ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true") -``` - -4. Backup and Restore operation will NOT keep the `colocate_with` property of a table. - -## Start Backup - -1. Create a hdfs remote warehouse example_repo (S3 skips step 1): - - ```sql - CREATE REPOSITORY `example_repo` - WITH HDFS - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "fs.defaultFS"="hdfs://hdfs_host:port", - "hadoop.username" = "hadoop" - ); - ``` - -2. Create a remote repository for S3 : s3_repo (HDFS skips step 2) - - ``` - CREATE REPOSITORY `s3_repo` - WITH S3 - ON LOCATION "s3://bucket_name/test" - PROPERTIES - ( - "AWS_ENDPOINT" = "http://xxxx.xxxx.com", - "AWS_ACCESS_KEY" = "xxxx", - "AWS_SECRET_KEY" = "xxx", - "AWS_REGION" = "xxx" - ); - ``` - - >Note that. - > - >ON LOCATION is followed by Bucket Name here - -3. Full backup of table example_tbl under example_db to warehouse example_repo: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label1 - TO example_repo - ON (example_tbl) - PROPERTIES ("type" = "full"); - ``` - -4. Under the full backup example_db, the p1, p2 partitions of the table example_tbl, and the table example_tbl2 to the warehouse example_repo: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label2 - TO example_repo - ON - ( - example_tbl PARTITION (p1,p2), - example_tbl2 - ); - ``` - -5. View the execution of the most recent backup job: - - ```sql - mysql> show BACKUP\G; - *************************** 1. 
row *************************** - JobId: 17891847 - SnapshotName: snapshot_label1 - DbName: example_db - State: FINISHED - BackupObjs: [default_cluster:example_db.example_tbl] - CreateTime: 2022-04-08 15:52:29 - SnapshotFinishedTime: 2022-04-08 15:52:32 - UploadFinishedTime: 2022-04-08 15:52:38 - FinishedTime: 2022-04-08 15:52:44 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -6. View existing backups in remote repositories: - - ```sql - mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; - +-----------------+---------------------+--------+ - | Snapshot | Timestamp | Status | - +-----------------+---------------------+--------+ - | snapshot_label1 | 2022-04-08-15-52-29 | OK | - +-----------------+---------------------+--------+ - 1 row in set (0.15 sec) - ``` - -For the detailed usage of BACKUP, please refer to [here](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md). - -## Best Practices - -### Backup - -Currently, we support full backup with the smallest partition (Partition) granularity (incremental backup may be supported in future versions). If you need to back up data regularly, you first need to plan the partitioning and bucketing of the table reasonably when building the table, such as partitioning by time. Then, in the subsequent running process, regular data backups are performed according to the partition granularity. - -### Data Migration - -Users can back up the data to the remote warehouse first, and then restore the data to another cluster through the remote warehouse to complete the data migration. Because data backup is done in the form of snapshots, new imported data after the snapshot phase of the backup job will not be backed up. Therefore, after the snapshot is completed and until the recovery job is completed, the data imported on the original cluster needs to be imported again on the new cluster. 
- -It is recommended to import the new and old clusters in parallel for a period of time after the migration is complete. After verifying the correctness of data and services, migrate services to a new cluster. - -## Highlights - -1. Operations related to backup and recovery are currently only allowed to be performed by users with ADMIN privileges. -2. Within a database, only one backup or restore job is allowed to be executed. -3. Both backup and recovery support operations at the minimum partition (Partition) level. When the amount of data in the table is large, it is recommended to perform operations by partition to reduce the cost of failed retry. -4. Because of the backup and restore operations, the operations are the actual data files. Therefore, when a table has too many shards, or a shard has too many small versions, it may take a long time to backup or restore even if the total amount of data is small. Users can use `SHOW PARTITIONS FROM table_name;` and `SHOW TABLETS FROM table_name;` to view the number of shards in each partition and the number of file versions in each shard to estimate job execution time. The number of files has a great impact on the execution time of the job. Therefore, it is recommended to plan partitions and buckets reasonably when creating tables to avoid excessive sharding. -5. When checking job status via `SHOW BACKUP` or `SHOW RESTORE` command. It is possible to see error messages in the `TaskErrMsg` column. But as long as the `State` column is not `CANCELLED`, the job is still continuing. These tasks may retry successfully. Of course, some Task errors will also directly cause the job to fail. -6. If the recovery job is an overwrite operation (specifying the recovery data to an existing table or partition), then from the `COMMIT` phase of the recovery job, the overwritten data on the current cluster may no longer be restored. If the restore job fails or is canceled at this time, the previous data may be damaged and inaccessible. 
In this case, the only way to do it is to perform the recovery operation again and wait for the job to complete. Therefore, we recommend that if unnecessary, try not to restore data by overwriting unless it is confirmed that the current data is no longer used. - -## Related Commands - -1. The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client. - - 1. CREATE REPOSITORY - - Create a remote repository path for backup or restore. Please refer to [Create Repository Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY.md). - - 2. BACKUP - - Perform a backup operation. Please refer to [Backup Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md). - - 3. SHOW BACKUP - - View the execution of the most recent backup job. Please refer to [Show Backup Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-BACKUP.md)。 - - 4. SHOW SNAPSHOT - - View existing backups in the remote repository. Please refer to [Show Snapshot Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-SNAPSHOT.md). - - 5. CANCEL BACKUP - - Cancel the currently executing backup job. Please refer to [Cancel Backup Reference] (../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CANCEL-BACKUP.md). - - 6. DROP REPOSITORY - - Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data. Please refer to [Drop Repository Reference] (../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/DROP-REPOSITORY.md). 
- -## More Help - - For more detailed syntax and best practices used by BACKUP, please refer to the [BACKUP](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md) command manual, You can also type `HELP BACKUP` on the MySql client command line for more help. diff --git a/docs/admin-manual/data-admin/ccr/config.md b/docs/admin-manual/data-admin/ccr/config.md index df37ba7353330..93d7edfa88310 100644 --- a/docs/admin-manual/data-admin/ccr/config.md +++ b/docs/admin-manual/data-admin/ccr/config.md @@ -1,6 +1,8 @@ --- -title: Configuration Instructions -language: en +{ + "title": "Configuration Instructions", + "language": "en" +} --- - -# Repair Data - -For the Unique Key Merge on Write table, there are bugs in some Doris versions, which may cause errors when the system calculates the delete bitmap, resulting in duplicate primary keys. At this time, the full compaction function can be used to repair the data. This function is invalid for non-Unique Key Merge on Write tables. - -This feature requires Doris version 2.0+. - -To use this function, it is necessary to stop the import as much as possible, otherwise problems such as import timeout may occur. - -## Brief principle explanation - -After the full compaction is executed, the delete bitmap will be recalculated, and the wrong delete bitmap data will be deleted to complete the data restoration. - -## Instructions for use - -`POST /api/compaction/run?tablet_id={int}&compact_type=full` - -or - -`POST /api/compaction/run?table_id={int}&compact_type=full` - -Note that only one tablet_id and table_id can be specified, and cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table. 
- -## Example of use - -``` -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" -curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" -``` \ No newline at end of file diff --git a/docs/admin-manual/data-admin/overview.md b/docs/admin-manual/data-admin/overview.md index 912046242c9f0..9bd6132414992 100644 --- a/docs/admin-manual/data-admin/overview.md +++ b/docs/admin-manual/data-admin/overview.md @@ -1,6 +1,6 @@ --- { - "title": "Business Continuity & Data Recovery Overview", + "title": "Disaster Recovery Management Overview", "language": "en" } --- diff --git a/docs/admin-manual/data-admin/restore.md b/docs/admin-manual/data-admin/restore.md deleted file mode 100644 index 779a8a26f83b7..0000000000000 --- a/docs/admin-manual/data-admin/restore.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -{ - "title": "Data Restore", - "language": "en" -} ---- - - - -# Data Recovery - -Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. - -This feature requires Doris version 0.8.2+ - -To use this function, you need to deploy the broker corresponding to the remote storage. Such as BOS, HDFS, etc. You can view the currently deployed broker through `SHOW BROKER;`. - -## Brief principle description - -The restore operation needs to specify an existing backup in the remote warehouse, and then restore the content of the backup to the local cluster. When the user submits the Restore request, the system will perform the following operations: - -1. 
Create the corresponding metadata locally - - This step will first create and restore the corresponding table partition and other structures in the local cluster. After creation, the table is visible, but not accessible. - -2. Local snapshot - - This step is to take a snapshot of the table created in the previous step. This is actually an empty snapshot (because the table just created has no data), and its purpose is to generate the corresponding snapshot directory on the Backend for later receiving the snapshot file downloaded from the remote warehouse. - -3. Download snapshot - - The snapshot files in the remote warehouse will be downloaded to the corresponding snapshot directory generated in the previous step. This step is done concurrently by each Backend. - -4. Effective snapshot - - After the snapshot download is complete, we need to map each snapshot to the metadata of the current local table. These snapshots are then reloaded to take effect, completing the final recovery job. - -## Start Restore - -1. Restore the table backup_tbl in backup snapshot_1 from example_repo to database example_db1, the time version is "2018-05-04-16-45-08". Revert to 1 copy: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. Restore partitions p1 and p2 of table backup_tbl in backup snapshot_2 from example_repo, and table backup_tbl2 to database example_db1, and rename it to new_tbl with time version "2018-05-04-17-11-01". The default reverts to 3 replicas: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. View the execution of the restore job: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. 
row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -For detailed usage of RESTORE, please refer to [here](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE.md). - -## Related Commands - -The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client. - -1. CREATE REPOSITORY - - Create a remote repository path for backup or restore. This command needs to use the Broker process to access the remote storage. Different brokers need to provide different parameters. For details, please refer to [Broker documentation](../../data-operate/import/broker-load-manual), or you can directly back up to support through the S3 protocol For the remote storage of AWS S3 protocol, directly back up to HDFS, please refer to [Create Remote Warehouse Documentation](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY) - -2. RESTORE - - Perform a restore operation. - -3. SHOW RESTORE - - View the execution of the most recent restore job, including: - - - JobId: The id of the current recovery job. 
- - Label: The name (Label) of the backup in the warehouse specified by the user. - - Timestamp: The timestamp of the backup in the user-specified repository. - - DbName: Database corresponding to the restore job. - - State: The current stage of the recovery job: - - PENDING: The initial status of the job. - - SNAPSHOTING: The snapshot operation of the newly created table is in progress. - - DOWNLOAD: Sending download snapshot task. - - DOWNLOADING: Snapshot is downloading. - - COMMIT: Prepare the downloaded snapshot to take effect. - - COMMITTING: Validating downloaded snapshots. - - FINISHED: Recovery is complete. - - CANCELLED: Recovery failed or was canceled. - - AllowLoad: Whether to allow import during restore. - - ReplicationNum: Restores the specified number of replicas. - - RestoreObjs: List of tables and partitions involved in this restore. - - CreateTime: Job creation time. - - MetaPreparedTime: Local metadata generation completion time. - - SnapshotFinishedTime: The local snapshot completion time. - - DownloadFinishedTime: The time when the remote snapshot download is completed. - - FinishedTime: The completion time of this job. - - UnfinishedTasks: During `SNAPSHOTTING`, `DOWNLOADING`, `COMMITTING` and other stages, there will be multiple subtasks going on at the same time. The current stage shown here is the task id of the unfinished subtasks. - - TaskErrMsg: If there is an error in the execution of a subtask, the error message of the corresponding subtask will be displayed here. - - Status: Used to record some status information that may appear during the entire job process. - - Timeout: The timeout period of the job, in seconds. - -4. CANCEL RESTORE - - Cancel the currently executing restore job. - -5. DROP REPOSITORY - - Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data. - -## Common mistakes - -1. 
Restore Report An Error:[20181: invalid md5 of downloaded file: /data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e] - - If the number of copies of the table backed up and restored is inconsistent, you need to specify the number of copies when executing the restore command. For specific commands, please refer to [RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) command manual - -2. Restore Report An Error:[COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - Backup and restore are not caused by the same version, use the specified meta_version to read the metadata of the previous backup. Note that this parameter is used as a temporary solution and is only used to restore the data backed up by the old version of Doris. The latest version of the backup data already contains the meta version, so there is no need to specify it. For the specific solution to the above error, specify meta_version = 100. For specific commands, please refer to [RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) command manual - -## More Help - -For more detailed syntax and best practices used by RESTORE, please refer to the [RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) command manual, You can also type `HELP RESTORE` on the MySql client command line for more help. 
diff --git a/docs/admin-manual/maint-monitor/disk-capacity.md b/docs/admin-manual/maint-monitor/disk-capacity.md index 86bfe6abc5db8..86f544b2423fc 100644 --- a/docs/admin-manual/maint-monitor/disk-capacity.md +++ b/docs/admin-manual/maint-monitor/disk-capacity.md @@ -162,6 +162,6 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o ```rm -rf data/0/12345/``` - * Delete tablet metadata refer to [Tablet metadata management tool](./tablet-meta-tool.md) + * Delete tablet metadata refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md) ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/docs/admin-manual/maint-monitor/monitor-metrics/metrics.md b/docs/admin-manual/maint-monitor/metrics.md similarity index 99% rename from docs/admin-manual/maint-monitor/monitor-metrics/metrics.md rename to docs/admin-manual/maint-monitor/metrics.md index f317c31383ced..e755528df4a36 100644 --- a/docs/admin-manual/maint-monitor/monitor-metrics/metrics.md +++ b/docs/admin-manual/maint-monitor/metrics.md @@ -26,7 +26,7 @@ under the License. # Monitor Metrics -Doris FE process and BE processes provide complete monitoring metrics. Monitoring metrics can be divided into two categories: +Doris FE process and BE processes provide complete monitoring metrics. Monitoring metrics can be divided into two categories: 1. **Process monitoring**: mainly displays some monitoring values of the Doris process itself . 2. **Node monitoring**: mainly displays the monitoring of the node machine itself where the Doris process is located, such as CPU , memory, IO , network , etc. 
@@ -48,7 +48,7 @@ doris_fe_cache_hit{type="sql"} 0 doris_fe_connection_total 2 ``` -Monitoring metrics in Json format can be fetched using `type` parameter in rest interface, for eg: +Monitoring metrics in Json format can be fetched using `type` parameter in rest interface, for eg: ``` curl http://fe_host:http_port/metrics?type=json diff --git a/docs/admin-manual/open-api/be-http/compaction-run.md b/docs/admin-manual/open-api/be-http/compaction-run.md index f2b3cb45f56b7..3cbf31334106a 100644 --- a/docs/admin-manual/open-api/be-http/compaction-run.md +++ b/docs/admin-manual/open-api/be-http/compaction-run.md @@ -1,6 +1,6 @@ --- { - "title": "Manually Trigger Compaction", + "title": "Disk Capacity Management", "language": "en" } --- @@ -24,110 +24,144 @@ specific language governing permissions and limitations under the License. --> -# Manually Trigger Compaction +# Disk Capacity Management -## Request +This document mainly introduces system parameters and processing strategies related to disk storage capacity. -`POST /api/compaction/run?tablet_id={int}&compact_type={enum}` -`POST /api/compaction/run?table_id={int}&compact_type=full` Note that table_id=xxx will take effect only when compact_type=full is specified. -`GET /api/compaction/run_status?tablet_id={int}` +If Doris' data disk capacity is not controlled, the process will hang because the disk is full. Therefore, we monitor the disk usage and remaining capacity, and control various operations in the Doris system by setting different warning levels, and try to avoid the situation where the disk is full. +## Glossary -## Description +* Data Dir: Data directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory. -Used to manually trigger the comparison and show status. 
+## Basic Principles -## Query parameters +BE reports disk usage to FE on a regular basis (every minute). FE records these statistics and restricts various operation requests based on them. -* `tablet_id` - - ID of the tablet +Two thresholds, **High Watermark** and **Flood Stage**, are set in FE. Flood Stage is higher than High Watermark. When the disk usage is higher than High Watermark, Doris will restrict the execution of certain operations (such as replica balancing). If it is higher than Flood Stage, certain operations (such as data loading) will be prohibited. -* `table_id` - - ID of table. Note that table_id=xxx will take effect only when compact_type=full is specified, and only one tablet_id and table_id can be specified, and cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table. +At the same time, a **Flood Stage** is also set on the BE. Because FE cannot fully detect the disk usage on BE in a timely manner and cannot control certain BE operations (such as Compaction), the Flood Stage on the BE lets the BE actively refuse and stop certain operations to protect itself. -* `compact_type` - - The value is `base` or `cumulative` or `full`. For usage scenarios of full_compaction, please refer to [Data Recovery](../../data-admin/repairing-data.md). +## FE Parameter -## Request body +**High Watermark:** -None +``` +storage_high_watermark_usage_percent: default value is 85 (85%). +storage_min_left_capacity_bytes: default value is 2GB. 
+``` -## Response +When disk usage is **more than** `storage_high_watermark_usage_percent`, **or** free disk capacity is **less than** `storage_min_left_capacity_bytes`, the disk will no longer be used as the destination path for the following operations: -### Trigger Compaction +* Tablet Balance +* Colocation Relocation +* Decommission -If the tablet does not exist, an error in JSON format is returned: +**Flood Stage:** ``` -{ - "status": "Fail", - "msg": "Tablet not found" -} +storage_flood_stage_usage_percent: default value is 95 (95%). +storage_flood_stage_left_capacity_bytes: default value is 1GB. ``` -If the tablet exists and the tablet is not running, JSON format is returned: +When disk usage is **more than** `storage_flood_stage_usage_percent`, **or** free disk capacity is **less than** `storage_flood_stage_left_capacity_bytes`, the disk will no longer be used as the destination path for the following operations: + +* Tablet Balance +* Colocation Relocation +* Replica make up +* Restore +* Load/Insert -``` -{ - "status": "Fail", - "msg": "fail to execute compaction, error = -2000" -} -``` +## BE Parameter -If the tablet exists and the tablet is running, JSON format is returned: +**Flood Stage:** ``` -{ - "status": "Success", - "msg": "compaction task is successfully triggered." -} +capacity_used_percent_flood_stage: default value is 95 (95%). +capacity_min_left_bytes_flood_stage: default value is 1GB. ``` -Explanation of results: +When disk usage is **more than** `storage_flood_stage_usage_percent`, **and** free disk capacity is **less than** `storage_flood_stage_left_capacity_bytes`, the following operations on this disk will be prohibited: -* status: Trigger task status, when it is successfully triggered, it is Success; when for some reason (for example, the appropriate version is not obtained), it returns Fail. -* msg: Give specific success or failure information. 
+* Base/Cumulative Compaction +* Data load +* Clone Task (Usually occurs when the replica is repaired or balanced.) +* Push Task (Occurs during the Loading phase of Hadoop import, and the file is downloaded. ) +* Alter Task (Schema Change or Rollup Task.) +* Download Task (The Downloading phase of the recovery operation.) + +## Disk Capacity Release -### Show Status +When the disk capacity is higher than High Watermark or even Flood Stage, many operations will be prohibited. At this time, you can try to reduce the disk usage and restore the system in the following ways. -If the tablet does not exist, an error in JSON format is returned: -``` -{ - "status": "Fail", - "msg": "Tablet not found" -} -``` -If the tablet exists and the tablet is not running, JSON format is returned: +* Delete table or partition -``` -{ - "status" : "Success", - "run_status" : false, - "msg" : "this tablet_id is not running", - "tablet_id" : 11308, - "schema_hash" : 700967178, - "compact_type" : "" -} -``` + By deleting tables or partitions, you can quickly reduce the disk space usage and restore the cluster. + **Note: Only the `DROP` operation can achieve the purpose of quickly reducing the disk space usage, the `DELETE` operation cannot.** -If the tablet exists and the tablet is running, JSON format is returned: -``` -{ - "status" : "Success", - "run_status" : true, - "msg" : "this tablet_id is running", - "tablet_id" : 11308, - "schema_hash" : 700967178, - "compact_type" : "cumulative" -} -``` + ``` + DROP TABLE tbl; + ALTER TABLE tbl DROP PARTITION p1; + ``` + +* BE expansion -Explanation of results: + After backend expansion, data tablets will be automatically balanced to BE nodes with lower disk usage. The expansion operation will make the cluster reach a balanced state in a few hours or days depending on the amount of data and the number of nodes. + +* Modify replica of a table or partition -* run_status: Get the current manual compaction task execution status. 
+ You can reduce the number of replicas of a table or partition. For example, the default 3 replicas can be reduced to 2 replicas. Although this method reduces the reliability of the data, it can quickly reduce the disk usage rate and restore the cluster to normal. + This method is usually used for emergency recovery. After recovery, once the disk usage rate has been reduced by expanding the cluster or deleting data, please restore the number of replicas to 3. + A change to the replica count takes effect immediately, and the backends will automatically and asynchronously delete the redundant replicas. + + ``` + ALTER TABLE tbl MODIFY PARTITION p1 SET("replication_num" = "2"); + ``` + +* Delete unnecessary files -### Examples + When the BE has crashed because the disk is full and cannot be started (this may occur when FE or BE does not detect the condition in time), you need to delete some temporary files in the data directory to ensure that the BE process can start. + Files in the following directories can be deleted directly: -``` -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=cumulative" -``` \ No newline at end of file + * log/: Log files in the log directory. + * snapshot/: Snapshot files in the snapshot directory. + * trash/: Trash files in the trash directory. + + **This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).** + + If the BE can still be started, you can use `ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All trash files** and expired snapshot files will be cleaned up. **This will affect the operation of restoring data from the trash bin**. 
+ + + If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes. There are two cases: + * If the disk usage does not reach 90% of the **Flood Stage**, expired trash files and expired snapshot files will be cleaned up. In this case, some recent files are retained, so data recovery is not affected. + * If the disk usage has reached 90% of the **Flood Stage**, **all trash files** and expired snapshot files will be cleaned up. **This will affect the operation of restoring data from the trash bin**. + + The time interval for automatic execution can be changed by `max_garbage_sweep_interval` and `min_garbage_sweep_interval` in the configuration items. + + When the recovery fails due to lack of trash files, the following results may be returned: + + ``` + {"status": "Fail","msg": "can find tablet path in trash"} + ``` + +* Delete data file (dangerous!!!) + + When none of the above operations can free up capacity, you need to delete data files to free up space. Data files are in the `data/` directory of the specified data directory. To delete a tablet, you must first ensure that at least one replica of the tablet is normal, otherwise **deleting the only replica will result in data loss**. + + Suppose we want to delete the tablet with id 12345: + + * Find the directory corresponding to the tablet, usually under `data/shard_id/tablet_id/`, for example: + + ```data/0/12345/``` + + * Record the tablet id and schema hash. The schema hash is the name of the next-level directory of the previous step. 
The following is 352781111: + + ```data/0/12345/352781111``` + + * Delete the data directory: + + ```rm -rf data/0/12345/``` + + * Delete tablet metadata (refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md)) + + ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/docs/admin-manual/query-admin/kill-query.md b/docs/admin-manual/query-admin/kill-query.md deleted file mode 100644 index 0e0ec8ecddd44..0000000000000 --- a/docs/admin-manual/query-admin/kill-query.md +++ /dev/null @@ -1,86 +0,0 @@ ---- -{ - "title": "Kill Query", - "language": "en" -} ---- - - - -# Kill Query -## Kill connection - -In Doris, each connection runs in a separate thread. You can terminate a thread using the `KILL processlist_id`statement. - -The `processlist_id` for the thread can be found in the Id column from the SHOW PROCESSLIST output. Or you can use the `SELECT CONNECTION_ID()` command to query the current connection id. - -Syntax: - -```SQL -KILL [CONNECTION] processlist_id -``` - -## Kill query - -You can also terminate the query command under execution based on the processlist_id or the query_id. - -Syntax: - -```SQL -KILL QUERY processlist_id | query_id -``` - -## Example - -1. Check the current connection id. - -```SQL -mysql select connection_id(); -+-----------------+ -| connection_id() | -+-----------------+ -| 48 | -+-----------------+ -1 row in set (0.00 sec) -``` - -2. Check all connection id. 
- -```SQL -mysql SHOW PROCESSLIST; -+------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+ -| CurrentConnected | Id | User | Host | LoginTime | Catalog | Db | Command | Time | State | QueryId | Info | -+------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+ -| Yes | 48 | root | 10.16.xx.xx:44834 | 2023-12-29 16:49:47 | internal | test | Query | 0 | OK | e6e4ce9567b04859-8eeab8d6b5513e38 | SHOW PROCESSLIST | -| | 50 | root | 192.168.xx.xx:52837 | 2023-12-29 16:51:34 | internal | | Sleep | 1837 | EOF | deaf13c52b3b4a3b-b25e8254b50ff8cb | SELECT @@session.transaction_isolation | -| | 51 | root | 192.168.xx.xx:52843 | 2023-12-29 16:51:35 | internal | | Sleep | 907 | EOF | 437f219addc0404f-9befe7f6acf9a700 | /* ApplicationName=DBeaver Ultimate 23.1.3 - Metadata */ SHOW STATUS | -| | 55 | root | 192.168.xx.xx:55533 | 2023-12-29 17:09:32 | internal | test | Sleep | 271 | EOF | f02603dc163a4da3-beebbb5d1ced760c | /* ApplicationName=DBeaver Ultimate 23.1.3 - SQLEditor */ SELECT DATABASE() | -| | 47 | root | 10.16.xx.xx:35678 | 2023-12-29 16:21:56 | internal | test | Sleep | 3528 | EOF | f4944c543dc34a99-b0d0f3986c8f1c98 | select * from test | -+------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+ -5 rows in set (0.00 sec) -``` - -3. Kill the currently running query, which will then be displayed as canceled. 
- -```SQL -mysql kill query 55; -Query OK, 0 rows affected (0.01 sec) -``` - diff --git a/docs/admin-manual/query-admin/sql-interception.md b/docs/admin-manual/query-admin/sql-interception.md deleted file mode 100644 index 0b64a970dea2c..0000000000000 --- a/docs/admin-manual/query-admin/sql-interception.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -{ - "title": "SQL Interception", - "language": "en" -} ---- - - - -This feature is used to restrict the execution of SQL statements (both DDL and DML can be restricted). - -Supports per-user configuration of SQL interception rules, such as using regular expressions to match and intercept SQL, or using supported rules for interception. - -## Creating and Managing Rules - -### Creating Rules - -For more syntax on creating rules, please refer to [CREATE SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-SQL-BLOCK-RULE). - -- `sql`: Matching rule (based on regular expression matching, special characters need to be escaped), optional, default value is "NULL". -- `sqlHash`: SQL hash value for exact matching. This value will be printed in `fe.audit.log`, optional. This parameter and SQL are mutually exclusive, default value is "NULL". -- `partition_num`: Maximum number of partitions a scan node will scan, default value is 0L. -- `tablet_num`: Maximum number of tablets a scan node will scan, default value is 0L. -- `cardinality`: Rough number of rows scanned by a scan node, default value is 0L. -- `global`: Whether it is globally effective (for all users), default is false. -- `enable`: Whether to enable the blocking rule, default is true. 
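The `sql` property in the list above is described as a regular-expression match. A minimal Python sketch of that idea follows. It assumes the pattern may match anywhere in the statement text, which the source does not confirm. The `BlockRule` class and `is_blocked` helper are hypothetical names used only for illustration; they are not part of Doris.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockRule:
    """Hypothetical stand-in for a SQL block rule; not a Doris API."""
    name: str
    sql: Optional[str] = None  # regular expression, special characters escaped
    enable: bool = True


def is_blocked(rule: BlockRule, statement: str) -> bool:
    """Return True if an enabled rule's regex is found in the statement."""
    if not rule.enable or rule.sql is None:
        return False
    # Assumption: search-style matching (anywhere in the statement).
    return re.search(rule.sql, statement) is not None


# A literal "*" must be escaped as "\*" in the regex; inside a quoted
# property string this is presumably why it is written as "\\*".
rule = BlockRule(name="test_rule1", sql=r"select \* from order_analysis")
print(is_blocked(rule, "select * from order_analysis"))
print(is_blocked(rule, "select id from order_analysis"))
```

With this sketch, the first call reports a match and the second does not, because only the first statement contains the escaped pattern. A disabled rule never blocks anything, mirroring the `enable` property.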
- -Example: - -```sql -CREATE SQL_BLOCK_RULE test_rule1 -PROPERTIES( - "sql"="select \\* from order_analysis", - "global"="false", - "enable"="true", - "sqlHash"="" -); - -CREATE SQL_BLOCK_RULE test_rule2 -PROPERTIES( - "partition_num" = "30", - "cardinality"="10000000000", - "global"="false", - "enable"="true" -) -``` - -:::note -Note: Do not include a semicolon at the end of the SQL statement. -::: - -Starting from version 2.1.6, SQL interception rules support external tables (tables in the External Catalog). - -- `sql`: Same as for internal tables. -- `sqlHash`: Same as for internal tables. -- `partition_num`: Same as for internal tables. -- `tablet_num`: Limits the number of shards scanned for external tables. Different data sources have different definitions of shards. For example, file shards in Hive tables, incremental data shards in Hudi tables, etc. -- `cardinality`: Same as for internal tables, limits the number of scanned rows. This parameter only takes effect when there are row count statistics for external tables (such as collected manually or automatically). - -### Binding Rules - -Rules with `global` set to `true` are globally effective and do not need to be bound to specific users. - -Rules with `global` set to `false` need to be bound to specific users. A user can be bound to multiple rules, and multiple rules are separated by `,`. - -```sql -SET PROPERTY [FOR 'jack'] 'sql_block_rules' = 'test_rule1,test_rule2' -``` - -### Viewing Rules - -- View the configured SQL blocking rules. - -If no rule name is specified, all rules will be viewed. For specific syntax, please refer to [SHOW SQL BLOCK RULE](../../sql-manual/sql-statements/Show-Statements/SHOW-SQL-BLOCK-RULE) - -```sql -SHOW SQL_BLOCK_RULE [FOR RULE_NAME] -``` - -- View rules bound to a user - -```sql -SHOW PROPERTY FOR user_name; -``` - -### Modifying Rules - -Allow modifications to each item such as sql/sqlHash/partition_num/tablet_num/cardinality/global/enable. 
For specific syntax, please refer to [ALTER SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-SQL-BLOCK-RULE) - -- `sql` and `sqlHash` cannot be set simultaneously. - -If a rule sets `sql` or `sqlHash`, the other property cannot be modified. - -- `sql`/`sqlHash` and `partition_num`/`tablet_num`/`cardinality` cannot be set simultaneously - -For example, if a rule sets `partition_num`, then `sql` or `sqlHash` cannot be modified. - -```sql -ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","enable"="true") -``` - -```sql -ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true") -``` - -### Deleting Rules - -Support deleting multiple rules simultaneously, separated by `,`. For specific syntax, please refer to [DROP SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Drop/DROP-SQL-BLOCK-RULE) - -``` -DROP SQL_BLOCK_RULE test_rule1,test_rule2 -``` - -## Triggering Rules - -When we execute the SQL defined in the rules, an exception error will be returned, as shown below: - -```sql -mysql> select * from order_analysis; -ERROR 1064 (HY000): errCode = 2, detailMessage = sql match regex sql block rule: order_analysis_rule -``` diff --git a/docs/admin-manual/compaction.md b/docs/admin-manual/trouble-shooting/compaction.md similarity index 100% rename from docs/admin-manual/compaction.md rename to docs/admin-manual/trouble-shooting/compaction.md diff --git a/docs/admin-manual/maint-monitor/frontend-lock-manager.md b/docs/admin-manual/trouble-shooting/frontend-lock-manager.md similarity index 100% rename from docs/admin-manual/maint-monitor/frontend-lock-manager.md rename to docs/admin-manual/trouble-shooting/frontend-lock-manager.md diff --git a/docs/admin-manual/memory-management/memory-analysis/doris-cache-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis.md similarity index 100% 
rename from docs/admin-manual/memory-management/memory-analysis/doris-cache-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/global-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/global-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/global-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/global-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/heap-profile-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/heap-profile-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/heap-profile-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/heap-profile-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/jemalloc-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/jemalloc-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/jemalloc-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/jemalloc-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/load-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/load-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/load-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/load-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/memory-log-analysis.md 
b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/memory-log-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/memory-log-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/memory-log-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/metadata-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/metadata-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/metadata-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/metadata-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/oom-crash-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/oom-crash-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/oom-crash-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/oom-crash-analysis.md diff --git a/docs/admin-manual/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md diff --git a/docs/admin-manual/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md rename to 
docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md diff --git a/docs/admin-manual/memory-management/memory-analysis/query-memory-analysis.md b/docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-memory-analysis.md similarity index 100% rename from docs/admin-manual/memory-management/memory-analysis/query-memory-analysis.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-analysis/query-memory-analysis.md diff --git a/docs/admin-manual/memory-management/memory-feature/memory-control-strategy.md b/docs/admin-manual/trouble-shooting/memory-management/memory-feature/memory-control-strategy.md similarity index 100% rename from docs/admin-manual/memory-management/memory-feature/memory-control-strategy.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-feature/memory-control-strategy.md diff --git a/docs/admin-manual/memory-management/memory-feature/memory-tracker.md b/docs/admin-manual/trouble-shooting/memory-management/memory-feature/memory-tracker.md similarity index 100% rename from docs/admin-manual/memory-management/memory-feature/memory-tracker.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-feature/memory-tracker.md diff --git a/docs/admin-manual/memory-management/memory-issue-faq.md b/docs/admin-manual/trouble-shooting/memory-management/memory-issue-faq.md similarity index 100% rename from docs/admin-manual/memory-management/memory-issue-faq.md rename to docs/admin-manual/trouble-shooting/memory-management/memory-issue-faq.md diff --git a/docs/admin-manual/memory-management/overview.md b/docs/admin-manual/trouble-shooting/memory-management/overview.md similarity index 100% rename from docs/admin-manual/memory-management/overview.md rename to docs/admin-manual/trouble-shooting/memory-management/overview.md diff --git a/docs/admin-manual/maint-monitor/metadata-operation.md 
b/docs/admin-manual/trouble-shooting/metadata-operation.md similarity index 100% rename from docs/admin-manual/maint-monitor/metadata-operation.md rename to docs/admin-manual/trouble-shooting/metadata-operation.md diff --git a/docs/admin-manual/data-admin/repairing-data.md b/docs/admin-manual/trouble-shooting/repairing-data.md similarity index 100% rename from docs/admin-manual/data-admin/repairing-data.md rename to docs/admin-manual/trouble-shooting/repairing-data.md diff --git a/docs/admin-manual/maint-monitor/tablet-local-debug.md b/docs/admin-manual/trouble-shooting/tablet-local-debug.md similarity index 100% rename from docs/admin-manual/maint-monitor/tablet-local-debug.md rename to docs/admin-manual/trouble-shooting/tablet-local-debug.md diff --git a/docs/admin-manual/maint-monitor/tablet-meta-tool.md b/docs/admin-manual/trouble-shooting/tablet-meta-tool.md similarity index 100% rename from docs/admin-manual/maint-monitor/tablet-meta-tool.md rename to docs/admin-manual/trouble-shooting/tablet-meta-tool.md diff --git a/docs/faq/install-faq.md b/docs/faq/install-faq.md index a9e0b69c6e5e9..456135b811fb7 100644 --- a/docs/faq/install-faq.md +++ b/docs/faq/install-faq.md @@ -253,7 +253,7 @@ There are usually two reasons for this problem: 1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because `priority_network` is not set correctly, which causes FE to match the wrong IP address when it starts. Restart FE after modifying `priority_network`. 2. Most Follower FE nodes in the cluster are not started. For example, there are 3 Followers, and only one is started. At this time, at least one other FE needs to be started, so that the FE electable group can elect the Master to provide services. -If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/maint-monitor/metadata-operation.md) in the Doris official website document. 
+If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/trouble-shooting/metadata-operation.md) in the Doris official website document. ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -263,7 +263,7 @@ If the following problems occur when using MySQL client to connect to Doris, thi Sometimes when FE is restarted, the above error will occur (usually only in the case of multiple Followers). And the two values in the error differ by 2. Causes FE to fail to start. -This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/maint-monitor/metadata-operation.md). +This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/trouble-shooting/metadata-operation.md). ### Q12. 
Doris compile and install JDK version incompatibility problem diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json index ed2f988370b27..b6fd8045b7af1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json @@ -247,9 +247,9 @@ "message": "集群管理", "description": "The label for category Managing Cluster in sidebar docs" }, - "sidebar.docs.category.Managing Data": { - "message": "业务连续性与数据恢复", - "description": "The label for category Business continuity & data recovery in sidebar docs" + "sidebar.docs.category.Managing Disater Recovery": { + "message": "容灾管理", + "description": "The label for category Managing Disater Recovery in sidebar docs" }, "sidebar.docs.category.Managing Workload": { "message": "负载管理", @@ -263,14 +263,14 @@ "message": "资源隔离", "description": "The label for category Resource Isolation in sidebar docs" }, - "sidebar.docs.category.Managing Query": { - "message": "查询管理", - "description": "The label for category Managing Query in sidebar docs" - }, "sidebar.docs.category.Managing User Privilege": { "message": "安全管理", "description": "The label for category Managing User Privilege in sidebar docs" }, + "sidebar.docs.category.Trouble Shooting": { + "message": "故障诊断处理", + "description": "The label for category Trouble Shooting in sidebar docs" + }, "sidebar.docs.category.Managing Memory": { "message": "内存管理", "description": "The label for category Managing Memory in sidebar docs" diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/time-zone.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/time-zone.md index e464bcf6d3420..85a5edf65808d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/time-zone.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/time-zone.md @@ -1,6 +1,6 @@ --- { - "title": "时区", + "title": "时区管理", "language": "zh-CN" } --- @@ -209,7 +209,7 @@ Doris 目前兼容各时区下的数据向 Doris 中进行导入。而由于 Dor ### 信息更新 -真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的IANA 数据保持一致,请采取下列方式进行更新: +真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的 IANA 数据保持一致,请采取下列方式进行更新: 1. 使用包管理器更新 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup.md deleted file mode 100644 index fb3152178b8db..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,196 +0,0 @@ ---- -{ - "title": "数据备份", - "language": "zh-CN" -} ---- - - - -# 数据备份 - -Doris 支持将当前数据以文件的形式备份到 HDFS 和对象存储。之后可以通过恢复命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期地进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移,集群间无损迁移可以使用 CCR (ccr.md)。 - -该功能需要 Doris 版本 0.8.2+ - -## 原理说明 - -备份操作是将指定表或分区的数据,直接以 Doris 存储的文件的形式,上传到远端仓库中进行存储。当用户提交 Backup 请求后,系统内部会做如下操作: - -1. 快照及快照上传 - - 快照阶段会对指定的表或分区数据文件进行快照。之后,备份都是对快照进行操作。在快照之后,对表进行的更改、导入等操作都不再影响备份的结果。快照只是对当前数据文件产生一个硬链,耗时很少。快照完成后,会开始对这些快照文件进行逐一上传。快照上传由各个 Backend 并发完成。 - -2. 元数据准备及上传 - - 数据文件快照上传完成后,Frontend 会首先将对应元数据写成本地文件,然后通过 broker 将本地元数据文件上传到远端仓库。完成最终备份作业 - -3. 动态分区表说明 - - 如果该表是动态分区表,备份之后会自动禁用动态分区属性,在做恢复的时候需要手动将该表的动态分区属性启用,命令如下: - - ```sql - ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true") - ``` - -4. 备份和恢复操作都不会保留表的 `colocate_with` 属性。 - -## 开始备份 - -1. 创建一个 HDFS 的远程仓库 example_repo(S3 存储请参考 2): - - ```sql - CREATE REPOSITORY `example_repo` - WITH HDFS - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "fs.defaultFS"="hdfs://hdfs_host:port", - "hadoop.username" = "hadoop" - ); - ``` - -2. 
创建一个 s3 的远程仓库 : s3_repo(HDFS 存储请参考 1) - - ``` - CREATE REPOSITORY `s3_repo` - WITH S3 - ON LOCATION "s3://bucket_name/test" - PROPERTIES - ( - "AWS_ENDPOINT" = "http://xxxx.xxxx.com", - "AWS_ACCESS_KEY" = "xxxx", - "AWS_SECRET_KEY"="xxx", - "AWS_REGION" = "xxx" - ); - ``` - - >注意: - > - >ON LOCATION 这里后面跟的是 Bucket Name - -2. 全量备份 example_db 下的表 example_tbl 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label1 - TO example_repo - ON (example_tbl) - PROPERTIES ("type" = "full"); - ``` - -3. 全量备份 example_db 下,表 example_tbl 的 p1, p2 分区,以及表 example_tbl2 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label2 - TO example_repo - ON - ( - example_tbl PARTITION (p1,p2), - example_tbl2 - ); - ``` - -4. 查看最近 backup 作业的执行情况: - - ```sql - mysql> show BACKUP\G; - *************************** 1. row *************************** - JobId: 17891847 - SnapshotName: snapshot_label1 - DbName: example_db - State: FINISHED - BackupObjs: [default_cluster:example_db.example_tbl] - CreateTime: 2022-04-08 15:52:29 - SnapshotFinishedTime: 2022-04-08 15:52:32 - UploadFinishedTime: 2022-04-08 15:52:38 - FinishedTime: 2022-04-08 15:52:44 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -5. 
查看远端仓库中已存在的备份 - - ```sql - mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; - +-----------------+---------------------+--------+ - | Snapshot | Timestamp | Status | - +-----------------+---------------------+--------+ - | snapshot_label1 | 2022-04-08-15-52-29 | OK | - +-----------------+---------------------+--------+ - 1 row in set (0.15 sec) - ``` - -BACKUP 的更多用法可参考 [这里](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md)。 - -## 最佳实践 - -### 备份 - -当前我们支持最小分区(Partition)粒度的全量备份(增量备份有可能在未来版本支持)。如果需要对数据进行定期备份,首先需要在建表时,合理的规划表的分区及分桶,比如按时间进行分区。然后在之后的运行过程中,按照分区粒度进行定期的数据备份。 - -### 数据迁移 - -用户可以先将数据备份到远端仓库,再通过远端仓库将数据恢复到另一个集群,完成数据迁移。因为数据备份是通过快照的形式完成的,所以,在备份作业的快照阶段之后的新的导入数据,是不会备份的。因此,在快照完成后,到恢复作业完成这期间,在原集群上导入的数据,都需要在新集群上同样导入一遍。 - -建议在迁移完成后,对新旧两个集群并行导入一段时间。完成数据和业务正确性校验后,再将业务迁移到新的集群。 - -## 说明 - -1. 备份恢复相关的操作目前只允许拥有 ADMIN 权限的用户执行。 -2. 一个 Database 内,只允许有一个正在执行的备份或恢复作业。 -3. 备份和恢复都支持最小分区(Partition)级别的操作,当表的数据量很大时,建议按分区分别执行,以降低失败重试的代价。 -4. 因为备份恢复操作,操作的都是实际的数据文件。所以当一个表的分片过多,或者一个分片有过多的小版本时,可能即使总数据量很小,依然需要备份或恢复很长时间。用户可以通过 `SHOW PARTITIONS FROM table_name;` 和 `SHOW TABLETS FROM table_name;` 来查看各个分区的分片数量,以及各个分片的文件版本数量,来预估作业执行时间。文件数量对作业执行的时间影响非常大,所以建议在建表时,合理规划分区分桶,以避免过多的分片。 -5. 当通过 `SHOW BACKUP` 或者 `SHOW RESTORE` 命令查看作业状态时。有可能会在 `TaskErrMsg` 一列中看到错误信息。但只要 `State` 列不为 `CANCELLED`,则说明作业依然在继续。这些 Task 有可能会重试成功。当然,有些 Task 错误,也会直接导致作业失败。 - 常见的`TaskErrMsg`错误如下: - Q1:备份到 HDFS,状态显示 UPLOADING,TaskErrMsg 错误信息:[13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0,port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception: ] - 这个一般是网络通信问题,查看broker日志,看某个ip 或者端口不通,如果是云服务,则需要查看是否访问了内网,如果是,则可以在borker/conf文件夹下添加hdfs-site.xml,还需在hdfs-site.xml配置文件下添加dfs.client.use.datanode.hostname=true,并在broker节点上配置HADOOP集群的主机名映射。 -6. 
如果恢复作业是一次覆盖操作(指定恢复数据到已经存在的表或分区中),那么从恢复作业的 `COMMIT` 阶段开始,当前集群上被覆盖的数据有可能不能再被还原。此时如果恢复作业失败或被取消,有可能造成之前的数据已损坏且无法访问。这种情况下,只能通过再次执行恢复操作,并等待作业完成。因此,我们建议,如无必要,尽量不要使用覆盖的方式恢复数据,除非确认当前数据已不再使用。 - -## 相关命令 - -和备份恢复功能相关的命令如下。以下命令,都可以通过 mysql-client 连接 Doris 后,使用 `help cmd;` 的方式查看详细帮助。 - -1. CREATE REPOSITORY - - 创建一个远端仓库路径,用于备份或恢复。具体参考 [创建远程仓库文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY.md)。 - -2. BACKUP - - 执行一次备份操作。具体参考 [备份文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md)。 - -3. SHOW BACKUP - - 查看最近一次 backup 作业的执行情况。具体参考 [查看备份作业文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-BACKUP.md)。 - -4. SHOW SNAPSHOT - - 查看远端仓库中已存在的备份。具体参考 [查看备份文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-SNAPSHOT.md)。 - -5. CANCEL BACKUP - - 取消当前正在执行的备份作业。具体参考 [取消备份作业文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CANCEL-BACKUP.md)。 - -6. 
DROP REPOSITORY - - 删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。具体参考 [删除远程仓库文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/DROP-REPOSITORY.md)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/config.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/config.md index 8b10bdb3999a8..34b4f7a78659f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/config.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/config.md @@ -1,6 +1,8 @@ --- -title: 配置说明 -language: zh-CN +{ + "title": "配置说明", + "language": "zh-CN" +} --- - - - -Doris 支持将当前数据以文件的形式,通过 broker 备份到远端存储系统中。之后可以通过 恢复 命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期的进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移。 - -该功能需要 Doris 版本 0.8.2+ - -使用该功能,需要部署对应远端存储的 broker。如 BOS、HDFS 等。可以通过 `SHOW BROKER;` 查看当前部署的 broker。 - -## 简要原理说明 - -恢复操作需要指定一个远端仓库中已存在的备份,然后将这个备份的内容恢复到本地集群中。当用户提交 Restore 请求后,系统内部会做如下操作: - -1. 在本地创建对应的元数据 - - 这一步首先会在本地集群中,创建恢复对应的表分区等结构。创建完成后,该表可见,但是不可访问。 - -2. 本地 snapshot - - 这一步是将上一步创建的表做一个快照。这其实是一个空快照(因为刚创建的表是没有数据的),其目的主要是在 Backend 上产生对应的快照目录,用于之后接收从远端仓库下载的快照文件。 - -3. 下载快照 - - 远端仓库中的快照文件,会被下载到对应的上一步生成的快照目录中。这一步由各个 Backend 并发完成。 - -4. 生效快照 - - 快照下载完成后,我们要将各个快照映射为当前本地表的元数据。然后重新加载这些快照,使之生效,完成最终的恢复作业。 - -## 开始恢复 - -1. 从 example_repo 中恢复备份 snapshot_1 中的表 backup_tbl 到数据库 example_db1,时间版本为 "2018-05-04-16-45-08"。恢复为 1 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. 
从 example_repo 中恢复备份 snapshot_2 中的表 backup_tbl 的分区 p1,p2,以及表 backup_tbl2 到数据库 example_db1,并重命名为 new_tbl,时间版本为 "2018-05-04-17-11-01"。默认恢复为 3 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. 查看 restore 作业的执行情况: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -RESTORE 的更多用法可参考 [这里](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE.md)。 - -## 相关命令 - -和备份恢复功能相关的命令如下。以下命令,都可以通过 mysql-client 连接 Doris 后,使用 `help cmd;` 的方式查看详细帮助。 - -**1. CREATE REPOSITORY** - -创建一个远端仓库路径,用于备份或恢复。该命令需要借助 Broker 进程访问远端存储,不同的 Broker 需要提供不同的参数,具体请参阅 [Broker 文档](../../data-operate/import/broker-load-manual),也可以直接通过 S3 协议备份到支持 AWS S3 协议的远程存储上去,也可以直接备份到 HDFS,具体参考 [创建远程仓库文档](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY) - -**2. RESTORE** - -执行一次恢复操作。 - -3. 
SHOW RESTORE - -查看最近一次 restore 作业的执行情况,包括: - -- JobId:本次恢复作业的 id。 - -- Label:用户指定的仓库中备份的名称(Label)。 - -- Timestamp:用户指定的仓库中备份的时间戳。 - -- DbName:恢复作业对应的 Database。 - -- State:恢复作业当前所在阶段: - - - PENDING:作业初始状态。 - - - SNAPSHOTING:正在进行本地新建表的快照操作。 - - - DOWNLOAD:正在发送下载快照任务。 - - - DOWNLOADING:快照正在下载。 - - - COMMIT:准备生效已下载的快照。 - - - COMMITTING:正在生效已下载的快照。 - - - FINISHED:恢复完成。 - - - CANCELLED:恢复失败或被取消。 - -- AllowLoad:恢复期间是否允许导入。 - -- ReplicationNum:恢复指定的副本数。 - -- RestoreObjs:本次恢复涉及的表和分区的清单。 - -- CreateTime:作业创建时间。 - -- MetaPreparedTime:本地元数据生成完成时间。 - -- SnapshotFinishedTime:本地快照完成时间。 - -- DownloadFinishedTime:远端快照下载完成时间。 - -- FinishedTime:本次作业完成时间。 - -- UnfinishedTasks:在 `SNAPSHOTTING`,`DOWNLOADING`, `COMMITTING` 等阶段,会有多个子任务在同时 -进行,这里展示的当前阶段,未完成的子任务的 task id。 - -- TaskErrMsg:如果有子任务执行出错,这里会显示对应子任务的错误信息。 - -- Status:用于记录在整个作业过程中,可能出现的一些状态信息。 - -- Timeout:作业的超时时间,单位是秒。 - -**4. CANCEL RESTORE** - -取消当前正在执行的恢复作业。 - -**5. DROP REPOSITORY** - -删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。 - -## 常见错误 - -1. RESTORE 报错:[20181: invalid md5 of downloaded file:/data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e] - - 备份和恢复的表的副本数不一致导致的,执行恢复命令时需指定副本个数,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) 命令手册 - -2. 
RESTORE 报错:[COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - 备份和恢复不是同一个版本导致的,使用指定的 meta_version 来读取之前备份的元数据。注意,该参数作为临时方案,仅用于恢复老版本 Doris 备份的数据。最新版本的备份数据中已经包含 meta version,无需再指定,针对上述错误具体解决方案指定 meta_version = 100,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) 命令手册 - -## 更多帮助 - -关于 RESTORE 使用的更多详细语法及最佳实践,请参阅 [RESTORE](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/RESTORE) 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP RESTORE` 获取更多帮助信息。 - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/disk-capacity.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/disk-capacity.md index c5e7982ebf6d4..973707dc785dc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/disk-capacity.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/disk-capacity.md @@ -170,6 +170,6 @@ storage_flood_stage_left_capacity_bytes 默认 1GB。 `rm -rf data/0/12345/` - - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](./tablet-meta-tool.md)) + - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](../trouble-shooting/tablet-meta-tool.md)) `./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/monitor-metrics/metrics.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/metrics.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/monitor-metrics/metrics.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/metrics.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/be-http/compaction-run.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/be-http/compaction-run.md index fd68c058b5c72..78fcfefd7fd3d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/be-http/compaction-run.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/be-http/compaction-run.md @@ -49,7 +49,7 @@ under the License. * `compact_type` - - 取值为`base`或`cumulative`或`full`。full_compaction 的使用场景请参考[数据恢复](../../data-admin/repairing-data)。 + - 取值为`base`或`cumulative`或`full`。full_compaction 的使用场景请参考[数据恢复](../../trouble-shooting/repairing-data)。 ## Request body diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/query-admin/sql-interception.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/query-admin/sql-interception.md deleted file mode 100644 index 05dbf6d4ad3e6..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/query-admin/sql-interception.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -{ - "title": "SQL 拦截", - "language": "zh-CN" -} ---- - - - -该功能用于限制执行 SQL 语句(DDL / DML 都可限制)。 - -支持按用户配置 SQL 的拦截规则,如使用正则表达式匹配和拦截 SQL,或使用支持的规则进行拦截。 - -## 创建和管理规则 - -### 创建规则 - -更多创建语法请参阅[CREATE SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-SQL-BLOCK-RULE) - -- `sql`:匹配规则 (基于正则匹配,特殊字符需要转译),可选,默认值为 "NULL" -- `sqlHash`: sql hash 值,用于完全匹配,我们会在`fe.audit.log`打印这个值,可选,这个参数和 SQL 只能二选一,默认值为 "NULL" -- `partition_num`: 一个扫描节点会扫描的最大 Partition 数量,默认值为 0L -- `tablet_num`: 一个扫描节点会扫描的最大 Tablet 数量,默认值为 0L。 -- `cardinality`: 一个扫描节点粗略的扫描行数,默认值为 0L -- `global`:是否全局 (所有用户) 生效,默认为 false -- `enable`:是否开启阻止规则,默认为 true - -示例: - -```sql -CREATE SQL_BLOCK_RULE test_rule1 -PROPERTIES( - "sql"="select \\* from order_analysis", - "global"="false", - "enable"="true", - "sqlHash"="" -); - -CREATE SQL_BLOCK_RULE test_rule2 -PROPERTIES( - "partition_num" = "30", - "cardinality"="10000000000", - "global"="false", - "enable"="true" -) -``` - -:::note 
-注意:这里 SQL 语句最后不要带分号 -::: - -从 2.1.6 版本开始,SQL 拦截规则支持外部表(External Catalog 中的表)。 - -- `sql`:和内表含义一致。 -- `sqlHash`: 和内表含义一致。 -- `partition_num`:和内表含义一致。 -- `tablet_num`:限制外表的扫描的分片数量。不同的数据源,分片的定义不尽相同。比如 Hive 表中的文件分片,Hudi 表中的增量数据分片等。 -- `cardinality`:和内表含义一致,限制扫描行数。只有当外表存在行数统计信息时(如通过手动或自动统计信息采集后),该参数才会生效。 - -### 绑定规则 - -`global` 为 `true` 的规则是全局生效的,不需要绑定到具体用户。 - -`global` 为 `false` 的规则,需要绑定到指定用户。一个用户可以绑定多个规则,多个规则使用 `,` 分隔: - -```sql -SET PROPERTY [FOR 'jack'] 'sql_block_rules' = 'test_rule1,test_rule2' -``` - -### 查看规则 - -- 查看已配置的 SQL 阻止规则 - - 不指定规则名则为查看所有规则,具体语法请参阅 [SHOW SQL BLOCK RULE](../../sql-manual/sql-statements/Show-Statements/SHOW-SQL-BLOCK-RULE) - - ```sql - SHOW SQL_BLOCK_RULE [FOR RULE_NAME] - ``` - -- 查看用户绑定的规则 - - ```sql - SHOW PROPERTY FOR user_name; - ``` - -### 修改规则 - -允许对 sql/sqlHash/partition_num/tablet_num/cardinality/global/enable 等每一项进行修改,具体语法请参阅[ALTER SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-SQL-BLOCK-RULE) - -- `sql` 和 `sqlHash` 不能同时被设置。 - - 如果一个 rule 设置了 `sql` 或者 `sqlHash`,则另一个属性将无法被修改。 - -- `sql`/`sqlHash` 和 `partition_num`/`tablet_num`/`cardinality` 不能同时被设置 - - 举例,如果一个 rule 设置了 `partition_num`,那么 `sql` 或者 `sqlHash` 将无法被修改。 - -```sql -ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","enable"="true") -``` - -```sql -ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true") -``` - -### 删除规则 - -支持同时删除多个规则,以 `,` 隔开,具体语法请参阅 [DROP SQL BLOCK RULE](../../sql-manual/sql-statements/Data-Definition-Statements/Drop/DROP-SQL-BLOCK-RULE) - -``` -DROP SQL_BLOCK_RULE test_rule1,test_rule2 -``` - -## 触发规则 - -当我们去执行刚才我们定义在规则里的 SQL 时就会返回异常错误,示例如下: - -```sql -mysql> select * from order_analysis; -ERROR 1064 (HY000): errCode = 2, detailMessage = sql match regex sql block rule: order_analysis_rule -``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/compaction.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/compaction.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/compaction.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/compaction.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/frontend-lock-manager.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/frontend-lock-manager.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/frontend-lock-manager.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/frontend-lock-manager.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/doris-cache-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/doris-cache-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/global-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/global-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/global-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/global-memory-analysis.md diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/heap-profile-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/heap-profile-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/heap-profile-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/heap-profile-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/jemalloc-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/jemalloc-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/jemalloc-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/jemalloc-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/load-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/load-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/load-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/load-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/memory-log-analysis.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/memory-log-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/memory-log-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/memory-log-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/metadata-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/metadata-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/metadata-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/metadata-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/oom-crash-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/oom-crash-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/oom-crash-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/oom-crash-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md similarity index 100% rename from 
i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-memory-analysis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-memory-analysis.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-analysis/query-memory-analysis.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-analysis/query-memory-analysis.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-feature/memory-control-strategy.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-feature/memory-control-strategy.md similarity index 100% rename from 
i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-feature/memory-control-strategy.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-feature/memory-control-strategy.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-feature/memory-tracker.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-feature/memory-tracker.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-feature/memory-tracker.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-feature/memory-tracker.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-issue-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-issue-faq.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/memory-issue-faq.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/memory-issue-faq.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/memory-management/overview.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/memory-management/overview.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/metadata-operation.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/metadata-operation.md 
similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/metadata-operation.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/metadata-operation.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/repairing-data.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/repairing-data.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/repairing-data.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/repairing-data.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/tablet-local-debug.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/tablet-local-debug.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/tablet-local-debug.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/tablet-local-debug.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/tablet-meta-tool.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/tablet-meta-tool.md similarity index 100% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/maint-monitor/tablet-meta-tool.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/trouble-shooting/tablet-meta-tool.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/job-scheduler.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/job-scheduler.md index 4b5479e179064..b3fd9847c237f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/job-scheduler.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/job-scheduler.md @@ -1,6 +1,6 @@ --- { -"title": "作业调度", +"title": "调度管理", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/kill-query.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/kill-query.md index 773d6173605a9..6a10f9ebb5f93 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/kill-query.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/kill-query.md @@ -1,6 +1,6 @@ --- { -"title": "Kill Query", +"title": "终止查询", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/workload-management-summary.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/workload-management-summary.md index 97dacb28cf810..14eb866886725 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/workload-management-summary.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/workload-management/workload-management-summary.md @@ -1,6 +1,6 @@ --- { -"title": "概述", +"title": "负载管理概述", "language": "zh-CN" } --- @@ -24,21 +24,21 @@ specific language governing permissions and limitations under the License. 
--> -负载管理是Doris一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: +负载管理是 Doris 一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: -- 资源隔离: 通过划分多个Group,并且为每个Group都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(例如读写操作)之间互不干扰; +- 资源隔离:通过划分多个 Group,并且为每个 Group 都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(例如读写操作)之间互不干扰; -- 并发控制与排队: 可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; +- 并发控制与排队:可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; -- 熔断: 在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 +- 熔断:在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 ## 资源划分方式 -Doris 可以通过以下3种方式将资源分组: +Doris 可以通过以下 3 种方式将资源分组: - Resource Group: 以 BE 节点为最小粒度,通过设置标签(tag)的方式,划分出多个资源组; -- Workload Group: 将一个BE内的资源(CPU、Memory、IO)通过Cgroup划分出多个资源组,实现更细致的资源分配; +- Workload Group: 将一个 BE 内的资源(CPU、Memory、IO)通过 Cgroup 划分出多个资源组,实现更细致的资源分配; - Compute Group: 是存算分离模式下的一种资源组划分的方式,与 Resource Group 类似,它也是以 BE 节点为最小粒度,划分出多个资源组。 @@ -46,9 +46,9 @@ Doris 可以通过以下3种方式将资源分组: | 资源隔离方式 | 隔离粒度 | 软/硬限制 | 跨资源组查询 | | ---------- | ----------- |-----|-----| -| Resource Group | 服务器节点级别,资源完全隔离;可以隔离BE故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | -| Workload Group | BE 进程内隔离;不能隔离BE故障 | 支持硬限制与软限制 | 支持跨资源组查询 | -|Compute Group | 服务器节点级别,资源完全隔离;可以隔离BE故障 | 硬限制 | 不支持跨资源组查询 | +| Resource Group | 服务器节点级别,资源完全隔离;可以隔离 BE 故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | +| Workload Group | BE 进程内隔离;不能隔离 BE 故障 | 支持硬限制与软限制 | 支持跨资源组查询 | +|Compute Group | 服务器节点级别,资源完全隔离;可以隔离 BE 故障 | 硬限制 | 不支持跨资源组查询 | ## 软限与硬限 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/install-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/install-faq.md index b03ebfba2c95b..9e538de642dd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/install-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/install-faq.md @@ -267,7 +267,7 @@ http { 2. 
集群内多数 Follower FE 节点未启动。比如有 3 个 Follower,只启动了一个。此时需要将另外至少一个 FE 也启动,FE 可选举组方能选举出 Master 已提供服务。 -如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md)进行恢复。 +如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md)进行恢复。 ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -277,7 +277,7 @@ http { 有时重启 FE,会出现如上错误(通常只会出现在多 Follower 的情况下)。并且错误中的两个数值相差 2。导致 FE 启动失败。 -这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 +这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 ### Q12. Doris 编译安装 JDK 版本不兼容问题 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/releasenotes/v2.1/release-2.1.6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/releasenotes/v2.1/release-2.1.6.md index 65853079ee177..be8b1f1dfd7e3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/releasenotes/v2.1/release-2.1.6.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/releasenotes/v2.1/release-2.1.6.md @@ -111,7 +111,7 @@ under the License. 
- 更多信息,请查看文档 [table_properties](../../admin-manual/system-tables/information_schema/table_properties/) - 新增 FE 中死锁和慢锁检测功能。 - - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/maint-monitor/frontend-lock-manager/) + - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/trouble-shooting/frontend-lock-manager) ## 改进提升 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md index 5125294cb084e..2ca7bdbd8d5cd 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md @@ -156,6 +156,6 @@ capacity_min_left_bytes_flood_stage 默认 1GB。 `rm -rf data/0/12345/` - - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](./tablet-meta-tool.md)) + - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](../trouble-shooting/tablet-meta-tool.md)) `./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/faq/install-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/faq/install-faq.md index b03ebfba2c95b..9e538de642dd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/faq/install-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/faq/install-faq.md @@ -267,7 +267,7 @@ http { 2. 集群内多数 Follower FE 节点未启动。比如有 3 个 Follower,只启动了一个。此时需要将另外至少一个 FE 也启动,FE 可选举组方能选举出 Master 已提供服务。 -如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md)进行恢复。 +如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md)进行恢复。 ### Q10. 
Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -277,7 +277,7 @@ http { 有时重启 FE,会出现如上错误(通常只会出现在多 Follower 的情况下)。并且错误中的两个数值相差 2。导致 FE 启动失败。 -这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 +这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 ### Q12. Doris 编译安装 JDK 版本不兼容问题 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/releasenotes/v2.1/release-2.1.6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/releasenotes/v2.1/release-2.1.6.md index 65853079ee177..be8b1f1dfd7e3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/releasenotes/v2.1/release-2.1.6.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/releasenotes/v2.1/release-2.1.6.md @@ -111,7 +111,7 @@ under the License. - 更多信息,请查看文档 [table_properties](../../admin-manual/system-tables/information_schema/table_properties/) - 新增 FE 中死锁和慢锁检测功能。 - - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/maint-monitor/frontend-lock-manager/) + - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/trouble-shooting/frontend-lock-manager) ## 改进提升 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/time-zone.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/time-zone.md index 593e24956932a..796df4f112b5d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/time-zone.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/time-zone.md @@ -1,6 +1,6 @@ --- { - "title": "时区", + "title": "时区管理", "language": "zh-CN" } --- @@ -209,7 +209,7 @@ Doris 目前兼容各时区下的数据向 Doris 中进行导入。而由于 Dor ### 信息更新 -真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的IANA 数据保持一致,请采取下列方式进行更新: +真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 
Doris 中的时区信息与最新的 IANA 数据保持一致,请采取下列方式进行更新: 1. 使用包管理器更新 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md index c5e7982ebf6d4..973707dc785dc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md @@ -170,6 +170,6 @@ storage_flood_stage_left_capacity_bytes 默认 1GB。 `rm -rf data/0/12345/` - - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](./tablet-meta-tool.md)) + - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](../trouble-shooting/tablet-meta-tool.md)) `./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/faq/install-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/faq/install-faq.md index b03ebfba2c95b..9e538de642dd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/faq/install-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/faq/install-faq.md @@ -267,7 +267,7 @@ http { 2. 集群内多数 Follower FE 节点未启动。比如有 3 个 Follower,只启动了一个。此时需要将另外至少一个 FE 也启动,FE 可选举组方能选举出 Master 已提供服务。 -如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md)进行恢复。 +如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md)进行恢复。 ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -277,7 +277,7 @@ http { 有时重启 FE,会出现如上错误(通常只会出现在多 Follower 的情况下)。并且错误中的两个数值相差 2。导致 FE 启动失败。 -这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 +这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 ### Q12. 
Doris 编译安装 JDK 版本不兼容问题 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/releasenotes/v2.1/release-2.1.6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/releasenotes/v2.1/release-2.1.6.md index 65853079ee177..be8b1f1dfd7e3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/releasenotes/v2.1/release-2.1.6.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/releasenotes/v2.1/release-2.1.6.md @@ -111,7 +111,7 @@ under the License. - 更多信息,请查看文档 [table_properties](../../admin-manual/system-tables/information_schema/table_properties/) - 新增 FE 中死锁和慢锁检测功能。 - - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/maint-monitor/frontend-lock-manager/) + - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/trouble-shooting/frontend-lock-manager) ## 改进提升 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/cluster-management/time-zone.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/cluster-management/time-zone.md index ab5ec7fe41e1e..99eee7a6c22ee 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/cluster-management/time-zone.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/cluster-management/time-zone.md @@ -1,6 +1,6 @@ --- { - "title": "时区", + "title": "时区管理", "language": "zh-CN" } --- @@ -209,7 +209,7 @@ Doris 目前兼容各时区下的数据向 Doris 中进行导入。而由于 Dor ### 信息更新 -真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的IANA 数据保持一致,请采取下列方式进行更新: +真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的 IANA 数据保持一致,请采取下列方式进行更新: 1. 
使用包管理器更新 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup.md deleted file mode 100644 index dcd443c38362b..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,196 +0,0 @@ ---- -{ - "title": "数据备份", - "language": "zh-CN" -} ---- - - - -# 数据备份 - -Doris 支持将当前数据以文件的形式备份到 HDFS 和对象存储。之后可以通过恢复命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期地进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移,集群间无损迁移可以使用 CCR (../ccr.md)。 - -该功能需要 Doris 版本 0.8.2+ - -## 原理说明 - -备份操作是将指定表或分区的数据,直接以 Doris 存储的文件的形式,上传到远端仓库中进行存储。当用户提交 Backup 请求后,系统内部会做如下操作: - -1. 快照及快照上传 - - 快照阶段会对指定的表或分区数据文件进行快照。之后,备份都是对快照进行操作。在快照之后,对表进行的更改、导入等操作都不再影响备份的结果。快照只是对当前数据文件产生一个硬链,耗时很少。快照完成后,会开始对这些快照文件进行逐一上传。快照上传由各个 Backend 并发完成。 - -2. 元数据准备及上传 - - 数据文件快照上传完成后,Frontend 会首先将对应元数据写成本地文件,然后通过 broker 将本地元数据文件上传到远端仓库。完成最终备份作业 - -3. 动态分区表说明 - - 如果该表是动态分区表,备份之后会自动禁用动态分区属性,在做恢复的时候需要手动将该表的动态分区属性启用,命令如下: - - ```sql - ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true") - ``` - -4. 备份和恢复操作都不会保留表的 `colocate_with` 属性。 - -## 开始备份 - -1. 创建一个 HDFS 的远程仓库 example_repo(S3 存储请参考 2): - - ```sql - CREATE REPOSITORY `example_repo` - WITH HDFS - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "fs.defaultFS"="hdfs://hdfs_host:port", - "hadoop.username" = "hadoop" - ); - ``` - -2. 创建一个 S3 的远程仓库 : s3_repo(HDFS 存储请参考 1) - - ``` - CREATE REPOSITORY `s3_repo` - WITH S3 - ON LOCATION "s3://bucket_name/test" - PROPERTIES - ( - "AWS_ENDPOINT" = "http://xxxx.xxxx.com", - "AWS_ACCESS_KEY" = "xxxx", - "AWS_SECRET_KEY"="xxx", - "AWS_REGION" = "xxx" - ); - ``` - - >注意: - > - >ON LOCATION 这里后面跟的是 Bucket Name - -2. 全量备份 example_db 下的表 example_tbl 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label1 - TO example_repo - ON (example_tbl) - PROPERTIES ("type" = "full"); - ``` - -3. 
全量备份 example_db 下,表 example_tbl 的 p1, p2 分区,以及表 example_tbl2 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label2 - TO example_repo - ON - ( - example_tbl PARTITION (p1,p2), - example_tbl2 - ); - ``` - -4. 查看最近 backup 作业的执行情况: - - ```sql - mysql> show BACKUP\G; - *************************** 1. row *************************** - JobId: 17891847 - SnapshotName: snapshot_label1 - DbName: example_db - State: FINISHED - BackupObjs: [default_cluster:example_db.example_tbl] - CreateTime: 2022-04-08 15:52:29 - SnapshotFinishedTime: 2022-04-08 15:52:32 - UploadFinishedTime: 2022-04-08 15:52:38 - FinishedTime: 2022-04-08 15:52:44 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -5. 查看远端仓库中已存在的备份 - - ```sql - mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; - +-----------------+---------------------+--------+ - | Snapshot | Timestamp | Status | - +-----------------+---------------------+--------+ - | snapshot_label1 | 2022-04-08-15-52-29 | OK | - +-----------------+---------------------+--------+ - 1 row in set (0.15 sec) - ``` - -BACKUP 的更多用法可参考 [这里](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md)。 - -## 最佳实践 - -### 备份 - -当前我们支持最小分区(Partition)粒度的全量备份(增量备份有可能在未来版本支持)。如果需要对数据进行定期备份,首先需要在建表时,合理的规划表的分区及分桶,比如按时间进行分区。然后在之后的运行过程中,按照分区粒度进行定期的数据备份。 - -### 数据迁移 - -用户可以先将数据备份到远端仓库,再通过远端仓库将数据恢复到另一个集群,完成数据迁移。因为数据备份是通过快照的形式完成的,所以,在备份作业的快照阶段之后的新的导入数据,是不会备份的。因此,在快照完成后,到恢复作业完成这期间,在原集群上导入的数据,都需要在新集群上同样导入一遍。 - -建议在迁移完成后,对新旧两个集群并行导入一段时间。完成数据和业务正确性校验后,再将业务迁移到新的集群。 - -## 说明 - -1. 备份恢复相关的操作目前只允许拥有 ADMIN 权限的用户执行。 -2. 一个 Database 内,只允许有一个正在执行的备份或恢复作业。 -3. 备份和恢复都支持最小分区(Partition)级别的操作,当表的数据量很大时,建议按分区分别执行,以降低失败重试的代价。 -4. 因为备份恢复操作,操作的都是实际的数据文件。所以当一个表的分片过多,或者一个分片有过多的小版本时,可能即使总数据量很小,依然需要备份或恢复很长时间。用户可以通过 `SHOW PARTITIONS FROM table_name;` 和 `SHOW TABLETS FROM table_name;` 来查看各个分区的分片数量,以及各个分片的文件版本数量,来预估作业执行时间。文件数量对作业执行的时间影响非常大,所以建议在建表时,合理规划分区分桶,以避免过多的分片。 -5. 
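When you check job status with `SHOW BACKUP` or `SHOW RESTORE`, the `TaskErrMsg` column may contain error messages. As long as the `State` column is not `CANCELLED`, the job is still running, and the failed tasks may succeed on retry. Some task errors, however, cause the job to fail directly.
-   Common `TaskErrMsg` errors:
-   Q1: When backing up to HDFS, the state shows UPLOADING and TaskErrMsg reports: [13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0,port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception: ]
-   This is usually a network connectivity problem. Check the broker log to see which IP or port is unreachable. On a cloud service, check whether the access goes through the internal network. If it does, add an hdfs-site.xml file under the broker/conf directory, set dfs.client.use.datanode.hostname=true in it, and configure hostname mappings for the Hadoop cluster on the broker nodes.
-6. If the restore job is an overwrite (restoring into a table or partition that already exists), then from the `COMMIT` stage onward the data being overwritten on the current cluster may no longer be recoverable. If the restore job fails or is cancelled at that point, the previous data may be corrupted and inaccessible. The only remedy is to run the restore again and wait for it to finish. We therefore recommend not restoring by overwrite unless you have confirmed that the current data is no longer in use.
-
-## Related Commands
-
-The commands related to backup and restore are listed below. After connecting to Doris with mysql-client, you can view detailed help for any of them with `help cmd;`.
-
-1. CREATE REPOSITORY
-
-   Creates a remote repository path for backup or restore. See the [CREATE REPOSITORY documentation](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md).
-
-2. BACKUP
-
-   Runs a backup. See the [BACKUP documentation](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md).
-
-3. SHOW BACKUP
-
-   Shows the execution status of the most recent backup job. See the [SHOW BACKUP documentation](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-BACKUP.md).
-
-4. SHOW SNAPSHOT
-
-   Shows the backups that already exist in a remote repository. See the [SHOW SNAPSHOT documentation](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-SNAPSHOT.md).
-
-5. CANCEL BACKUP
-
-   Cancels the backup job that is currently running. See the [CANCEL BACKUP documentation](../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md).
-
-6. 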
DROP REPOSITORY - - 删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。具体参考 [删除远程仓库文档](../../sql-manual/sql-statements/data-modification/backup-and-restore/DROP-REPOSITORY.md)。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/ccr.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/ccr.md deleted file mode 100644 index d9b78ee2b1d9f..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/ccr.md +++ /dev/null @@ -1,642 +0,0 @@ ---- -{ - "title": "跨集群数据同步", - "language": "zh-CN" -} ---- - - - -## 概览 - -CCR(Cross Cluster Replication) 是跨集群数据同步,能够在库/表级别将源集群的数据变更同步到目标集群,可用于在线服务的数据可用性、隔离在离线负载、建设两地三中心。 - -CCR 通常被用于容灾备份、读写分离、集团与公司间数据传输和隔离升级等场景。 - -- 容灾备份:通常是将企业的数据备份到另一个集群与机房中,当突发事件导致业务中断或丢失时,可以从备份中恢复数据或快速进行主备切换。一般在对 SLA 要求比较高的场景中,都需要进行容灾备份,比如在金融、医疗、电子商务等领域中比较常见。 - -- 读写分离:读写分离是将数据的查询操作和写入操作进行分离,目的是降低读写操作的相互影响并提升资源的利用率。比如在数据库写入压力过大或在高并发场景中,采用读写分离可以将读/写操作分散到多个地域的只读/只写的数据库案例上,减少读写间的互相影响,有效保证数据库的性能及稳定性。 - -- 集团与分公司间数据传输:集团总部为了对集团内数据进行统一管控和分析,通常需要分布在各地域的分公司及时将数据传输同步到集团总部,避免因为数据不一致而引起的管理混乱和决策错误,有利于提高集团的管理效率和决策质量。 - -- 隔离升级:当在对系统集群升级时,有可能因为某些原因需要进行版本回滚,传统的升级模式往往会因为元数据不兼容的原因无法回滚。而使用 CCR 可以解决该问题,先构建一个备用的集群进行升级并双跑验证,用户可以依次升级各个集群,同时 CCR 也不依赖特定版本,使版本的回滚变得可行。 - -## 原理 - -### 名词解释 - -源集群:源头集群,业务数据写入的集群,需要 2.0 版本 - -目标集群:跨集群同步的目标集群,需要 2.0 版本 - -binlog:源集群的变更日志,包括 schema 和数据变更 - -syncer:一个轻量级的进程 - -### 架构说明 - - -![ccr 架构说明](/images/ccr-architecture-description.png) - -CCR 工具主要依赖一个轻量级进程:Syncers。Syncers 会从源集群获取 binlog,直接将元数据应用于目标集群,通知目标集群从源集群拉取数据。从而实现全量和增量迁移。 - -## 使用 - -使用非常简单,只需把 Syncers 服务启动,给他发一个命令,剩下的交给 Syncers 完成就行。 - -**1. 部署源 Doris 集群** - -**2. 部署目标 Doris 集群** - -**3. 首先源集群和目标集群都需要打开 binlog,在源集群和目标集群的 fe.conf 和 be.conf 中配置如下信息:** - -```sql -enable_feature_binlog=true -``` - -**4. 部署 syncers** - -1. 
构建 CCR syncer - - ```shell - git clone https://github.com/selectdb/ccr-syncer - - cd ccr-syncer - - bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR> - - cd SYNCER_OUTPUT_DIR# 联系相关同学免费获取 ccr 二进制包 - ``` - -2. 启动和停止 syncer - - ```shell - # 启动 - cd bin && sh start_syncer.sh --daemon - - # 停止 - sh stop_syncer.sh - ``` - -**5. 打开源集群中同步库/表的 Binlog** - -```shell --- 如果是整库同步,可以执行如下脚本,使得该库下面所有的表都要打开 binlog.enable -vim shell/enable_db_binlog.sh -修改源集群的 host、port、user、password、db -或者 ./enable_db_binlog.sh --host $host --port $port --user $user --password $password --db $db - --- 如果是单表同步,则只需要打开 table 的 binlog.enable,在源集群上执行: -ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); -``` - -**6. 向 syncer 发起同步任务** - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - } -}' http://127.0.0.1:9190/create_ccr -``` - -同步任务的参数说明: - -```shell -name: CCR同步任务的名称,唯一即可 -host、port:对应集群 Master FE的host和mysql(jdbc) 的端口 -user、password:syncer以何种身份去开启事务、拉取数据等 -database、table: -如果是db级别的同步,则填入your_db_name,your_table_name为空 -如果是表级别同步,则需要填入your_db_name,your_table_name -向syncer发起同步任务中的name只能使用一次 -``` - -## Syncer 详细操作手册 - -### 启动 Syncer 说明 - -根据配置选项启动 Syncer,并且在默认或指定路径下保存一个 pid 文件,pid 文件的命名方式为`host_port.pid`。 - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```sql -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` - -:::caution -**后文中的 start_syncer.sh 指的是该路径下的 start_syncer.sh!!!** -::: - -**启动选项** - -1. --daemon - -后台运行 Syncer,默认为 false - -```sql -bash bin/start_syncer.sh --daemon -``` - -2. 
--db_type - -Syncer 目前能够使用两种数据库来保存自身的元数据,分别为`sqlite3`(对应本地存储)和`mysql`(本地或远端存储) - -```sql -bash bin/start_syncer.sh --db_type mysql -``` - -默认值为 sqlite3 - -在使用 mysql 存储元数据时,Syncer 会使用`CREATE IF NOT EXISTS`来创建一个名为`ccr`的库,ccr 相关的元数据表都会保存在其中 - -3. --db_dir - -**这个选项仅在 db 使用****`sqlite3`****时生效** - -可以通过此选项来指定 sqlite3 生成的 db 文件名及路径。 - -```sql -bash bin/start_syncer.sh --db_dir /path/to/ccr.db -``` - -默认路径为`SYNCER_OUTPUT_DIR/db`,文件名为`ccr.db` - -4. --db_host & db_port & db_user & db_password - -**这个选项仅在 db 使用****`mysql`****时生效** - -```sql -bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456" -``` - -db_host、db_port 的默认值如例子中所示,db_user、db_password 默认值为空 - -5. --log_dir - -日志的输出路径 - -```sql -bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log -``` - -默认路径为`SYNCER_OUTPUT_DIR/log`,文件名为`ccr_syncer.log` - -6. --log_level - -用于指定 Syncer 日志的输出等级。 - -```sql -bash bin/start_syncer.sh --log_level info -``` - -日志的格式如下,其中 hook 只会在`log_level > info`的时候打印: - -```sql -# time level msg hooks -[2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx -``` - -在--daemon 下,log_level 默认值为`info` - -在前台运行时,log_level 默认值为`trace`,同时日志会通过 tee 来保存到 log_dir - -6. --host && --port - -用于指定 Syncer 的 host 和 port,其中 host 只起到在集群中的区分自身的作用,可以理解为 Syncer 的 name,集群中 Syncer 的名称为`host:port` - -```sql -bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 -``` - -host 默认值为 127.0.0.1,port 的默认值为 9190 - -7. 
--pid_dir - -用于指定 pid 文件的保存路径 - -pid 文件是 stop_syncer.sh 脚本用于关闭 Syncer 的凭据,里面保存了对应 Syncer 的进程号,为了方便 Syncer 的集群化管理,可以指定 pid 文件的保存路径 - -```sql -bash bin/start_syncer.sh --pid_dir /path/to/pids -``` - -默认值为`SYNCER_OUTPUT_DIR/bin` - -### Syncer 停止说明 - -根据默认或指定路径下 pid 文件中的进程号关闭对应 Syncer,pid 文件的命名方式为`host_port.pid`。 - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` -:::caution -**后文中的 stop_syncer.sh 指的是该路径下的 stop_syncer.sh!!!** -::: - -**停止选项** - -有三种关闭方法: - -1. 关闭目录下单个 Syncer - -​ 指定要关闭 Syncer 的 host && port,注意要与 start_syncer 时指定的 host 一致 - -2. 批量关闭目录下指定 Syncer - -​ 指定要关闭的 pid 文件名,以空格分隔,用`" "`包裹 - -3. 关闭目录下所有 Syncer - -​ 默认即可 - -1. --pid_dir - -指定 pid 文件所在目录,上述三种关闭方法都依赖于 pid 文件的所在目录执行 - -```shell -bash bin/stop_syncer.sh --pid_dir /path/to/pids -``` - -例子中的执行效果就是关闭`/path/to/pids`下所有 pid 文件对应的 Syncers(**方法 3**),`--pid_dir`可与上面三种关闭方法组合使用。 - -默认值为`SYNCER_OUTPUT_DIR/bin` - -2. --host && --port - -关闭 pid_dir 路径下 host:port 对应的 Syncer - -```shell -bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 -``` - -host 的默认值为 127.0.0.1,port 默认值为空 - -即,单独指定 host 时**方法 1**不生效,会退化为**方法 3**。 - -host 与 port 都不为空时**方法 1**才能生效 - -3. 
--files - -关闭 pid_dir 路径下指定 pid 文件名对应的 Syncer - -```shell -bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" -``` - -文件之间用空格分隔,整体需要用`" "`包裹住 - -### Syncer 操作列表 - -**请求的通用模板** - -```shell -curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator -``` - -json_body: 以 json 的格式发送操作所需信息 - -operator:对应 Syncer 的不同操作 - -所以接口返回都是 json, 如果成功则是其中 success 字段为 true, 类似,错误的时候,是 false,然后存在`ErrMsgs`字段 - -```JSON -{"success":true} - -or - -{"success":false,"error_msg":"job ccr_test not exist"} -``` - -### operators - -- create_ccr - -​ 创建 CCR 任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "demo", - "table": "example_tbl" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "ccrt", - "table": "copy" - } - }' http://127.0.0.1:9190/create_ccr - ``` - -- name: CCR 同步任务的名称,唯一即可 - -- host、port:对应集群 master 的 host 和 mysql(jdbc) 的端口 - -- thrift_port:对应 FE 的 rpc_port - -- user、password:syncer 以何种身份去开启事务、拉取数据等 - -- database、table: - - - 如果是 db 级别的同步,则填入 dbName,tableName 为空 - - - 如果是表级别同步,则需要填入 dbName、tableName - -- get_lag - -​ 查看同步进度 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/get_lag - ``` - -​ 其中 job_name 是 create_ccr 时创建的 name - -- pause - -​ 暂停同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/pause - ``` - -- resume - -​ 恢复同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/resume - ``` - -- delete - -​ 删除同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": 
"job_name" - }' http://ccr_syncer_host:ccr_syncer_port/delete - ``` - -- version - - 获取版本信息 - - ```shell - curl http://ccr_syncer_host:ccr_syncer_port/version - - # > return - {"version": "2.0.1"} - ``` - -- job status - - 查看 job 的状态 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/job_status - - { - "success": true, - "status": { - "name": "ccr_db_table_alias", - "state": "running", - "progress_state": "TableIncrementalSync" - } - } - ``` - -- desync job - - 不做 sync,此时用户可以将源和目的集群互换 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/desync - ``` - -- list_jobs - - 展示已经创建的所有任务 - - ```shell - curl http://ccr_syncer_host:ccr_syncer_port/list_jobs - - {"success":true,"jobs":["ccr_db_table_alias"]} - ``` - -### 开启库中所有表的 binlog - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` -:::caution -**后文中的 enable_db_binlog.sh 指的是该路径下的 enable_db_binlog.sh!!!** -::: - -**使用说明** - -```shell -bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db -``` - -## Syncer 高可用 - -Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 syncer,如果一个 crash 了,其他会分担他的任务 - -### 权限要求 - -1. Select_priv 对数据库、表的只读权限。 - -2. Load_priv 对数据库、表的写权限。包括 Load、Insert、Delete 等。 - -3. Alter_priv 对数据库、表的更改权限。包括重命名 库/表、添加/删除/变更 列、添加/删除 分区等操作。 - -4. Create_priv 创建数据库、表、视图的权限。 - -5. 
Drop_priv 删除数据库、表、视图的权限。
-
-加上 Admin 权限 (之后考虑彻底移除), 这个是用来检测 enable binlog config 的,现在需要 admin
-
-## 使用限制
-
-### 网络约束
-
-- 需要 Syncer 与上下游的 FE 和 BE 都是通的
-
-- 下游 BE 与上游 BE 是通的
-
-- 对外 IP 和 Doris 内部 IP 是一样的,也就是说`show frontends/backends`看到的,和能直接连的 IP 是一致的,要是直连,不能是 IP 转发或者 nat
-
-### ThriftPool 限制
-
-开大 thrift thread pool 大小,最好是超过一次 commit 的 bucket 数目大小
-
-### 版本要求
-
-版本最低要求:v2.0.3
-
-### 不支持的操作
-
-- rename table 支持有点问题
-
-- 不支持一些 trash 的操作,比如 table 的 drop-recovery 操作
-
-- 和 rename table 有关的,replace partition 与
-
-- 不能发生在同一个 db 上同时 backup/restore
-
-## Feature
-
-### 限速
-
-BE 端配置参数:
-
-```shell
-download_binlog_rate_limit_kbs=1024 # 限制单个 BE 节点从源集群拉取 Binlog(包括 Local Snapshot)的速度为 1 MB/s
-```
-
-详细参数加说明:
-
-1. `download_binlog_rate_limit_kbs` 参数在源集群 BE 节点配置,通过设置该参数能够有效限制数据拉取速度。
-
-2. `download_binlog_rate_limit_kbs` 参数主要用于设置单个 BE 节点的速度,若计算集群整体速率一般需要参数值乘以集群个数。
-## IS_BEING_SYNCED 属性
-
-从 Doris v2.0 "is_being_synced" = "true"
-
-CCR 功能在建立同步时,会在目标集群中创建源集群同步范围中表(后称源表,位于源集群)的副本表(后称目标表,位于目标集群),但是在创建副本表时需要失效或者擦除一些功能和属性以保证同步过程中的正确性。
-
-如:
-
-- 源表中包含了可能没有被同步到目标集群的信息,如`storage_policy`等,可能会导致目标表创建失败或者行为异常。
-
-- 源表中可能包含一些动态功能,如动态分区等,可能导致目标表的行为不受 syncer 控制导致 partition 不一致。
-
-在被复制时因失效而需要擦除的属性有:
-
-- `storage_policy`
-
-- `colocate_with`
-
-在被同步时需要失效的功能有:
-
-- 自动分桶
-
-- 动态分区
-
-### 实现
-
-在创建目标表时,这条属性将会由 syncer 控制添加或者删除,在 CCR 功能中,创建一个目标表有两个途径:
-
-1. 在表同步时,syncer 通过 backup/restore 的方式对源表进行全量复制来得到目标表。
-
-2. 
在库同步时,对于存量表而言,syncer 同样通过 backup/restore 的方式来得到目标表,对于增量表而言,syncer 会通过携带有 CreateTableRecord 的 binlog 来创建目标表。
-
-综上,对于插入`is_being_synced`属性有两个切入点:全量同步中的 restore 过程和增量同步时的 getDdlStmt。
-
-在全量同步的 restore 过程中,syncer 会通过 rpc 发起对原集群中 snapshot 的 restore,在这个过程中为会为 RestoreStmt 添加`is_being_synced`属性,并在最终的 restoreJob 中生效,执行`isBeingSynced`的相关逻辑。在增量同步时的 getDdlStmt 中,为 getDdlStmt 方法添加参数`boolean getDdlForSync`,以区分是否为受控转化为目标表 ddl 的操作,并在创建目标表时执行`isBeingSynced`的相关逻辑。
-
-对于失效属性的擦除无需多言,对于上述功能的失效需要进行说明:
-
-- 自动分桶 自动分桶会在创建表时生效,计算当前合适的 bucket 数量,这就可能导致源表和目的表的 bucket 数目不一致。因此在同步时需要获得源表的 bucket 数目,并且也要获得源表是否为自动分桶表的信息以便结束同步后恢复功能。当前的做法是在获取 distribution 信息时默认 autobucket 为 false,在恢复表时通过检查`_auto_bucket`属性来判断源表是否为自动分桶表,如是则将目标表的 autobucket 字段设置为 true,以此来达到跳过计算 bucket 数量,直接应用源表 bucket 数量的目的。
-
-- 动态分区 动态分区则是通过将`olapTable.isBeingSynced()`添加到是否执行 add/drop partition 的判断中来实现的,这样目标表在被同步的过程中就不会周期性的执行 add/drop partition 操作。
-
-### 注意
-
-在未出现异常时,`is_being_synced`属性应该完全由 syncer 控制开启或关闭,用户不要自行修改该属性。
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/data-recovery.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/data-recovery.md
deleted file mode 100644
index e885a4e2111c6..0000000000000
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/data-recovery.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-{
-    "title": "主键表数据修复",
-    "language": "zh-CN"
-}
----
-
-
-
-# 数据恢复
-
-对于 Unique Key Merge on Write 表,在某些 Doris 的版本中存在 bug,可能会导致系统在计算 delete bitmap 时出现错误,导致出现重复主键,此时可以利用 full compaction 功能进行数据的修复。本功能对于非 Unique Key Merge on Write 表无效。
-
-该功能需要 Doris 版本 2.0+。
-
-使用该功能,需要尽可能停止导入,否则可能会出现导入超时等问题。
-
-## 简要原理说明
-
-执行 full compaction 后,会对 delete bitmap 进行重新计算,将错误的 delete bitmap 数据删除,以完成数据的修复。
-
-## 使用说明
-
-`POST /api/compaction/run?tablet_id={int}&compact_type=full`
-
-或
-
-`POST /api/compaction/run?table_id={int}&compact_type=full`
-
-注意,tablet_id 和 table_id 只能指定一个,不能够同时指定,指定 table_id 后会自动对此 table 
下所有 tablet 执行 full_compaction。 - -## 使用例子 - -``` -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" -curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" -``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/overview.md index f4dea57a1ceeb..d4ff7097bff63 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/overview.md @@ -1,6 +1,6 @@ --- { - "title": "业务连续性和数据恢复概览", + "title": "容灾管理概览", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/restore.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/restore.md deleted file mode 100644 index c8ad18300080c..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/restore.md +++ /dev/null @@ -1,219 +0,0 @@ ---- -{ - "title": "数据备份恢复", - "language": "zh-CN" -} ---- - - - - - -Doris 支持将当前数据以文件的形式,通过 broker 备份到远端存储系统中。之后可以通过 恢复 命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期的进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移。 - -该功能需要 Doris 版本 0.8.2+ - -使用该功能,需要部署对应远端存储的 broker。如 BOS、HDFS 等。可以通过 `SHOW BROKER;` 查看当前部署的 broker。 - -## 简要原理说明 - -恢复操作需要指定一个远端仓库中已存在的备份,然后将这个备份的内容恢复到本地集群中。当用户提交 Restore 请求后,系统内部会做如下操作: - -1. 在本地创建对应的元数据 - - 这一步首先会在本地集群中,创建恢复对应的表分区等结构。创建完成后,该表可见,但是不可访问。 - -2. 本地 snapshot - - 这一步是将上一步创建的表做一个快照。这其实是一个空快照(因为刚创建的表是没有数据的),其目的主要是在 Backend 上产生对应的快照目录,用于之后接收从远端仓库下载的快照文件。 - -3. 下载快照 - - 远端仓库中的快照文件,会被下载到对应的上一步生成的快照目录中。这一步由各个 Backend 并发完成。 - -4. 生效快照 - - 快照下载完成后,我们要将各个快照映射为当前本地表的元数据。然后重新加载这些快照,使之生效,完成最终的恢复作业。 - -## 开始恢复 - -1. 
从 example_repo 中恢复备份 snapshot_1 中的表 backup_tbl 到数据库 example_db1,时间版本为 "2018-05-04-16-45-08"。恢复为 1 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. 从 example_repo 中恢复备份 snapshot_2 中的表 backup_tbl 的分区 p1,p2,以及表 backup_tbl2 到数据库 example_db1,并重命名为 new_tbl,时间版本为 "2018-05-04-17-11-01"。默认恢复为 3 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. 查看 restore 作业的执行情况: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -RESTORE 的更多用法可参考 [这里](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE.md)。 - -## 相关命令 - -和备份恢复功能相关的命令如下。以下命令,都可以通过 mysql-client 连接 Doris 后,使用 `help cmd;` 的方式查看详细帮助。 - -**1. 
CREATE REPOSITORY**
-
-创建一个远端仓库路径,用于备份或恢复。该命令需要借助 Broker 进程访问远端存储,不同的 Broker 需要提供不同的参数,具体请参阅 [Broker 文档](../../data-operate/import/broker-load-manual),也可以直接通过 S3 协议备份到支持 AWS S3 协议的远程存储上去,也可以直接备份到 HDFS,具体参考 [创建远程仓库文档](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY)
-
-**2. RESTORE**
-
-执行一次恢复操作。
-
-3. SHOW RESTORE
-
-查看最近一次 restore 作业的执行情况,包括:
-
-- JobId:本次恢复作业的 id。
-
-- Label:用户指定的仓库中备份的名称(Label)。
-
-- Timestamp:用户指定的仓库中备份的时间戳。
-
-- DbName:恢复作业对应的 Database。
-
-- State:恢复作业当前所在阶段:
-
-    - PENDING:作业初始状态。
-
-    - SNAPSHOTING:正在进行本地新建表的快照操作。
-
-    - DOWNLOAD:正在发送下载快照任务。
-
-    - DOWNLOADING:快照正在下载。
-
-    - COMMIT:准备生效已下载的快照。
-
-    - COMMITTING:正在生效已下载的快照。
-
-    - FINISHED:恢复完成。
-
-    - CANCELLED:恢复失败或被取消。
-
-- AllowLoad:恢复期间是否允许导入。
-
-- ReplicationNum:恢复指定的副本数。
-
-- RestoreObjs:本次恢复涉及的表和分区的清单。
-
-- CreateTime:作业创建时间。
-
-- MetaPreparedTime:本地元数据生成完成时间。
-
-- SnapshotFinishedTime:本地快照完成时间。
-
-- DownloadFinishedTime:远端快照下载完成时间。
-
-- FinishedTime:本次作业完成时间。
-
-- UnfinishedTasks:在 `SNAPSHOTTING`,`DOWNLOADING`, `COMMITTING` 等阶段,会有多个子任务在同时
-进行,这里展示的当前阶段,未完成的子任务的 task id。
-
-- TaskErrMsg:如果有子任务执行出错,这里会显示对应子任务的错误信息。
-
-- Status:用于记录在整个作业过程中,可能出现的一些状态信息。
-
-- Timeout:作业的超时时间,单位是秒。
-
-**4. CANCEL RESTORE**
-
-取消当前正在执行的恢复作业。
-
-**5. DROP REPOSITORY**
-
-删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。
-
-## 常见错误
-
-1. RESTORE 报错:[20181: invalid md5 of downloaded file:/data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e]
-
-   备份和恢复的表的副本数不一致导致的,执行恢复命令时需指定副本个数,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册
-
-2. 
RESTORE 报错:[COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - 备份和恢复不是同一个版本导致的,使用指定的 meta_version 来读取之前备份的元数据。注意,该参数作为临时方案,仅用于恢复老版本 Doris 备份的数据。最新版本的备份数据中已经包含 meta version,无需再指定,针对上述错误具体解决方案指定 meta_version = 100,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册 - -## 更多帮助 - -关于 RESTORE 使用的更多详细语法及最佳实践,请参阅 [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP RESTORE` 获取更多帮助信息。 - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md index c5e7982ebf6d4..973707dc785dc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md @@ -170,6 +170,6 @@ storage_flood_stage_left_capacity_bytes 默认 1GB。 `rm -rf data/0/12345/` - - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](./tablet-meta-tool.md)) + - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](../trouble-shooting/tablet-meta-tool.md)) `./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/job-scheduler.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/job-scheduler.md index 4b5479e179064..b3fd9847c237f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/job-scheduler.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/job-scheduler.md @@ -1,6 +1,6 @@ --- { -"title": "作业调度", +"title": "调度管理", "language": "zh-CN" } --- diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/kill-query.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/kill-query.md index 773d6173605a9..6a10f9ebb5f93 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/kill-query.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/kill-query.md @@ -1,6 +1,6 @@ --- { -"title": "Kill Query", +"title": "终止查询", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/resource-group.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/resource-group.md index b3190ee42b4df..5be32c5c209cc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/resource-group.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/resource-group.md @@ -28,25 +28,25 @@ Resource Group 是存算一体架构下,实现不同的负载之间物理隔 ![Resource Group](/images/resource_group.png) -- 通过Tag的方式,把BE 划分为不同的组,每个组通过tag的名字来标识,比如上图中把host1,host2,host3 都设置为group a, 把host4,host5 都设置为group b; -- 将表的不同的副本放到不同的分组中,比如上图中table1 有3个副本,都位于group a 中, table2 有4个副本,其中2个位于group a中,2个副本位于group b 中; -- 在查询时,根据不同的用户,使用不同的Resource Group,比如online 用户,只能访问host1,host2,host3 上的数据,所以他可以访问table1和table2;但是offline 用户只能访问host4,host5,所以只能访问table2的数据,由于table1 在group b 上没有对应的副本,所以访问会出错。 +- 通过 Tag 的方式,把 BE 划分为不同的组,每个组通过 tag 的名字来标识,比如上图中把 host1,host2,host3 都设置为 group a, 把 host4,host5 都设置为 group b; +- 将表的不同的副本放到不同的分组中,比如上图中 table1 有 3 个副本,都位于 group a 中,table2 有 4 个副本,其中 2 个位于 group a 中,2 个副本位于 group b 中; +- 在查询时,根据不同的用户,使用不同的 Resource Group,比如 online 用户,只能访问 host1,host2,host3 上的数据,所以他可以访问 table1 和 table2;但是 offline 用户只能访问 host4,host5,所以只能访问 table2 的数据,由于 table1 在 group b 上没有对应的副本,所以访问会出错。 -Resource Group本质上是一种Table副本的放置策略,所以它有以下优势和限制: -- 不同的Resource Group 
使用的是不同的BE,所以它们之间完全无干扰,即使一个group 内的某个BE 宕机了,也不会影响其他Group的查询;由于导入需要多副本成功,所以如果剩下的副本数量不满足Quoram,那么导入还是会失败; -- 每个Resource Group 至少要有一个Table的一个副本,比如如果要建立5个Resource Group,并且每个Resource Group 都可能访问所有的Table,那么就需要Table 有5个副本,会带来比较大的存储开销。 +Resource Group 本质上是一种 Table 副本的放置策略,所以它有以下优势和限制: +- 不同的 Resource Group 使用的是不同的 BE,所以它们之间完全无干扰,即使一个 group 内的某个 BE 宕机了,也不会影响其他 Group 的查询;由于导入需要多副本成功,所以如果剩下的副本数量不满足 Quoram,那么导入还是会失败; +- 每个 Resource Group 至少要有一个 Table 的一个副本,比如如果要建立 5 个 Resource Group,并且每个 Resource Group 都可能访问所有的 Table,那么就需要 Table 有 5 个副本,会带来比较大的存储开销。 ## 典型使用场景 -- 读写隔离, 可以将一个集群划分为两个Resource Group,Offline Resource Group 用来执行ETL 作业,Online Resource Group 负责在线查询;数据以 3 副本的方式存储,其中 2 个副本存放在 Online 资源组,1 个副本存放在 Offline 资源组。Online 资源组主要用于高并发低延迟的在线数据服务,而一些大查询或离线 ETL 操作,则可以使用 Offline 资源组中的节点执行。从而实现在统一集群内同时提供在线和离线服务的能力。 -- 不同业务之间隔离,此时多个业务之间数据没有共享,可以为每个业务划分一个Resource Group,多个业务之间没有任何干扰,这实际上是把多个物理集群合并到统一的一个大集群管理; +- 读写隔离,可以将一个集群划分为两个 Resource Group,Offline Resource Group 用来执行 ETL 作业,Online Resource Group 负责在线查询;数据以 3 副本的方式存储,其中 2 个副本存放在 Online 资源组,1 个副本存放在 Offline 资源组。Online 资源组主要用于高并发低延迟的在线数据服务,而一些大查询或离线 ETL 操作,则可以使用 Offline 资源组中的节点执行。从而实现在统一集群内同时提供在线和离线服务的能力。 +- 不同业务之间隔离,此时多个业务之间数据没有共享,可以为每个业务划分一个 Resource Group,多个业务之间没有任何干扰,这实际上是把多个物理集群合并到统一的一个大集群管理; - 不同用户之间隔离,比如集群内有一张业务表需要共享给所有 3 个用户使用,但是希望能够尽量避免不同用户之间的资源抢占。则我们可以为这张表创建 3 个副本,分别存储在 3 个资源组中,为个用户绑定一个资源组。 ## 配置 Resource Group ### 为 BE 设置标签 - 假设当前 Doris 集群有 6 个 BE 节点。分别为 host[1-6]。在初始情况下,所有BE节点都属于一个默认资源组(Default)。 + 假设当前 Doris 集群有 6 个 BE 节点。分别为 host[1-6]。在初始情况下,所有 BE 节点都属于一个默认资源组(Default)。 我们可以使用以下命令将这 6 个节点划分成 3 个资源组:group_a、group_b、group_c: @@ -116,7 +116,7 @@ Resource Group本质上是一种Table副本的放置策略,所以它有以下 └────────────────────────────────────────────────────┘ ``` - 当一个DB 下有非常多的Table时,修改每个Table的分布策略是非常繁琐的,所以Doris 还支持了在 database 层面设置统一的数据分布策略,但是 table 设置的优先级高于 database。比如有一个 db1, db1 下有四个 table,table1 需要的副本分布策略为 `group_a:1,group_b:2`,table2,table3, table4 需要的副本分布策略为 `group_c:1,group_b:2` + 当一个 DB 下有非常多的 Table 时,修改每个 Table 的分布策略是非常繁琐的,所以 Doris 还支持了在 
database 层面设置统一的数据分布策略,但是 table 设置的优先级高于 database。比如有一个 db1, db1 下有四个 table,table1 需要的副本分布策略为 `group_a:1,group_b:2`,table2,table3, table4 需要的副本分布策略为 `group_c:1,group_b:2` 那么可以使用如下语句创建 db1: diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-group.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-group.md index ca5c02b911c96..e7aa2a44ef845 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-group.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-group.md @@ -24,33 +24,33 @@ specific language governing permissions and limitations under the License. --> -Workload Group 是一种进程内实现的对负载进行逻辑隔离的机制,它通过对BE进程内的资源(CPU,IO,Memory)进行细粒度的划分或者限制,达到资源隔离的目的,它的原理如下图所示: +Workload Group 是一种进程内实现的对负载进行逻辑隔离的机制,它通过对 BE 进程内的资源(CPU,IO,Memory)进行细粒度的划分或者限制,达到资源隔离的目的,它的原理如下图所示: ![workload_group](/images/workload_group_arch.png) 目前支持的隔离能力包括: -* 管理CPU资源,支持CPU硬限和CPU软限; +* 管理 CPU 资源,支持 CPU 硬限和 CPU 软限; * 管理内存资源,支持内存硬限和内存软限; -* 管理IO资源,包括读本地文件和远程文件产生的IO。 +* 管理 IO 资源,包括读本地文件和远程文件产生的 IO。 ## 版本说明 - 自 Doris 2.0 版本开始提供 Workload Group 功能。在 Doris 2.0 版本中,Workload Group 功能不依赖于 CGroup,而 Doris 2.1 版本中需要依赖 CGroup。 -- 从 Doris 1.2 升级到 2.0:建议集群升级完成后,再开启 Workload Group功能。只升级部分 follower FE 节点,可能会因为未升级的 FE 节点没有 Workload Group 的元数据信息,导致已升级的 follower FE 节点查询失败。 +- 从 Doris 1.2 升级到 2.0:建议集群升级完成后,再开启 Workload Group 功能。只升级部分 follower FE 节点,可能会因为未升级的 FE 节点没有 Workload Group 的元数据信息,导致已升级的 follower FE 节点查询失败。 - 从 Doris 2.0 升级到 2.1:由于 2.1 版本的 Workload Group 功能依赖于 CGroup,需要先配置 CGroup 环境,再升级到 Doris 2.1 版本。 -## 配置workload group +## 配置 workload group ### 配置 CGroup 环境 -Workload Group 支持对于 CPU , 内存, IO 资源的管理,其中对于 CPU 的管理依赖 CGroup 组件;如果期望使用 Workload Group 管理CPU资源,那么首先需要进行 CGroup 环境的配置。 +Workload Group 支持对于 CPU,内存,IO 资源的管理,其中对于 CPU 的管理依赖 CGroup 组件;如果期望使用 Workload Group 管理 CPU 资源,那么首先需要进行 CGroup 环境的配置。 -以下为CGroup环境配置流程: 
+以下为 CGroup 环境配置流程: -1. 首先确认 BE 所在节点是否已经安装好 GGroup,输出结果中```cgroup``` 代表目前的环境已经安装CGroup V1,```cgroup2``` 代表目前的环境已安装CGroup V2,至于具体是哪个版本生效,可以通过下一步确认。 +1. 首先确认 BE 所在节点是否已经安装好 GGroup,输出结果中```cgroup``` 代表目前的环境已经安装 CGroup V1,```cgroup2``` 代表目前的环境已安装 CGroup V2,至于具体是哪个版本生效,可以通过下一步确认。 ```shell cat /proc/filesystems | grep cgroup nodev cgroup @@ -95,8 +95,8 @@ chmod 770 /sys/fs/cgroup/doris chown -R doris:doris /sys/fs/cgroup/doris ``` -5. 如果目前环境里生效的是CGroup v2版本,那么还需要做以下操作。这是因为CGroup v2对于权限管控比较严格,需要具备根目录的cgroup.procs文件的写权限才能实现进程在group之间的移动。 - 如果是CGroup v1那么不需要这一步。 +5. 如果目前环境里生效的是 CGroup v2 版本,那么还需要做以下操作。这是因为 CGroup v2 对于权限管控比较严格,需要具备根目录的 cgroup.procs 文件的写权限才能实现进程在 group 之间的移动。 + 如果是 CGroup v1 那么不需要这一步。 ```shell chmod a+w /sys/fs/cgroup/cgroup.procs ``` @@ -113,21 +113,21 @@ doris_cgroup_cpu_path = /sys/fs/cgroup/doris 7. 重启 BE,在日志(be.INFO)可以看到"add thread xxx to group"的字样代表配置成功 :::tip -1. 建议单台机器上只部署一个 BE 实例,目前的 Workload Group 功能不支持一个机器上部署多个 BE ; -2. 当机器重启之后,CGroup 路径下的所有配置就会清空。如果期望CGroup配置持久化,可以使用 systemd 把操作设置成系统的自定义服务,这样在每次机器重启的时可以自动完成创建和授权操作 +1. 建议单台机器上只部署一个 BE 实例,目前的 Workload Group 功能不支持一个机器上部署多个 BE; +2. 当机器重启之后,CGroup 路径下的所有配置就会清空。如果期望 CGroup 配置持久化,可以使用 systemd 把操作设置成系统的自定义服务,这样在每次机器重启的时可以自动完成创建和授权操作 3. 
如果是在容器内使用 CGroup,需要容器具备操作宿主机的权限。 ::: -#### 在容器中使用Workload Group的注意事项 -Workload的CPU管理是基于CGroup实现的,如果期望在容器中使用Workload Group,那么需要以特权模式启动容器,容器内的Doris进程才能具备读写宿主机CGroup文件的权限。 -当Doris在容器内运行时,Workload Group的CPU资源用量是在容器可用资源的情况下再划分的,例如宿主机整机是64核,容器被分配了8个核的资源,Workload Group配置的CPU硬限为50%, -那么Workload Group实际可用核数为4个(8核 * 50%)。 +#### 在容器中使用 Workload Group 的注意事项 +Workload 的 CPU 管理是基于 CGroup 实现的,如果期望在容器中使用 Workload Group,那么需要以特权模式启动容器,容器内的 Doris 进程才能具备读写宿主机 CGroup 文件的权限。 +当 Doris 在容器内运行时,Workload Group 的 CPU 资源用量是在容器可用资源的情况下再划分的,例如宿主机整机是 64 核,容器被分配了 8 个核的资源,Workload Group 配置的 CPU 硬限为 50%, +那么 Workload Group 实际可用核数为 4 个(8 核 * 50%)。 -WorkloadGroup的内存管理和IO管理功能是Doris内部实现,不依赖外部组件,因此在容器和物理机上部署使用并没有区别。 +WorkloadGroup 的内存管理和 IO 管理功能是 Doris 内部实现,不依赖外部组件,因此在容器和物理机上部署使用并没有区别。 -如果要在K8S上使用Doris,建议使用Doris Operator进行部署,可以屏蔽底层的权限细节问题。 +如果要在 K8S 上使用 Doris,建议使用 Doris Operator 进行部署,可以屏蔽底层的权限细节问题。 -### 创建Workload Group +### 创建 Workload Group ``` mysql [information_schema]>create workload group if not exists g1 -> properties ( @@ -140,24 +140,24 @@ Query OK, 0 rows affected (0.03 sec) 此时配置的 CPU 限制为软限。自 2.1 版本起,系统会自动创建一个名为```normal```的 group,不可删除。 -### Workload Group属性 +### Workload Group 属性 | 属性名称 | 数据类型 | 默认值 | 取值范围 | 说明 | |------------------------------|---------|-----|-----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| cpu_share | 整型 | -1 | [1, 10000] | 可选,CPU软限模式下生效,取值范围和使用的CGroup版本有关,下文有详细描述。cpu_share 代表了 Workload Group 可获得CPU时间的权重,值越大,可获得的CPU时间越多。例如,用户创建了 3 个 Workload Group g-a、g-b 和 g-c,cpu_share 分别为 10、30、40,某一时刻 g-a 和 g-b 正在跑任务,而 g-c 没有任务,此时 g-a 可获得 25% (10 / (10 + 30)) 的 CPU 资源,而 g-b 可获得 75% 的 CPU 资源。如果系统只有一个 Workload Group 正在运行,则不管其 cpu_share 的值为多少,它都可获取全部的 CPU 资源 。 | -| 
memory_limit | 浮点 | -1 | (0%, 100%] | 可选,开启内存硬限时代表当前 Workload Group 最大可用内存百分比,默认值代表不限制内存。所有 Workload Group 的 memory_limit 累加值不可以超过 100%,通常与 enable_memory_overcommit 属性配合使用。如果一个机器的内存为 64G,Workload Group 的 memory_limit配置为50%,那么该 group 的实际物理内存=64G * 90% * 50%= 28.8G,这里的90%是 BE 进程可用内存配置的默认值。 | -| enable_memory_overcommit | 布尔 | true | true, false | 可选,用于控制当前 Workload Group 的内存限制是硬限还是软限,默认为 true。如果设置为 false,则该 workload group 为内存硬隔离,系统检测到 workload group 内存使用超出限制后将立即 cancel 组内内存占用最大的若干个任务,以释放超出的内存;如果设置为 true,则该 Workload Group 为内存软隔离,如果系统有空闲内存资源则该 Workload Group 在超出 memory_limit 的限制后可继续使用系统内存,在系统总内存紧张时会 cancel 组内内存占用最大的若干个任务,释放部分超出的内存以缓解系统内存压力。建议所有 workload group 的 memory_limit 总和低于 100%,为BE进程中的其他组件保留一些内存。 | -| cpu_hard_limit | 整型 | -1 | [1%, 100%] | 可选,CPU 硬限制模式下生效,Workload Group 最大可用 CPU 百分比,不管当前机器的 CPU 资源是否被用满,Workload Group 的最大 CPU 用量都不能超过 cpu_hard_limit,所有 Workload Group 的 cpu_hard_limit 累加值不能超过 100%。2.1 版本新增属性,2.0版本不支持该功能。 | +| cpu_share | 整型 | -1 | [1, 10000] | 可选,CPU 软限模式下生效,取值范围和使用的 CGroup 版本有关,下文有详细描述。cpu_share 代表了 Workload Group 可获得 CPU 时间的权重,值越大,可获得的 CPU 时间越多。例如,用户创建了 3 个 Workload Group g-a、g-b 和 g-c,cpu_share 分别为 10、30、40,某一时刻 g-a 和 g-b 正在跑任务,而 g-c 没有任务,此时 g-a 可获得 25% (10 / (10 + 30)) 的 CPU 资源,而 g-b 可获得 75% 的 CPU 资源。如果系统只有一个 Workload Group 正在运行,则不管其 cpu_share 的值为多少,它都可获取全部的 CPU 资源。 | +| memory_limit | 浮点 | -1 | (0%, 100%] | 可选,开启内存硬限时代表当前 Workload Group 最大可用内存百分比,默认值代表不限制内存。所有 Workload Group 的 memory_limit 累加值不可以超过 100%,通常与 enable_memory_overcommit 属性配合使用。如果一个机器的内存为 64G,Workload Group 的 memory_limit 配置为 50%,那么该 group 的实际物理内存=64G * 90% * 50%= 28.8G,这里的 90% 是 BE 进程可用内存配置的默认值。 | +| enable_memory_overcommit | 布尔 | true | true, false | 可选,用于控制当前 Workload Group 的内存限制是硬限还是软限,默认为 true。如果设置为 false,则该 workload group 为内存硬隔离,系统检测到 workload group 内存使用超出限制后将立即 cancel 组内内存占用最大的若干个任务,以释放超出的内存;如果设置为 true,则该 Workload Group 为内存软隔离,如果系统有空闲内存资源则该 Workload Group 在超出 memory_limit 的限制后可继续使用系统内存,在系统总内存紧张时会 cancel 组内内存占用最大的若干个任务,释放部分超出的内存以缓解系统内存压力。建议所有 workload group 的 memory_limit 
总和低于 100%,为 BE 进程中的其他组件保留一些内存。 | +| cpu_hard_limit | 整型 | -1 | [1%, 100%] | 可选,CPU 硬限制模式下生效,Workload Group 最大可用 CPU 百分比,不管当前机器的 CPU 资源是否被用满,Workload Group 的最大 CPU 用量都不能超过 cpu_hard_limit,所有 Workload Group 的 cpu_hard_limit 累加值不能超过 100%。2.1 版本新增属性,2.0 版本不支持该功能。 | | max_concurrency | 整型 | 2147483647 | [0, 2147483647] | 可选,最大查询并发数,默认值为整型最大值,也就是不做并发的限制。运行中的查询数量达到最大并发时,新来的查询会进入排队的逻辑。 | | max_queue_size | 整型 | 0 | [0, 2147483647] | 可选,查询排队队列的长度,当排队队列已满时,新来的查询会被拒绝。默认值为 0,含义是不排队。当排队队列已满时,新来的查询会直接失败。 | | queue_timeout | 整型 | 0 | [0, 2147483647] | 可选,查询在排队队列中的最大等待时间,单位为毫秒。如果查询在队列中的排队时间超过这个值,那么就会直接抛出异常给客户端。默认值为 0,含义是不排队,查询进入队列后立即返回失败。 | -| scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,当前 workload group 用于 scan 的线程个数。当该属性为 -1,含义是不生效,此时在BE上的实际取值为 BE 配置中的```doris_scanner_thread_pool_thread_num```。 | -| max_remote_scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,读外部数据源的scan线程池的最大线程数。当该属性为-1时,实际的线程数由BE自行决定,通常和核数相关。 | -| min_remote_scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,读外部数据源的scan线程池的最小线程数。当该属性为-1时,实际的线程数由BE自行决定,通常和核数相关。 | -| tag | 字符串 | 空 | - | 为Workload Group指定分组标签,相同标签的Workload Group资源累加值不能超过100%;如果期望指定多个值,可以使用英文逗号分隔。 | -| read_bytes_per_second | 整型 | -1 | [1, 9223372036854775807] | 可选,含义为读Doris内表时的最大IO吞吐,默认值为-1,也就是不限制IO带宽。需要注意的是这个值并不绑定磁盘,而是绑定文件夹。比如为Doris配置了2个文件夹用于存放内表数据,那么每个文件夹的最大读IO不会超过该值,如果这2个文件夹都配置到同一块盘上,最大吞吐控制就会变成2倍的read_bytes_per_second。落盘的文件目录也受该值的约束。 | -| remote_read_bytes_per_second | 整型 | -1 | [1, 9223372036854775807] | 可选,含义为读Doris外表时的最大IO吞吐,默认值为-1,也就是不限制IO带宽。 | +| scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,当前 workload group 用于 scan 的线程个数。当该属性为 -1,含义是不生效,此时在 BE 上的实际取值为 BE 配置中的```doris_scanner_thread_pool_thread_num```。 | +| max_remote_scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,读外部数据源的 scan 线程池的最大线程数。当该属性为 -1 时,实际的线程数由 BE 自行决定,通常和核数相关。 | +| min_remote_scan_thread_num | 整型 | -1 | [1, 2147483647] | 可选,读外部数据源的 scan 线程池的最小线程数。当该属性为 -1 时,实际的线程数由 BE 自行决定,通常和核数相关。 | +| tag | 字符串 | 空 | - | 为 Workload Group 指定分组标签,相同标签的 Workload Group 资源累加值不能超过 
100%;如果期望指定多个值,可以使用英文逗号分隔。 | +| read_bytes_per_second | 整型 | -1 | [1, 9223372036854775807] | 可选,含义为读 Doris 内表时的最大 IO 吞吐,默认值为 -1,也就是不限制 IO 带宽。需要注意的是这个值并不绑定磁盘,而是绑定文件夹。比如为 Doris 配置了 2 个文件夹用于存放内表数据,那么每个文件夹的最大读 IO 不会超过该值,如果这 2 个文件夹都配置到同一块盘上,最大吞吐控制就会变成 2 倍的 read_bytes_per_second。落盘的文件目录也受该值的约束。 | +| remote_read_bytes_per_second | 整型 | -1 | [1, 9223372036854775807] | 可选,含义为读 Doris 外表时的最大 IO 吞吐,默认值为 -1,也就是不限制 IO 带宽。 | :::tip @@ -165,13 +165,13 @@ Query OK, 0 rows affected (0.03 sec) 2. 所有属性均为可选,但是在创建 Workload Group 时需要指定至少一个属性。 -3. 需要注意 CGroup v1 CGroup v2 版本 CPU 软限默认值是有区别的, CGroup v1 的 CPU 软限默认值为1024,取值范围为2到262144。而 CGroup v2 的 CPU 软限默认值为100,取值范围是1到10000。 - 如果软限填了一个超出范围的值,这会导致 CPU 软限在BE修改失败。如果在 CGroup v1 的环境上如果按照CGroup v2的默认值100设置,这可能导致这个workload group的优先级在该机器上是最低的。 +3. 需要注意 CGroup v1 和 CGroup v2 的 CPU 软限默认值是有区别的,CGroup v1 的 CPU 软限默认值为 1024,取值范围为 2 到 262144。而 CGroup v2 的 CPU 软限默认值为 100,取值范围是 1 到 10000。 + 如果软限填了一个超出范围的值,会导致 CPU 软限在 BE 修改失败。如果在 CGroup v1 的环境上按照 CGroup v2 的默认值 100 设置,可能导致这个 workload group 的优先级在该机器上是最低的。 ::: -## 为用户设置Workload Group -在把用户绑定到某个Workload Group之前,需要先确定该用户是否具有某个 Workload Group 的权限。 -可以使用这个用户查看 information_schema.workload_groups 系统表,返回的结果就是当前用户有权限使用的Workload Group。 +## 为用户设置 Workload Group +在把用户绑定到某个 Workload Group 之前,需要先确定该用户是否具有某个 Workload Group 的权限。 +可以使用这个用户查看 information_schema.workload_groups 系统表,返回的结果就是当前用户有权限使用的 Workload Group。 下面的查询结果代表当前用户可以使用 g1 与 normal Workload Group: ```sql @@ -184,15 +184,15 @@ SELECT name FROM information_schema.workload_groups; +--------+ ``` -如果无法看到 g1 Workload Group,可以使用ADMIN账户执行 GRANT 语句为用户授权。例如: +如果无法看到 g1 Workload Group,可以使用 ADMIN 账户执行 GRANT 语句为用户授权。例如: ``` "GRANT USAGE_PRIV ON WORKLOAD GROUP 'g1' TO 'user_1'@'%';" ``` -这个语句的含义是把名为 g1 的 Workload Group的使用权限授予给名为 user_1 的账户。 +这个语句的含义是把名为 g1 的 Workload Group 的使用权限授予名为 user_1 的账户。 更多授权操作可以参考[grant 语句](../../sql-manual/sql-statements/Account-Management-Statements/GRANT)。 **两种绑定方式** -1. 
通过设置 user property 将 user 默认绑定到 workload group,默认为`normal`,需要注意的这里的value不能填空,否则语句会执行失败。 +1. 通过设置 user property 将 user 默认绑定到 workload group,默认为`normal`。需要注意的是,这里的 value 不能为空,否则语句会执行失败。 ``` set property 'default_workload_group' = 'g1'; ``` @@ -203,10 +203,10 @@ set property 'default_workload_group' = 'g1'; ``` set workload_group = 'g1'; ``` -当同时使用了两种方式时为用户指定了Workload Group,session 变量的优先级要高于 user property 。 +当同时使用两种方式为用户指定了 Workload Group 时,session 变量的优先级高于 user property。 -## 查看Workload Group -1. 通过show语句查看 +## 查看 Workload Group +1. 通过 show 语句查看 ``` show workload groups; ``` @@ -223,7 +223,7 @@ mysql [information_schema]>select * from information_schema.workload_groups wher 1 row in set (0.05 sec) ``` -## 修改Workload Group +## 修改 Workload Group ``` mysql [information_schema]>alter workload group g1 properties('cpu_share'='2048'); Query OK, 0 rows affected (0.00 sec @@ -240,7 +240,7 @@ mysql [information_schema]>select cpu_share from information_schema.workload_gro 可以参考:[ALTER-WORKLOAD-GROUP](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-WORKLOAD-GROUP)。 -## 删除Workload Group +## 删除 Workload Group ``` mysql [information_schema]>drop workload group g1; Query OK, 0 rows affected (0.01 sec) @@ -256,7 +256,7 @@ Query OK, 0 rows affected (0.01 sec) ``` alter workload group test_group properties ( 'cpu_hard_limit'='20%' ); ``` -集群中所有的Workload Group都需要修改,所有 Workload Group 的 cpu_hard_limit 的累加值不能超过 100% 。 +集群中所有的 Workload Group 都需要修改,所有 Workload Group 的 cpu_hard_limit 的累加值不能超过 100%。 由于 CPU 的硬限无法给出一个有效的默认值,因此如果只打开开关但是不修改属性,那么 CPU 的硬限也无法生效。 @@ -446,7 +446,7 @@ Workload Group 支持 CPU 的软限和硬限,目前比较推荐在线上环境 2. 
目前 FE 向 BE 同步 Workload Group 元数据的时间间隔为 30 秒,因此对于 Workload Group 的变更最大需要等待 30 秒才能生效。 -### 本地IO 硬限 +### 本地 IO 硬限 OLAP 系统在做 ETL 或者大的 Adhoc 查询时,需要读取大量的数据,Doris 为了加速数据分析过程,内部会使用多线程并行的方式对多个磁盘文件扫描,会产生巨大的磁盘 IO,就会对其他的查询(比如报表分析)产生影响。 可以通过 Workload Group 对离线的 ETL 数据处理和在线的报表查询做分组,限制离线数据处理 IO 带宽的方式,降低它对在线报表分析的影响。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-management-summary.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-management-summary.md index 03091dc107153..65351a2fb7cc1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-management-summary.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/workload-management/workload-management-summary.md @@ -1,6 +1,6 @@ --- { -"title": "概述", +"title": "负载管理概述", "language": "zh-CN" } --- @@ -24,28 +24,28 @@ specific language governing permissions and limitations under the License. 
--> -负载管理是Doris一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: +负载管理是 Doris 一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: -- 负载隔离: 通过划分多个Group,并且为每个Group都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(读写操作)之间互不干扰; +- 负载隔离:通过划分多个 Group,并且为每个 Group 都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(读写操作)之间互不干扰; -- 并发控制与排队: 可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; +- 并发控制与排队:可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; -- 熔断: 在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 +- 熔断:在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 ## 资源划分方式 -Doris 可以通过以下2种方式将资源分组: +Doris 可以通过以下 2 种方式将资源分组: - Resource Group: 以 BE 节点为最小粒度,通过设置标签(tag)的方式,划分出多个资源组; -- Workload Group: 将一个BE内的资源(CPU、Memory、IO)通过Cgroup划分出多个资源组,实现更细致的资源分配; +- Workload Group: 将一个 BE 内的资源(CPU、Memory、IO)通过 Cgroup 划分出多个资源组,实现更细致的资源分配; 下表中记录了不同资源组划分方式的特点及优势场景: | 资源隔离方式 | 隔离粒度 | 软/硬限制 | 跨组查询 | | ---------- | ----------- |-----|-----| -| Resource Group | 服务器节点级别,资源完全隔离;可以隔离BE故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | -| Workload Group | BE 进程内隔离;不能隔离BE故障 | 支持硬限制与软限制 | 支持跨资源组查询 | +| Resource Group | 服务器节点级别,资源完全隔离;可以隔离 BE 故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | +| Workload Group | BE 进程内隔离;不能隔离 BE 故障 | 支持硬限制与软限制 | 支持跨资源组查询 | ## 软限与硬限 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/install-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/install-faq.md index b03ebfba2c95b..9e538de642dd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/install-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/install-faq.md @@ -267,7 +267,7 @@ http { 2. 集群内多数 Follower FE 节点未启动。比如有 3 个 Follower,只启动了一个。此时需要将另外至少一个 FE 也启动,FE 可选举组方能选举出 Master 已提供服务。 -如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md)进行恢复。 +如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md)进行恢复。 ### Q10. 
Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -277,7 +277,7 @@ http { 有时重启 FE,会出现如上错误(通常只会出现在多 Follower 的情况下)。并且错误中的两个数值相差 2。导致 FE 启动失败。 -这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 +这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 ### Q12. Doris 编译安装 JDK 版本不兼容问题 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/releasenotes/v2.1/release-2.1.6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/releasenotes/v2.1/release-2.1.6.md index 738e6b4b0d326..71d66de1b4258 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/releasenotes/v2.1/release-2.1.6.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/releasenotes/v2.1/release-2.1.6.md @@ -112,7 +112,7 @@ under the License. - 更多信息,请查看文档 [table_properties](../../admin-manual/system-tables/information_schema/table_properties/) - 新增 FE 中死锁和慢锁检测功能。 - - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/maint-monitor/frontend-lock-manager/) + - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/trouble-shooting/frontend-lock-manager) ## 改进提升 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/cluster-management/time-zone.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/cluster-management/time-zone.md index e464bcf6d3420..85a5edf65808d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/cluster-management/time-zone.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/cluster-management/time-zone.md @@ -1,6 +1,6 @@ --- { - "title": "时区", + "title": "时区管理", "language": "zh-CN" } --- @@ -209,7 +209,7 @@ Doris 目前兼容各时区下的数据向 Doris 中进行导入。而由于 Dor ### 信息更新 -真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 Doris 中的时区信息与最新的IANA 数据保持一致,请采取下列方式进行更新: +真实世界中的时区与夏令时相关数据,将会因各种原因而不定期发生变化。IANA 会定期记录这些变化并更新相应时区文件。如果希望 
Doris 中的时区信息与最新的 IANA 数据保持一致,请采取下列方式进行更新: 1. 使用包管理器更新 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup.md deleted file mode 100644 index ebfa030e9673c..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,249 +0,0 @@ ---- -{ - "title": "数据备份", - "language": "zh-CN" -} ---- - - - -# 数据备份 - -Doris 支持将当前数据以文件的形式,通过 broker 备份到远端存储系统中。之后可以通过 恢复 命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期的进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移。 - -该功能需要 Doris 版本 0.8.2+ - -使用该功能,需要部署对应远端存储的 broker。如 BOS、HDFS 等。可以通过 `SHOW BROKER;` 查看当前部署的 broker。 - -## 简要原理说明 - -备份操作是将指定表或分区的数据,直接以 Doris 存储的文件的形式,上传到远端仓库中进行存储。当用户提交 Backup 请求后,系统内部会做如下操作: - -1. 快照及快照上传 - - 快照阶段会对指定的表或分区数据文件进行快照。之后,备份都是对快照进行操作。在快照之后,对表进行的更改、导入等操作都不再影响备份的结果。快照只是对当前数据文件产生一个硬链,耗时很少。快照完成后,会开始对这些快照文件进行逐一上传。快照上传由各个 Backend 并发完成。 - -2. 元数据准备及上传 - - 数据文件快照上传完成后,Frontend 会首先将对应元数据写成本地文件,然后通过 broker 将本地元数据文件上传到远端仓库。完成最终备份作业 - -3. 动态分区表说明 - - 如果该表是动态分区表,备份之后会自动禁用动态分区属性,在做恢复的时候需要手动将该表的动态分区属性启用,命令如下: - - ```sql - ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true") - ``` - -4. 备份和恢复操作都不会保留表的 `colocate_with` 属性。 - -## 开始备份 - -1. 创建一个 hdfs 的远程仓库 example_repo: - - **WITH HDFS(推荐使用)** - ```sql - CREATE REPOSITORY `example_repo` - WITH HDFS - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "fs.defaultFS"="hdfs://hdfs_host:port", - "hadoop.username" = "hadoop" - ); - ``` - - **WITH BROKER** - - 需要先启动一个 BROKER 进程。 - - ```sql - CREATE REPOSITORY `example_repo` - WITH BROKER `broker_name` - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "username" = "user", - "password" = "password" - ); - ``` - -2. 
创建一个 s3 的远程仓库 : s3_repo - - ``` - CREATE REPOSITORY `s3_repo` - WITH S3 - ON LOCATION "s3://bucket_name/test" - PROPERTIES - ( - "AWS_ENDPOINT" = "http://xxxx.xxxx.com", - "AWS_ACCESS_KEY" = "xxxx", - "AWS_SECRET_KEY"="xxx", - "AWS_REGION" = "xxx" - ); - ``` - - >注意: - > - >ON LOCATION 这里后面跟的是 Bucket Name - -2. 全量备份 example_db 下的表 example_tbl 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label1 - TO example_repo - ON (example_tbl) - PROPERTIES ("type" = "full"); - ``` - -3. 全量备份 example_db 下,表 example_tbl 的 p1, p2 分区,以及表 example_tbl2 到仓库 example_repo 中: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label2 - TO example_repo - ON - ( - example_tbl PARTITION (p1,p2), - example_tbl2 - ); - ``` - -4. 查看最近 backup 作业的执行情况: - - ```sql - mysql> show BACKUP\G; - *************************** 1. row *************************** - JobId: 17891847 - SnapshotName: snapshot_label1 - DbName: example_db - State: FINISHED - BackupObjs: [default_cluster:example_db.example_tbl] - CreateTime: 2022-04-08 15:52:29 - SnapshotFinishedTime: 2022-04-08 15:52:32 - UploadFinishedTime: 2022-04-08 15:52:38 - FinishedTime: 2022-04-08 15:52:44 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -5. 
查看远端仓库中已存在的备份 - - ```sql - mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; - +-----------------+---------------------+--------+ - | Snapshot | Timestamp | Status | - +-----------------+---------------------+--------+ - | snapshot_label1 | 2022-04-08-15-52-29 | OK | - +-----------------+---------------------+--------+ - 1 row in set (0.15 sec) - ``` - -BACKUP 的更多用法可参考 [这里](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md)。 - -## 最佳实践 - -### 备份 - -当前我们支持最小分区(Partition)粒度的全量备份(增量备份有可能在未来版本支持)。如果需要对数据进行定期备份,首先需要在建表时,合理的规划表的分区及分桶,比如按时间进行分区。然后在之后的运行过程中,按照分区粒度进行定期的数据备份。 - -### 数据迁移 - -用户可以先将数据备份到远端仓库,再通过远端仓库将数据恢复到另一个集群,完成数据迁移。因为数据备份是通过快照的形式完成的,所以,在备份作业的快照阶段之后的新的导入数据,是不会备份的。因此,在快照完成后,到恢复作业完成这期间,在原集群上导入的数据,都需要在新集群上同样导入一遍。 - -建议在迁移完成后,对新旧两个集群并行导入一段时间。完成数据和业务正确性校验后,再将业务迁移到新的集群。 - -## 重点说明 - -1. 备份恢复相关的操作目前只允许拥有 ADMIN 权限的用户执行。 -2. 一个 Database 内,只允许有一个正在执行的备份或恢复作业。 -3. 备份和恢复都支持最小分区(Partition)级别的操作,当表的数据量很大时,建议按分区分别执行,以降低失败重试的代价。 -4. 因为备份恢复操作,操作的都是实际的数据文件。所以当一个表的分片过多,或者一个分片有过多的小版本时,可能即使总数据量很小,依然需要备份或恢复很长时间。用户可以通过 `SHOW PARTITIONS FROM table_name;` 和 `SHOW TABLETS FROM table_name;` 来查看各个分区的分片数量,以及各个分片的文件版本数量,来预估作业执行时间。文件数量对作业执行的时间影响非常大,所以建议在建表时,合理规划分区分桶,以避免过多的分片。 -5. 当通过 `SHOW BACKUP` 或者 `SHOW RESTORE` 命令查看作业状态时。有可能会在 `TaskErrMsg` 一列中看到错误信息。但只要 `State` 列不为 `CANCELLED`,则说明作业依然在继续。这些 Task 有可能会重试成功。当然,有些 Task 错误,也会直接导致作业失败。 - 常见的`TaskErrMsg`错误如下: - Q1:备份到 HDFS,状态显示 UPLOADING,TaskErrMsg 错误信息:[13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0,port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception: ] - 这个一般是网络通信问题,查看broker日志,看某个ip 或者端口不通,如果是云服务,则需要查看是否访问了内网,如果是,则可以在borker/conf文件夹下添加hdfs-site.xml,还需在hdfs-site.xml配置文件下添加dfs.client.use.datanode.hostname=true,并在broker节点上配置HADOOP集群的主机名映射。 -6. 
如果恢复作业是一次覆盖操作(指定恢复数据到已经存在的表或分区中),那么从恢复作业的 `COMMIT` 阶段开始,当前集群上被覆盖的数据有可能不能再被还原。此时如果恢复作业失败或被取消,有可能造成之前的数据已损坏且无法访问。这种情况下,只能通过再次执行恢复操作,并等待作业完成。因此,我们建议,如无必要,尽量不要使用覆盖的方式恢复数据,除非确认当前数据已不再使用。 - -## 相关命令 - -和备份恢复功能相关的命令如下。以下命令,都可以通过 mysql-client 连接 Doris 后,使用 `help cmd;` 的方式查看详细帮助。 - -1. CREATE REPOSITORY - - 创建一个远端仓库路径,用于备份或恢复。该命令需要借助 Broker 进程访问远端存储,不同的 Broker 需要提供不同的参数,具体请参阅 [Broker 文档](../../data-operate/import/broker-load-manual#其他-broker-导入),也可以直接通过 S3 协议备份到支持 AWS S3 协议的远程存储上去,也可以直接备份到 HDFS,具体参考 [创建远程仓库文档](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md) - -2. BACKUP - - 执行一次备份操作。 - -3. SHOW BACKUP - - 查看最近一次 backup 作业的执行情况,包括: - - - JobId:本次备份作业的 id。 - - SnapshotName:用户指定的本次备份作业的名称(Label)。 - - DbName:备份作业对应的 Database。 - - State:备份作业当前所在阶段: - - PENDING:作业初始状态。 - - SNAPSHOTING:正在进行快照操作。 - - UPLOAD_SNAPSHOT:快照结束,准备上传。 - - UPLOADING:正在上传快照。 - - SAVE_META:正在本地生成元数据文件。 - - UPLOAD_INFO:上传元数据文件和本次备份作业的信息。 - - FINISHED:备份完成。 - - CANCELLED:备份失败或被取消。 - - BackupObjs:本次备份涉及的表和分区的清单。 - - CreateTime:作业创建时间。 - - SnapshotFinishedTime:快照完成时间。 - - UploadFinishedTime:快照上传完成时间。 - - FinishedTime:本次作业完成时间。 - - UnfinishedTasks:在 `SNAPSHOTTING`,`UPLOADING` 等阶段,会有多个子任务在同时进行,这里展示的当前阶段,未完成的子任务的 task id。 - - TaskErrMsg:如果有子任务执行出错,这里会显示对应子任务的错误信息。 - - Status:用于记录在整个作业过程中,可能出现的一些状态信息。 - - Timeout:作业的超时时间,单位是秒。 - -4. SHOW SNAPSHOT - - 查看远端仓库中已存在的备份。 - - - Snapshot:备份时指定的该备份的名称(Label)。 - - Timestamp:备份的时间戳。 - - Status:该备份是否正常。 - - 如果在 `SHOW SNAPSHOT` 后指定了 where 子句,则可以显示更详细的备份信息。 - - - Database:备份时对应的 Database。 - - Details:展示了该备份完整的数据目录结构。 - -5. CANCEL BACKUP - - 取消当前正在执行的备份作业。 - -6. 
DROP REPOSITORY - - 删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。 - -## 更多帮助 - - 关于 BACKUP 使用的更多详细语法及最佳实践,请参阅 [BACKUP](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md) 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP BACKUP` 获取更多帮助信息。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/ccr.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/ccr.md deleted file mode 100644 index 68c2c047cd7f6..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/ccr.md +++ /dev/null @@ -1,643 +0,0 @@ ---- -{ - "title": "跨集群数据同步", - "language": "zh-CN" -} ---- - - - -## 概览 - -CCR(Cross Cluster Replication) 是跨集群数据同步,能够在库/表级别将源集群的数据变更同步到目标集群,可用于在线服务的数据可用性、隔离在离线负载、建设两地三中心。 - -CCR 通常被用于容灾备份、读写分离、集团与公司间数据传输和隔离升级等场景。 - -- 容灾备份:通常是将企业的数据备份到另一个集群与机房中,当突发事件导致业务中断或丢失时,可以从备份中恢复数据或快速进行主备切换。一般在对 SLA 要求比较高的场景中,都需要进行容灾备份,比如在金融、医疗、电子商务等领域中比较常见。 - -- 读写分离:读写分离是将数据的查询操作和写入操作进行分离,目的是降低读写操作的相互影响并提升资源的利用率。比如在数据库写入压力过大或在高并发场景中,采用读写分离可以将读/写操作分散到多个地域的只读/只写的数据库案例上,减少读写间的互相影响,有效保证数据库的性能及稳定性。 - -- 集团与分公司间数据传输:集团总部为了对集团内数据进行统一管控和分析,通常需要分布在各地域的分公司及时将数据传输同步到集团总部,避免因为数据不一致而引起的管理混乱和决策错误,有利于提高集团的管理效率和决策质量。 - -- 隔离升级:当在对系统集群升级时,有可能因为某些原因需要进行版本回滚,传统的升级模式往往会因为元数据不兼容的原因无法回滚。而使用 CCR 可以解决该问题,先构建一个备用的集群进行升级并双跑验证,用户可以依次升级各个集群,同时 CCR 也不依赖特定版本,使版本的回滚变得可行。 - -## 原理 - -### 名词解释 - -源集群:源头集群,业务数据写入的集群,需要 2.0 版本 - -目标集群:跨集群同步的目标集群,需要 2.0 版本 - -binlog:源集群的变更日志,包括 schema 和数据变更 - -syncer:一个轻量级的进程 - -### 架构说明 - - -![ccr 架构说明](/images/ccr-architecture-description.png) - -CCR 工具主要依赖一个轻量级进程:Syncers。Syncers 会从源集群获取 binlog,直接将元数据应用于目标集群,通知目标集群从源集群拉取数据。从而实现全量和增量迁移。 - -## 使用 - -使用非常简单,只需把 Syncers 服务启动,给他发一个命令,剩下的交给 Syncers 完成就行。 - -**1. 部署源 Doris 集群** - -**2. 部署目标 Doris 集群** - -**3. 首先源集群和目标集群都需要打开 binlog,在源集群和目标集群的 fe.conf 和 be.conf 中配置如下信息:** - -```sql -enable_feature_binlog=true -``` - -**4. 部署 syncers** - -1. 
构建 CCR syncer - - ```shell - git clone https://github.com/selectdb/ccr-syncer - - cd ccr-syncer - - bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR> - - cd SYNCER_OUTPUT_DIR# 联系相关同学免费获取 ccr 二进制包 - ``` - -2. 启动和停止 syncer - - ```shell - # 启动 - cd bin && sh start_syncer.sh --daemon - - # 停止 - sh stop_syncer.sh - ``` - -**5. 打开源集群中同步库/表的 Binlog** - -```shell --- 如果是整库同步,可以执行如下脚本,使得该库下面所有的表都要打开 binlog.enable -vim shell/enable_db_binlog.sh -修改源集群的 host、port、user、password、db -或者 ./enable_db_binlog.sh --host $host --port $port --user $user --password $password --db $db - --- 如果是单表同步,则只需要打开 table 的 binlog.enable,在源集群上执行: -ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); -``` - -**6. 向 syncer 发起同步任务** - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - } -}' http://127.0.0.1:9190/create_ccr -``` - -同步任务的参数说明: - -```shell -name: CCR同步任务的名称,唯一即可 -host、port:对应集群 Master FE的host和mysql(jdbc) 的端口 -user、password:syncer以何种身份去开启事务、拉取数据等 -database、table: -如果是db级别的同步,则填入your_db_name,your_table_name为空 -如果是表级别同步,则需要填入your_db_name,your_table_name -向syncer发起同步任务中的name只能使用一次 -``` - -## Syncer 详细操作手册 - -### 启动 Syncer 说明 - -根据配置选项启动 Syncer,并且在默认或指定路径下保存一个 pid 文件,pid 文件的命名方式为`host_port.pid`。 - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```sql -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` - -:::caution -**后文中的 start_syncer.sh 指的是该路径下的 start_syncer.sh!!!** -::: - -**启动选项** - -1. --daemon - -后台运行 Syncer,默认为 false - -```sql -bash bin/start_syncer.sh --daemon -``` - -2. 
--db_type - -Syncer 目前能够使用两种数据库来保存自身的元数据,分别为`sqlite3`(对应本地存储)和`mysql`(本地或远端存储) - -```sql -bash bin/start_syncer.sh --db_type mysql -``` - -默认值为 sqlite3 - -在使用 mysql 存储元数据时,Syncer 会使用`CREATE IF NOT EXISTS`来创建一个名为`ccr`的库,ccr 相关的元数据表都会保存在其中 - -3. --db_dir - -**这个选项仅在 db 使用****`sqlite3`****时生效** - -可以通过此选项来指定 sqlite3 生成的 db 文件名及路径。 - -```sql -bash bin/start_syncer.sh --db_dir /path/to/ccr.db -``` - -默认路径为`SYNCER_OUTPUT_DIR/db`,文件名为`ccr.db` - -4. --db_host & db_port & db_user & db_password - -**这个选项仅在 db 使用****`mysql`****时生效** - -```sql -bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456" -``` - -db_host、db_port 的默认值如例子中所示,db_user、db_password 默认值为空 - -5. --log_dir - -日志的输出路径 - -```sql -bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log -``` - -默认路径为`SYNCER_OUTPUT_DIR/log`,文件名为`ccr_syncer.log` - -6. --log_level - -用于指定 Syncer 日志的输出等级。 - -```sql -bash bin/start_syncer.sh --log_level info -``` - -日志的格式如下,其中 hook 只会在`log_level > info`的时候打印: - -```sql -# time level msg hooks -[2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx -``` - -在--daemon 下,log_level 默认值为`info` - -在前台运行时,log_level 默认值为`trace`,同时日志会通过 tee 来保存到 log_dir - -6. --host && --port - -用于指定 Syncer 的 host 和 port,其中 host 只起到在集群中的区分自身的作用,可以理解为 Syncer 的 name,集群中 Syncer 的名称为`host:port` - -```sql -bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 -``` - -host 默认值为 127.0.0.1,port 的默认值为 9190 - -7. 
--pid_dir - -用于指定 pid 文件的保存路径 - -pid 文件是 stop_syncer.sh 脚本用于关闭 Syncer 的凭据,里面保存了对应 Syncer 的进程号,为了方便 Syncer 的集群化管理,可以指定 pid 文件的保存路径 - -```sql -bash bin/start_syncer.sh --pid_dir /path/to/pids -``` - -默认值为`SYNCER_OUTPUT_DIR/bin` - -### Syncer 停止说明 - -根据默认或指定路径下 pid 文件中的进程号关闭对应 Syncer,pid 文件的命名方式为`host_port.pid`。 - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` -:::caution -**后文中的 stop_syncer.sh 指的是该路径下的 stop_syncer.sh!!!** -::: - -**停止选项** - -有三种关闭方法: - -1. 关闭目录下单个 Syncer - -​ 指定要关闭 Syncer 的 host && port,注意要与 start_syncer 时指定的 host 一致 - -2. 批量关闭目录下指定 Syncer - -​ 指定要关闭的 pid 文件名,以空格分隔,用`" "`包裹 - -3. 关闭目录下所有 Syncer - -​ 默认即可 - -1. --pid_dir - -指定 pid 文件所在目录,上述三种关闭方法都依赖于 pid 文件的所在目录执行 - -```shell -bash bin/stop_syncer.sh --pid_dir /path/to/pids -``` - -例子中的执行效果就是关闭`/path/to/pids`下所有 pid 文件对应的 Syncers(**方法 3**),`--pid_dir`可与上面三种关闭方法组合使用。 - -默认值为`SYNCER_OUTPUT_DIR/bin` - -2. --host && --port - -关闭 pid_dir 路径下 host:port 对应的 Syncer - -```shell -bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 -``` - -host 的默认值为 127.0.0.1,port 默认值为空 - -即,单独指定 host 时**方法 1**不生效,会退化为**方法 3**。 - -host 与 port 都不为空时**方法 1**才能生效 - -3. 
--files - -关闭 pid_dir 路径下指定 pid 文件名对应的 Syncer - -```shell -bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" -``` - -文件之间用空格分隔,整体需要用`" "`包裹住 - -### Syncer 操作列表 - -**请求的通用模板** - -```shell -curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator -``` - -json_body: 以 json 的格式发送操作所需信息 - -operator:对应 Syncer 的不同操作 - -所以接口返回都是 json, 如果成功则是其中 success 字段为 true, 类似,错误的时候,是 false,然后存在`ErrMsgs`字段 - -```JSON -{"success":true} - -or - -{"success":false,"error_msg":"job ccr_test not exist"} -``` - -### operators - -- create_ccr - -​ 创建 CCR 任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "demo", - "table": "example_tbl" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "ccrt", - "table": "copy" - } - }' http://127.0.0.1:9190/create_ccr - ``` - -- name: CCR 同步任务的名称,唯一即可 - -- host、port:对应集群 master 的 host 和 mysql(jdbc) 的端口 - -- thrift_port:对应 FE 的 rpc_port - -- user、password:syncer 以何种身份去开启事务、拉取数据等 - -- database、table: - - - 如果是 db 级别的同步,则填入 dbName,tableName 为空 - - - 如果是表级别同步,则需要填入 dbName、tableName - -- get_lag - -​ 查看同步进度 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/get_lag - ``` - -​ 其中 job_name 是 create_ccr 时创建的 name - -- pause - -​ 暂停同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/pause - ``` - -- resume - -​ 恢复同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/resume - ``` - -- delete - -​ 删除同步任务 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": 
"job_name" - }' http://ccr_syncer_host:ccr_syncer_port/delete - ``` - -- version - - 获取版本信息 - - ```shell - curl http://ccr_syncer_host:ccr_syncer_port/version - - # > return - {"version": "2.0.1"} - ``` - -- job status - - 查看 job 的状态 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/job_status - - { - "success": true, - "status": { - "name": "ccr_db_table_alias", - "state": "running", - "progress_state": "TableIncrementalSync" - } - } - ``` - -- desync job - - 不做 sync,此时用户可以将源和目的集群互换 - - ```shell - curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" - }' http://ccr_syncer_host:ccr_syncer_port/desync - ``` - -- list_jobs - - 展示已经创建的所有任务 - - ```shell - curl http://ccr_syncer_host:ccr_syncer_port/list_jobs - - {"success":true,"jobs":["ccr_db_table_alias"]} - ``` - -### 开启库中所有表的 binlog - -**输出路径下的文件结构** - -在编译完成后的输出路径下,文件结构大致如下所示: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # 默认配置下运行后生成 - log - [ccr_syncer.log] # 默认配置下运行后生成 -``` -:::caution -**后文中的 enable_db_binlog.sh 指的是该路径下的 enable_db_binlog.sh!!!** -::: - -**使用说明** - -```shell -bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db -``` - -## Syncer 高可用 - -Syncer 高可用依赖 mysql,如果使用 mysql 作为后端存储,Syncer 可以发现其它 syncer,如果一个 crash 了,其他会分担他的任务 - -### 权限要求 - -1. Select_priv 对数据库、表的只读权限。 - -2. Load_priv 对数据库、表的写权限。包括 Load、Insert、Delete 等。 - -3. Alter_priv 对数据库、表的更改权限。包括重命名 库/表、添加/删除/变更 列、添加/删除 分区等操作。 - -4. Create_priv 创建数据库、表、视图的权限。 - -5. 
Drop_priv 删除数据库、表、视图的权限。 - -加上 Admin 权限 (之后考虑彻底移除), 这个是用来检测 enable binlog config 的,现在需要 admin - -## 使用限制 - -### 网络约束 - -- 需要 Syncer 与上下游的 FE 和 BE 都是通的 - -- 下游 BE 与上游 BE 是通的 - -- 对外 IP 和 Doris 内部 IP 是一样的,也就是说`show frontends/backends`看到的,和能直接连的 IP 是一致的,要是直连,不能是 IP 转发或者 nat - -### ThriftPool 限制 - -开大 thrift thread pool 大小,最好是超过一次 commit 的 bucket 数目大小 - -### 版本要求 - -版本最低要求:v2.0.3 - -### 不支持的操作 - -- rename table 支持有点问题 - -- 不支持一些 trash 的操作,比如 table 的 drop-recovery 操作 - -- 和 rename table 有关的,replace partition 与 - -- 不能发生在同一个 db 上同时 backup/restore - -## Feature - -### 限速 - -BE 端配置参数: - -```shell -download_binlog_rate_limit_kbs=1024 # 限制单个 BE 节点从源集群拉取 Binlog(包括 Local Snapshot)的速度为 1 MB/s -``` - -详细参数加说明: - -1. `download_binlog_rate_limit_kbs` 参数在源集群 BE 节点配置,通过设置该参数能够有效限制数据拉取速度。 - -2. `download_binlog_rate_limit_kbs` 参数主要用于设置单个 BE 节点的速度,若计算集群整体速率一般需要参数值乘以集群个数。 - -## IS_BEING_SYNCED 属性 - -从 Doris v2.0 "is_being_synced" = "true" - -CCR 功能在建立同步时,会在目标集群中创建源集群同步范围中表(后称源表,位于源集群)的副本表(后称目标表,位于目标集群),但是在创建副本表时需要失效或者擦除一些功能和属性以保证同步过程中的正确性。 - -如: - -- 源表中包含了可能没有被同步到目标集群的信息,如`storage_policy`等,可能会导致目标表创建失败或者行为异常。 - -- 源表中可能包含一些动态功能,如动态分区等,可能导致目标表的行为不受 syncer 控制导致 partition 不一致。 - -在被复制时因失效而需要擦除的属性有: - -- `storage_policy` - -- `colocate_with` - -在被同步时需要失效的功能有: - -- 自动分桶 - -- 动态分区 - -### 实现 - -在创建目标表时,这条属性将会由 syncer 控制添加或者删除,在 CCR 功能中,创建一个目标表有两个途径: - -1. 在表同步时,syncer 通过 backup/restore 的方式对源表进行全量复制来得到目标表。 - -2. 
在库同步时,对于存量表而言,syncer 同样通过 backup/restore 的方式来得到目标表,对于增量表而言,syncer 会通过携带有 CreateTableRecord 的 binlog 来创建目标表。 - -综上,对于插入`is_being_synced`属性有两个切入点:全量同步中的 restore 过程和增量同步时的 getDdlStmt。 - -在全量同步的 restore 过程中,syncer 会通过 rpc 发起对原集群中 snapshot 的 restore,在这个过程中为会为 RestoreStmt 添加`is_being_synced`属性,并在最终的 restoreJob 中生效,执行`isBeingSynced`的相关逻辑。在增量同步时的 getDdlStmt 中,为 getDdlStmt 方法添加参数`boolean getDdlForSync`,以区分是否为受控转化为目标表 ddl 的操作,并在创建目标表时执行`isBeingSynced`的相关逻辑。 - -对于失效属性的擦除无需多言,对于上述功能的失效需要进行说明: - -- 自动分桶 自动分桶会在创建表时生效,计算当前合适的 bucket 数量,这就可能导致源表和目的表的 bucket 数目不一致。因此在同步时需要获得源表的 bucket 数目,并且也要获得源表是否为自动分桶表的信息以便结束同步后恢复功能。当前的做法是在获取 distribution 信息时默认 autobucket 为 false,在恢复表时通过检查`_auto_bucket`属性来判断源表是否为自动分桶表,如是则将目标表的 autobucket 字段设置为 true,以此来达到跳过计算 bucket 数量,直接应用源表 bucket 数量的目的。 - -- 动态分区 动态分区则是通过将`olapTable.isBeingSynced()`添加到是否执行 add/drop partition 的判断中来实现的,这样目标表在被同步的过程中就不会周期性的执行 add/drop partition 操作。 - -### 注意 - -在未出现异常时,`is_being_synced`属性应该完全由 syncer 控制开启或关闭,用户不要自行修改该属性。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/overview.md index f4dea57a1ceeb..d4ff7097bff63 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/overview.md @@ -1,6 +1,6 @@ --- { - "title": "业务连续性和数据恢复概览", + "title": "容灾管理概览", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/repairing-data.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/repairing-data.md deleted file mode 100644 index a6da0193360d8..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/repairing-data.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -{ - "title": "数据修复", - "language": "zh-CN" -} ---- - - - - - -对于 
Unique Key Merge on Write 表,在某些 Doris 的版本中存在 Bug,可能会导致系统在计算 Delete Bitmap 时出现错误,导致出现重复主键,此时可以利用 Full Compaction 功能进行数据的修复。本功能对于非 Unique Key Merge on Write 表无效。 - -该功能需要 Doris 版本 2.0+。 - -使用该功能,需要尽可能停止导入,否则可能会出现导入超时等问题。 - -## 简要原理说明 - -执行 Full Compaction 后,会对 Delete Bitmap 进行重新计算,将错误的 Delete Bitmap 数据删除,以完成数据的修复。 - -## 使用说明 - -`POST /api/compaction/run?tablet_id={int}&compact_type=full` - -或 - -`POST /api/compaction/run?table_id={int}&compact_type=full` - -注意,`tablet_id` 和 `table_id` 只能指定一个,不能够同时指定,指定 `table_id` 后会自动对此 table 下所有 tablet 执行 `full_compaction`。 - -## 使用例子 - -```shell -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" -curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" -``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/restore.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/restore.md deleted file mode 100644 index c8ad18300080c..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/restore.md +++ /dev/null @@ -1,219 +0,0 @@ ---- -{ - "title": "数据备份恢复", - "language": "zh-CN" -} ---- - - - - - -Doris 支持将当前数据以文件的形式,通过 broker 备份到远端存储系统中。之后可以通过 恢复 命令,从远端存储系统中将数据恢复到任意 Doris 集群。通过这个功能,Doris 可以支持将数据定期的进行快照备份。也可以通过这个功能,在不同集群间进行数据迁移。 - -该功能需要 Doris 版本 0.8.2+ - -使用该功能,需要部署对应远端存储的 broker。如 BOS、HDFS 等。可以通过 `SHOW BROKER;` 查看当前部署的 broker。 - -## 简要原理说明 - -恢复操作需要指定一个远端仓库中已存在的备份,然后将这个备份的内容恢复到本地集群中。当用户提交 Restore 请求后,系统内部会做如下操作: - -1. 在本地创建对应的元数据 - - 这一步首先会在本地集群中,创建恢复对应的表分区等结构。创建完成后,该表可见,但是不可访问。 - -2. 本地 snapshot - - 这一步是将上一步创建的表做一个快照。这其实是一个空快照(因为刚创建的表是没有数据的),其目的主要是在 Backend 上产生对应的快照目录,用于之后接收从远端仓库下载的快照文件。 - -3. 下载快照 - - 远端仓库中的快照文件,会被下载到对应的上一步生成的快照目录中。这一步由各个 Backend 并发完成。 - -4. 生效快照 - - 快照下载完成后,我们要将各个快照映射为当前本地表的元数据。然后重新加载这些快照,使之生效,完成最终的恢复作业。 - -## 开始恢复 - -1. 
从 example_repo 中恢复备份 snapshot_1 中的表 backup_tbl 到数据库 example_db1,时间版本为 "2018-05-04-16-45-08"。恢复为 1 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. 从 example_repo 中恢复备份 snapshot_2 中的表 backup_tbl 的分区 p1,p2,以及表 backup_tbl2 到数据库 example_db1,并重命名为 new_tbl,时间版本为 "2018-05-04-17-11-01"。默认恢复为 3 个副本: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. 查看 restore 作业的执行情况: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -RESTORE 的更多用法可参考 [这里](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE.md)。 - -## 相关命令 - -和备份恢复功能相关的命令如下。以下命令,都可以通过 mysql-client 连接 Doris 后,使用 `help cmd;` 的方式查看详细帮助。 - -**1. 
CREATE REPOSITORY** - -创建一个远端仓库路径,用于备份或恢复。该命令需要借助 Broker 进程访问远端存储,不同的 Broker 需要提供不同的参数,具体请参阅 [Broker 文档](../../data-operate/import/broker-load-manual),也可以直接通过 S3 协议备份到支持 AWS S3 协议的远程存储上去,也可以直接备份到 HDFS,具体参考 [创建远程仓库文档](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY) - -**2. RESTORE** - -执行一次恢复操作。 - -3. SHOW RESTORE - -查看最近一次 restore 作业的执行情况,包括: - -- JobId:本次恢复作业的 id。 - -- Label:用户指定的仓库中备份的名称(Label)。 - -- Timestamp:用户指定的仓库中备份的时间戳。 - -- DbName:恢复作业对应的 Database。 - -- State:恢复作业当前所在阶段: - - - PENDING:作业初始状态。 - - - SNAPSHOTING:正在进行本地新建表的快照操作。 - - - DOWNLOAD:正在发送下载快照任务。 - - - DOWNLOADING:快照正在下载。 - - - COMMIT:准备生效已下载的快照。 - - - COMMITTING:正在生效已下载的快照。 - - - FINISHED:恢复完成。 - - - CANCELLED:恢复失败或被取消。 - -- AllowLoad:恢复期间是否允许导入。 - -- ReplicationNum:恢复指定的副本数。 - -- RestoreObjs:本次恢复涉及的表和分区的清单。 - -- CreateTime:作业创建时间。 - -- MetaPreparedTime:本地元数据生成完成时间。 - -- SnapshotFinishedTime:本地快照完成时间。 - -- DownloadFinishedTime:远端快照下载完成时间。 - -- FinishedTime:本次作业完成时间。 - -- UnfinishedTasks:在 `SNAPSHOTTING`,`DOWNLOADING`, `COMMITTING` 等阶段,会有多个子任务在同时 -进行,这里展示的当前阶段,未完成的子任务的 task id。 - -- TaskErrMsg:如果有子任务执行出错,这里会显示对应子任务的错误信息。 - -- Status:用于记录在整个作业过程中,可能出现的一些状态信息。 - -- Timeout:作业的超时时间,单位是秒。 - -**4. CANCEL RESTORE** - -取消当前正在执行的恢复作业。 - -**5. DROP REPOSITORY** - -删除已创建的远端仓库。删除仓库,仅仅是删除该仓库在 Doris 中的映射,不会删除实际的仓库数据。 - -## 常见错误 - -1. RESTORE 报错:[20181: invalid md5 of downloaded file:/data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e] - - 备份和恢复的表的副本数不一致导致的,执行恢复命令时需指定副本个数,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册 - -2. 
RESTORE 报错:[COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - 备份和恢复不是同一个版本导致的,使用指定的 meta_version 来读取之前备份的元数据。注意,该参数作为临时方案,仅用于恢复老版本 Doris 备份的数据。最新版本的备份数据中已经包含 meta version,无需再指定,针对上述错误具体解决方案指定 meta_version = 100,具体命令请参阅[RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册 - -## 更多帮助 - -关于 RESTORE 使用的更多详细语法及最佳实践,请参阅 [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP RESTORE` 获取更多帮助信息。 - diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md index c5e7982ebf6d4..973707dc785dc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md @@ -170,6 +170,6 @@ storage_flood_stage_left_capacity_bytes 默认 1GB。 `rm -rf data/0/12345/` - - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](./tablet-meta-tool.md)) + - 删除 Tablet 元数据(具体参考 [Tablet 元数据管理工具](../trouble-shooting/tablet-meta-tool.md)) `./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md index fd68c058b5c72..78fcfefd7fd3d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md @@ -49,7 +49,7 @@ under the License. 
* `compact_type` - - 取值为`base`或`cumulative`或`full`。full_compaction 的使用场景请参考[数据恢复](../../data-admin/repairing-data)。 + - 取值为`base`或`cumulative`或`full`。full_compaction 的使用场景请参考[数据恢复](../../trouble-shooting/repairing-data)。 ## Request body diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/job-scheduler.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/job-scheduler.md index 29f2342a32084..eed867f416508 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/job-scheduler.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/job-scheduler.md @@ -1,6 +1,6 @@ --- { -"title": "作业调度", +"title": "调度管理", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/kill-query.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/kill-query.md index 773d6173605a9..6a10f9ebb5f93 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/kill-query.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/kill-query.md @@ -1,6 +1,6 @@ --- { -"title": "Kill Query", +"title": "终止查询", "language": "zh-CN" } --- diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/workload-management-summary.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/workload-management-summary.md index 97dacb28cf810..14eb866886725 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/workload-management-summary.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/workload-management/workload-management-summary.md @@ -1,6 +1,6 @@ --- { -"title": "概述", +"title": "负载管理概述", "language": "zh-CN" } --- @@ 
-24,21 +24,21 @@ specific language governing permissions and limitations under the License. --> -负载管理是Doris一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: +负载管理是 Doris 一项非常重要的功能,在整个系统运行中起着非常重要的作用。通过合理的负载管理策略,可以优化资源使用,提高系统的稳定性,降低响应时间。它具备以下功能: -- 资源隔离: 通过划分多个Group,并且为每个Group都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(例如读写操作)之间互不干扰; +- 资源隔离:通过划分多个 Group,并且为每个 Group 都设置一定的资源(CPU, Memory, IO)限制,确保多个用户之间、同一用户不同的任务(例如读写操作)之间互不干扰; -- 并发控制与排队: 可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; +- 并发控制与排队:可以限制整个集群同时执行的任务数量,当超过设置的阈值时自动排队; -- 熔断: 在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 +- 熔断:在查询的规划阶段或者执行过程中,可以根据预估的或者执行中需要读取的分区数量,扫描的数据量,分配的内存大小,执行时间等条件,自动取消任务,避免不合理的任务占用太多的系统资源。 ## 资源划分方式 -Doris 可以通过以下3种方式将资源分组: +Doris 可以通过以下 3 种方式将资源分组: - Resource Group: 以 BE 节点为最小粒度,通过设置标签(tag)的方式,划分出多个资源组; -- Workload Group: 将一个BE内的资源(CPU、Memory、IO)通过Cgroup划分出多个资源组,实现更细致的资源分配; +- Workload Group: 将一个 BE 内的资源(CPU、Memory、IO)通过 Cgroup 划分出多个资源组,实现更细致的资源分配; - Compute Group: 是存算分离模式下的一种资源组划分的方式,与 Resource Group 类似,它也是以 BE 节点为最小粒度,划分出多个资源组。 @@ -46,9 +46,9 @@ Doris 可以通过以下3种方式将资源分组: | 资源隔离方式 | 隔离粒度 | 软/硬限制 | 跨资源组查询 | | ---------- | ----------- |-----|-----| -| Resource Group | 服务器节点级别,资源完全隔离;可以隔离BE故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | -| Workload Group | BE 进程内隔离;不能隔离BE故障 | 支持硬限制与软限制 | 支持跨资源组查询 | -|Compute Group | 服务器节点级别,资源完全隔离;可以隔离BE故障 | 硬限制 | 不支持跨资源组查询 | +| Resource Group | 服务器节点级别,资源完全隔离;可以隔离 BE 故障 | 硬限制 |不支持跨资源组查询,必须保证资源组内至少存储一副本数据。 | +| Workload Group | BE 进程内隔离;不能隔离 BE 故障 | 支持硬限制与软限制 | 支持跨资源组查询 | +|Compute Group | 服务器节点级别,资源完全隔离;可以隔离 BE 故障 | 硬限制 | 不支持跨资源组查询 | ## 软限与硬限 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/install-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/install-faq.md index b03ebfba2c95b..9e538de642dd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/install-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/install-faq.md @@ -267,7 
+267,7 @@ http { 2. 集群内多数 Follower FE 节点未启动。比如有 3 个 Follower,只启动了一个。此时需要将另外至少一个 FE 也启动,FE 可选举组方能选举出 Master 已提供服务。 -如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md)进行恢复。 +如果以上情况都不能解决,可以按照 Doris 官网文档中的[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md)进行恢复。 ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -277,7 +277,7 @@ http { 有时重启 FE,会出现如上错误(通常只会出现在多 Follower 的情况下)。并且错误中的两个数值相差 2。导致 FE 启动失败。 -这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/maint-monitor/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 +这是 bdbje 的一个 bug,尚未解决。遇到这种情况,只能通过[元数据运维文档](../admin-manual/trouble-shooting/metadata-operation.md) 中的 故障恢复 进行操作来恢复元数据了。 ### Q12. Doris 编译安装 JDK 版本不兼容问题 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/releasenotes/v2.1/release-2.1.6.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/releasenotes/v2.1/release-2.1.6.md index 65853079ee177..be8b1f1dfd7e3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/releasenotes/v2.1/release-2.1.6.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/releasenotes/v2.1/release-2.1.6.md @@ -111,7 +111,7 @@ under the License. 
- 更多信息,请查看文档 [table_properties](../../admin-manual/system-tables/information_schema/table_properties/) - 新增 FE 中死锁和慢锁检测功能。 - - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/maint-monitor/frontend-lock-manager/) + - 更多信息,请查看文档 [FE 锁管理](../../admin-manual/trouble-shooting/frontend-lock-manager) ## 改进提升 diff --git a/sidebars.json b/sidebars.json index 8c0c46d2e2a14..3005645135705 100644 --- a/sidebars.json +++ b/sidebars.json @@ -497,35 +497,6 @@ "admin-manual/cluster-management/fqdn" ] }, - { - "type": "category", - "label": "Business Continuity & Data Recovery", - "items": [ - "admin-manual/data-admin/overview", - { - "type": "category", - "label": "Backup & Restore", - "items": [ - "admin-manual/data-admin/backup-restore/overview", - "admin-manual/data-admin/backup-restore/backup", - "admin-manual/data-admin/backup-restore/restore" - ] - }, - { - "type": "category", - "label": "Cross Cluster Replication", - "items": [ - "admin-manual/data-admin/ccr/overview", - "admin-manual/data-admin/ccr/quickstart", - "admin-manual/data-admin/ccr/manual", - "admin-manual/data-admin/ccr/feature", - "admin-manual/data-admin/ccr/config", - "admin-manual/data-admin/ccr/performance" - ] - }, - "admin-manual/data-admin/recyclebin" - ] - }, { "type": "category", "label": "Managing Workload", @@ -540,6 +511,7 @@ "admin-manual/workload-management/workload-group" ] }, + "admin-manual/workload-management/analysis-diagnosis", "admin-manual/workload-management/concurrency-control-and-queuing", "admin-manual/workload-management/sql-blocking", "admin-manual/workload-management/kill-query", @@ -548,35 +520,31 @@ }, { "type": "category", - "label": "Managing Memory", + "label": "Managing Disater Recovery", "items": [ - "admin-manual/memory-management/overview", - "admin-manual/memory-management/memory-issue-faq", + "admin-manual/data-admin/overview", { "type": "category", - "label": "Managing Memory Analysis", + "label": "Backup & Restore", "items": [ - 
"admin-manual/memory-management/memory-analysis/jemalloc-memory-analysis", - "admin-manual/memory-management/memory-analysis/global-memory-analysis", - "admin-manual/memory-management/memory-analysis/doris-cache-memory-analysis", - "admin-manual/memory-management/memory-analysis/metadata-memory-analysis", - "admin-manual/memory-management/memory-analysis/query-memory-analysis", - "admin-manual/memory-management/memory-analysis/load-memory-analysis", - "admin-manual/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded", - "admin-manual/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded", - "admin-manual/memory-management/memory-analysis/oom-crash-analysis", - "admin-manual/memory-management/memory-analysis/memory-log-analysis", - "admin-manual/memory-management/memory-analysis/heap-profile-memory-analysis" + "admin-manual/data-admin/backup-restore/overview", + "admin-manual/data-admin/backup-restore/backup", + "admin-manual/data-admin/backup-restore/restore" ] }, { "type": "category", - "label": "Managing Memory Feature", + "label": "Cross Cluster Replication", "items": [ - "admin-manual/memory-management/memory-feature/memory-tracker", - "admin-manual/memory-management/memory-feature/memory-control-strategy" + "admin-manual/data-admin/ccr/overview", + "admin-manual/data-admin/ccr/quickstart", + "admin-manual/data-admin/ccr/manual", + "admin-manual/data-admin/ccr/feature", + "admin-manual/data-admin/ccr/config", + "admin-manual/data-admin/ccr/performance" ] - } + }, + "admin-manual/data-admin/recyclebin" ] }, { @@ -591,21 +559,11 @@ "type": "category", "label": "Maintenance", "items": [ - { - "type": "category", - "label": "Monitor", - "items": [ - "admin-manual/maint-monitor/monitor-metrics/metrics", - "admin-manual/maint-monitor/monitor-alert" - ] - }, + "admin-manual/maint-monitor/metrics", + "admin-manual/maint-monitor/monitor-alert", "admin-manual/maint-monitor/disk-capacity", 
"admin-manual/maint-monitor/tablet-repair-and-balance", - "admin-manual/maint-monitor/tablet-meta-tool", - "admin-manual/maint-monitor/tablet-local-debug", - "admin-manual/maint-monitor/metadata-operation", - "admin-manual/maint-monitor/automatic-service-start", - "admin-manual/maint-monitor/frontend-lock-manager" + "admin-manual/maint-monitor/automatic-service-start" ] }, { @@ -686,7 +644,6 @@ } ] }, - "admin-manual/audit-plugin", { "type": "category", "label": "OPEN API", @@ -778,9 +735,53 @@ } ] }, + { + "type": "category", + "label": "Trouble Shooting", + "items": [ + { + "type": "category", + "label": "Managing Memory", + "items": [ + "admin-manual/trouble-shooting/memory-management/overview", + "admin-manual/trouble-shooting/memory-management/memory-issue-faq", + { + "type": "category", + "label": "Managing Memory Analysis", + "items": [ + "admin-manual/trouble-shooting/memory-management/memory-analysis/jemalloc-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/global-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/metadata-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/query-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/load-memory-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-process-memory-exceeded", + "admin-manual/trouble-shooting/memory-management/memory-analysis/query-cancelled-after-query-memory-exceeded", + "admin-manual/trouble-shooting/memory-management/memory-analysis/oom-crash-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/memory-log-analysis", + "admin-manual/trouble-shooting/memory-management/memory-analysis/heap-profile-memory-analysis" + ] + }, + { + "type": "category", + "label": "Managing Memory Feature", + "items": 
[ + "admin-manual/trouble-shooting/memory-management/memory-feature/memory-tracker", + "admin-manual/trouble-shooting/memory-management/memory-feature/memory-control-strategy" + ] + } + ] + }, + "admin-manual/trouble-shooting/compaction", + "admin-manual/trouble-shooting/metadata-operation", + "admin-manual/trouble-shooting/frontend-lock-manager", + "admin-manual/trouble-shooting/tablet-local-debug", + "admin-manual/trouble-shooting/tablet-meta-tool", + "admin-manual/trouble-shooting/repairing-data" + ] + }, "admin-manual/plugin-development-manual", - "admin-manual/small-file-mgr", - "admin-manual/compaction" + "admin-manual/small-file-mgr" ] }, { diff --git a/versioned_docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md b/versioned_docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md index 59412f846b2be..ed048080ebcfd 100644 --- a/versioned_docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md +++ b/versioned_docs/version-1.2/admin-manual/maint-monitor/disk-capacity.md @@ -162,6 +162,6 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o ```rm -rf data/0/12345/``` - * Delete tablet metadata (refer to [Tablet metadata management tool](./tablet-meta-tool.md)) + * Delete tablet metadata (refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md)) ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/versioned_docs/version-1.2/faq/install-faq.md b/versioned_docs/version-1.2/faq/install-faq.md index a9e0b69c6e5e9..456135b811fb7 100644 --- a/versioned_docs/version-1.2/faq/install-faq.md +++ b/versioned_docs/version-1.2/faq/install-faq.md @@ -253,7 +253,7 @@ There are usually two reasons for this problem: 1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because `priority_network` is not set correctly, which causes FE to match the wrong IP address when it starts. 
Restart FE after modifying `priority_network`. 2. Most Follower FE nodes in the cluster are not started. For example, there are 3 Followers, and only one is started. At this time, at least one other FE needs to be started, so that the FE electable group can elect the Master to provide services. -If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/maint-monitor/metadata-operation.md) in the Doris official website document. +If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/trouble-shooting/metadata-operation.md) in the Doris official website document. ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -263,7 +263,7 @@ If the following problems occur when using MySQL client to connect to Doris, thi Sometimes when FE is restarted, the above error will occur (usually only in the case of multiple Followers). And the two values in the error differ by 2. Causes FE to fail to start. -This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/maint-monitor/metadata-operation.md). +This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/trouble-shooting/metadata-operation.md). ### Q12. 
Doris compile and install JDK version incompatibility problem diff --git a/versioned_docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md b/versioned_docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md index 86bfe6abc5db8..86f544b2423fc 100644 --- a/versioned_docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md +++ b/versioned_docs/version-2.0/admin-manual/maint-monitor/disk-capacity.md @@ -162,6 +162,6 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o ```rm -rf data/0/12345/``` - * Delete tablet metadata refer to [Tablet metadata management tool](./tablet-meta-tool.md) + * Delete tablet metadata refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md) ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/versioned_docs/version-2.0/faq/install-faq.md b/versioned_docs/version-2.0/faq/install-faq.md index a9e0b69c6e5e9..456135b811fb7 100644 --- a/versioned_docs/version-2.0/faq/install-faq.md +++ b/versioned_docs/version-2.0/faq/install-faq.md @@ -253,7 +253,7 @@ There are usually two reasons for this problem: 1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because `priority_network` is not set correctly, which causes FE to match the wrong IP address when it starts. Restart FE after modifying `priority_network`. 2. Most Follower FE nodes in the cluster are not started. For example, there are 3 Followers, and only one is started. At this time, at least one other FE needs to be started, so that the FE electable group can elect the Master to provide services. -If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/maint-monitor/metadata-operation.md) in the Doris official website document. 
+If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/trouble-shooting/metadata-operation.md) in the Doris official website document. ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -263,7 +263,7 @@ If the following problems occur when using MySQL client to connect to Doris, thi Sometimes when FE is restarted, the above error will occur (usually only in the case of multiple Followers). And the two values in the error differ by 2. Causes FE to fail to start. -This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/maint-monitor/metadata-operation.md). +This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/trouble-shooting/metadata-operation.md). ### Q12. Doris compile and install JDK version incompatibility problem diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/backup.md b/versioned_docs/version-2.1/admin-manual/data-admin/backup.md deleted file mode 100644 index ff867f0b9702f..0000000000000 --- a/versioned_docs/version-2.1/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,197 +0,0 @@ ---- -{ - "title": "Data Backup", - "language": "en" -} ---- - - - -# Data Backup - -Doris supports backing up the current data in the form of files to the remote storage system like S3 and HDFS. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. 
-
-This feature requires Doris version 0.8.2+
-
-## A brief explanation of the principle
-
-The backup operation is to upload the data of the specified table or partition directly to the remote warehouse for storage in the form of files stored by Doris. When a user submits a Backup request, the system will perform the following operations:
-
-1. Snapshot and snapshot upload
-
-   The snapshot phase takes a snapshot of the specified table or partition data file. After that, backups are all operations on snapshots. After the snapshot, changes, imports, etc. to the table no longer affect the results of the backup. Snapshots only generate a hard link to the current data file, which takes very little time. After the snapshot is completed, the snapshot files will be uploaded one by one. Snapshot uploads are done concurrently by each Backend.
-
-2. Metadata preparation and upload
-
-   After the data file snapshot upload is complete, Frontend will first write the corresponding metadata to a local file, and then upload the local metadata file to the remote warehouse through the broker. Completing the final backup job
-
-3. Dynamic Partition Table Description
-
-   If the table is a dynamic partition table, the dynamic partition attribute will be automatically disabled after backup. When restoring, you need to manually enable the dynamic partition attribute of the table. The command is as follows:
-
-```sql
-ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true")
-```
-
-4. Backup and Restore operation will NOT keep the `colocate_with` property of a table.
-
-## Start Backup
-
-1. Create a hdfs remote warehouse example_repo (S3 skips step 1):
-
-   ```sql
-   CREATE REPOSITORY `example_repo`
-   WITH HDFS
-   ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
-   PROPERTIES
-   (
-   "fs.defaultFS"="hdfs://hdfs_host:port",
-   "hadoop.username" = "hadoop"
-   );
-   ```
-
-2.
Create a remote repository for s3 : s3_repo (HDFS skips step 2)
-
-   ```
-   CREATE REPOSITORY `s3_repo`
-   WITH S3
-   ON LOCATION "s3://bucket_name/test"
-   PROPERTIES
-   (
-   "AWS_ENDPOINT" = "http://xxxx.xxxx.com",
-   "AWS_ACCESS_KEY" = "xxxx",
-   "AWS_SECRET_KEY" = "xxx",
-   "AWS_REGION" = "xxx"
-   );
-   ```
-
-   >Note that.
-   >
-   >ON LOCATION is followed by Bucket Name here
-
-3. Full backup of table example_tbl under example_db to warehouse example_repo:
-
-   ```sql
-   BACKUP SNAPSHOT example_db.snapshot_label1
-   TO example_repo
-   ON (example_tbl)
-   PROPERTIES ("type" = "full");
-   ```
-
-4. Under the full backup example_db, the p1, p2 partitions of the table example_tbl, and the table example_tbl2 to the warehouse example_repo:
-
-   ```sql
-   BACKUP SNAPSHOT example_db.snapshot_label2
-   TO example_repo
-   ON
-   (
-   example_tbl PARTITION (p1,p2),
-   example_tbl2
-   );
-   ```
-
-5. View the execution of the most recent backup job:
-
-   ```sql
-   mysql> show BACKUP\G;
-   *************************** 1. row ***************************
-   JobId: 17891847
-   SnapshotName: snapshot_label1
-   DbName: example_db
-   State: FINISHED
-   BackupObjs: [default_cluster:example_db.example_tbl]
-   CreateTime: 2022-04-08 15:52:29
-   SnapshotFinishedTime: 2022-04-08 15:52:32
-   UploadFinishedTime: 2022-04-08 15:52:38
-   FinishedTime: 2022-04-08 15:52:44
-   UnfinishedTasks:
-   Progress:
-   TaskErrMsg:
-   Status: [OK]
-   Timeout: 86400
-   1 row in set (0.01 sec)
-   ```
-
-6. View existing backups in remote repositories:
-
-   ```sql
-   mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1";
-   +-----------------+---------------------+--------+
-   | Snapshot        | Timestamp           | Status |
-   +-----------------+---------------------+--------+
-   | snapshot_label1 | 2022-04-08-15-52-29 | OK     |
-   +-----------------+---------------------+--------+
-   1 row in set (0.15 sec)
-   ```
-
-For the detailed usage of BACKUP, please refer to [here](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md).
-
-## Best Practices
-
-### Backup
-
-Currently, we support full backup with the smallest partition (Partition) granularity (incremental backup may be supported in future versions). If you need to back up data regularly, you first need to plan the partitioning and bucketing of the table reasonably when building the table, such as partitioning by time. Then, in the subsequent running process, regular data backups are performed according to the partition granularity.
-
-### Data Migration
-
-Users can back up the data to the remote warehouse first, and then restore the data to another cluster through the remote warehouse to complete the data migration. Because data backup is done in the form of snapshots, new imported data after the snapshot phase of the backup job will not be backed up. Therefore, after the snapshot is completed and until the recovery job is completed, the data imported on the original cluster needs to be imported again on the new cluster.
-
-It is recommended to import the new and old clusters in parallel for a period of time after the migration is complete. After verifying the correctness of data and services, migrate services to a new cluster.
-
-## Highlights
-
-1. Operations related to backup and recovery are currently only allowed to be performed by users with ADMIN privileges.
-2. Within a database, only one backup or restore job is allowed to be executed.
-3. Both backup and recovery support operations at the minimum partition (Partition) level. When the amount of data in the table is large, it is recommended to perform operations by partition to reduce the cost of failed retry.
-4. Because of the backup and restore operations, the operations are the actual data files. Therefore, when a table has too many shards, or a shard has too many small versions, it may take a long time to backup or restore even if the total amount of data is small.
Users can use `SHOW PARTITIONS FROM table_name;` and `SHOW TABLETS FROM table_name;` to view the number of shards in each partition and the number of file versions in each shard to estimate job execution time. The number of files has a great impact on the execution time of the job. Therefore, it is recommended to plan partitions and buckets reasonably when creating tables to avoid excessive sharding.
-5. When checking job status via `SHOW BACKUP` or `SHOW RESTORE` command. It is possible to see error messages in the `TaskErrMsg` column. But as long as the `State` column is not `CANCELLED`, the job is still continuing. These tasks may retry successfully. Of course, some Task errors will also directly cause the job to fail.
-6. If the recovery job is an overwrite operation (specifying the recovery data to an existing table or partition), then from the `COMMIT` phase of the recovery job, the overwritten data on the current cluster may no longer be restored. If the restore job fails or is canceled at this time, the previous data may be damaged and inaccessible. In this case, the only way to do it is to perform the recovery operation again and wait for the job to complete. Therefore, we recommend that if unnecessary, try not to restore data by overwriting unless it is confirmed that the current data is no longer used.
-
-## Related Commands
-
-1. The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client.
-
-   1. CREATE REPOSITORY
-
-      Create a remote repository path for backup or restore. Please refer to [Create Repository Reference](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md).
-
-   2. BACKUP
-
-      Perform a backup operation. Please refer to [Backup Reference](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md).
-
-   3.
SHOW BACKUP
-
-      View the execution of the most recent backup job. Please refer to [Show Backup Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-BACKUP.md)。
-
-   4. SHOW SNAPSHOT
-
-      View existing backups in the remote repository. Please refer to [Show Snapshot Reference](../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/SHOW-SNAPSHOT.md).
-
-   5. CANCEL BACKUP
-
-      Cancel the currently executing backup job. Please refer to [Cancel Backup Reference] (../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md).
-
-   6. DROP REPOSITORY
-
-      Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data. Please refer to [Drop Repository Reference] (../../sql-manual/sql-statements/data-modification/backup-and-restore/DROP-REPOSITORY.md).
-
-## More Help
-
- For more detailed syntax and best practices used by BACKUP, please refer to the [BACKUP](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md) command manual, You can also type `HELP BACKUP` on the MySql client command line for more help.
diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/ccr.md b/versioned_docs/version-2.1/admin-manual/data-admin/ccr.md
deleted file mode 100644
index 7fc88db506550..0000000000000
--- a/versioned_docs/version-2.1/admin-manual/data-admin/ccr.md
+++ /dev/null
@@ -1,608 +0,0 @@
----
-{
-    "title": "CCR (Cross Cluster Replication)",
-    "language": "en"
-}
----
-
-
-
-# Cross Cluster Replication (CCR)
-## Overview
-
-Cross Cluster Replication (CCR) enables the synchronization of data changes from a source cluster to a target cluster at the database/table level. This feature can be used to ensure data availability for online services, isolate offline and online workloads, and build multiple data centers across various sites.
- -CCR is applicable to the following scenarios: - -- Disaster recovery: This involves backing up enterprise data to another cluster and data center. In the event of a sudden incident causing business interruption or data loss, companies can recover data from the backup or quickly switch to the backup cluster. Disaster recovery is typically a must-have feature in use cases with high SLA requirements, such as those in finance, healthcare, and e-commerce. -- Read/write separation: This is to isolate querying and writing operations to reduce their mutual impact and improve resource utilization. For example, in cases of high writing pressure or high concurrency, read/write separation can distribute read and write operations to read-only and write-only database instances in various regions. This helps ensure high database performance and stability. -- Data transfer between headquarters and branch offices: In order to have unified data control and analysis within a corporation, the headquarters usually requires timely data synchronization from branch offices located in different regions. This avoids management confusion and wrong decision-making based on inconsistent data. -- Isolated upgrades: During system cluster upgrades, there might be a need to roll back to a previous version. Many traditional upgrade methods do not allow rolling back due to incompatible metadata. CCR in Doris can address this issue by building a standby cluster for upgrade and conducting dual-running verification. Users can upgrade the clusters one by one. CCR is not dependent on specific versions, making version rollback feasible. 
- -## Design - -### Concepts - -- Source cluster: the cluster where business data is written and originates from, requiring Doris version 2.0 - -- Target cluster: the destination cluster for cross cluster replication, requiring version 2.0 - -- Binlog: the change log of the source cluster, including schema and data changes - -- Syncer: a lightweight process - -### Architecture description - -![ccr-architecture-description](/images/ccr-architecture-description.png) - -CCR relies on a lightweight process called syncer. Syncers retrieve binlogs from the source cluster, directly apply the metadata to the target cluster, and notify the target cluster to pull data from the source cluster. CCR allows both full and incremental data migration. - -### Usage - -The usage of CCR is straightforward. Simply start the syncer service and send a command, and the syncers will take care of the rest. - -1. Deploy the source Doris cluster. -2. Deploy the target Doris cluster. -3. Both the source and target clusters need to enable binlog. Configure the following information in the fe.conf and be.conf files of the source and target clusters: - -```SQL -enable_feature_binlog=true -``` - -4. Deploy syncers - -Build CCR syncer - -```shell -git clone https://github.com/selectdb/ccr-syncer -cd ccr-syncer -bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR> -cd SYNCER_OUTPUT_DIR # Contact the Doris community for a free CCR binary package -``` - - -Start and stop syncer - - -```shell -# Start -cd bin && sh start_syncer.sh --daemon - -# Stop -sh stop_syncer.sh -``` - -5. Enable binlog in the source cluster. 
- -```shell --- If you want to synchronize the entire database, you can execute the following script: -vim shell/enable_db_binlog.sh -Modify host, port, user, password, and db in the source cluster -Or ./enable_db_binlog.sh --host $host --port $port --user $user --password $password --db $db - --- If you want to synchronize a single table, you can execute the following script and enable binlog for the target table: -ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); -``` - -6. Launch a synchronization task to the syncer - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - } -}' http://127.0.0.1:9190/create_ccr -``` - -Parameter description: - -```shell -name: name of the CCR synchronization task, should be unique -host, port: host and mysql(jdbc) port for the master FE for the corresponding cluster -user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -If it is synchronization at the database level, specify your_db_name and leave your_table_name empty -If it is synchronization at the table level, specify both your_db_name and your_table_name -The synchronization task name can only be used once. -``` - -## Operation manual for syncer - -### Start syncer - -Start syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow `host_port.pid`. 
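
To illustrate the `host_port.pid` naming convention, here is a minimal Python sketch that builds a pid file path from a syncer's host and port and parses a pid file name back into its parts. The helper names `pid_file_path` and `parse_pid_file_name` are hypothetical and are not part of ccr-syncer; the default host and port are the ones this manual documents for `--host` and `--port`.

```python
from pathlib import Path
from typing import Tuple


def pid_file_path(pid_dir: str, host: str = "127.0.0.1", port: int = 9190) -> Path:
    """Build the pid file path using the documented host_port.pid convention.

    Hypothetical helper for illustration; ccr-syncer does not expose it.
    """
    return Path(pid_dir) / f"{host}_{port}.pid"


def parse_pid_file_name(name: str) -> Tuple[str, int]:
    """Recover (host, port) from a pid file name such as 127.0.0.1_9190.pid."""
    stem = name[: -len(".pid")] if name.endswith(".pid") else name
    # Split on the last underscore so the host part is kept intact.
    host, _, port = stem.rpartition("_")
    return host, int(port)
```

Splitting on the last underscore is a design choice of this sketch: it keeps the port unambiguous even if a host name were to contain underscores.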
- -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Start options** - -**--daemon** - -Run syncer in the background, set to false by default. - -```shell -bash bin/start_syncer.sh --daemon -``` - -**--db_type** - -Syncer can currently use two databases to store its metadata, `sqlite3` (for local storage) and `mysql` (for local or remote storage). - -```shell -bash bin/start_syncer.sh --db_type mysql -``` - -The default value is sqlite3. - -When using MySQL to store metadata, syncer will use `CREATE IF NOT EXISTS` to create a database called `ccr`, where the metadata table related to CCR will be saved. - -**--db_dir** - -**This option only works when db uses `sqlite3`.** - -It allows you to specify the name and path of the db file generated by sqlite3. - -```shell -bash bin/start_syncer.sh --db_dir /path/to/ccr.db -``` - -The default path is `SYNCER_OUTPUT_DIR/db` and the default file name is `ccr.db`. - -**--db_host & db_port & db_user & db_password** - -**This option only works when db uses `mysql`.** - -```shell -bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456" -``` - -The default values of db_host and db_port are shown in the example. The default values of db_user and db_password are empty. - -**--log_dir** - -Output path of the logs: - -```shell -bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log -``` - -The default path is `SYNCER_OUTPUT_DIR/log` and the default file name is `ccr_syncer.log`. - -**--log_level** - -Used to specify the output level of syncer logs. 
- -```shell -bash bin/start_syncer.sh --log_level info -``` - -The format of the log is as follows, where the hook will only be printed when `log_level > info`: - -```text -# time level msg hooks -[2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx -``` - -Under --daemon, the default value of log_level is `info`. - -When running in the foreground, log_level defaults to `trace`, and logs are saved to log_dir using the tee command. - -**--host && --port** - -Used to specify the host and port of the syncer. The host only serves to distinguish this syncer from others in the cluster and can be understood as the syncer's name; the full name of a syncer in the cluster is `host:port`. - -```shell -bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is 9190. - -**--pid_dir** - -Used to specify the storage path of the pid file. - -The pid file stores the process ID of the corresponding syncer, and the stop_syncer.sh script relies on it to stop that syncer. To make syncers easier to manage, you can specify the storage path of the pid file. - -```shell -bash bin/start_syncer.sh --pid_dir /path/to/pids -``` - -The default value is `SYNCER_OUTPUT_DIR/bin`. - -### Stop syncer - -Stop the syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow `host_port.pid`. 
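
The stop options described in this section (a single syncer, a named batch, or every syncer in the directory) amount to choosing which pid files to act on. The Python sketch below models that choice under stated assumptions: `select_pid_files` is a hypothetical helper, not the logic of `stop_syncer.sh`, and the precedence it applies when both a port and a file list are given is an assumption of this sketch rather than documented behavior.

```python
from pathlib import Path
from typing import List, Optional


def select_pid_files(pid_dir: str,
                     host: str = "127.0.0.1",
                     port: Optional[int] = None,
                     files: Optional[List[str]] = None) -> List[Path]:
    """Pick the pid files that a stop request would target.

    Hypothetical model of the three documented stop methods.
    """
    base = Path(pid_dir)
    if host and port is not None:
        # Method 1: host and port are both set, so target one syncer.
        return [base / f"{host}_{port}.pid"]
    if files:
        # Method 2: an explicit batch of pid file names.
        return [base / name for name in files]
    # Method 3: fall back to every pid file in the directory.
    return sorted(base.glob("*.pid"))
```

With only the default host and no port, the function falls through to the last branch, which mirrors the documented rule that specifying the host alone degrades method 1 to method 3.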
- -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The stop_syncer.sh in the following text refers to the stop_syncer.sh under its corresponding path.** - -**Stop options** - -Syncers can be stopped in three ways: - -1. Stop a single syncer in the directory - -Specify the host and port of the syncer to be stopped. Make sure they match the values used when the syncer was started with start_syncer.sh. - -2. Batch stop the specified syncers in the directory - -Specify the names of the pid files to be stopped, wrap the names in `""` and separate them with spaces. - -3. Stop all syncers in the directory - -Follow the default configurations. - -**--pid_dir** - -Specify the directory where the pid file is located. All three stopping methods above rely on this directory. - -```shell -bash bin/stop_syncer.sh --pid_dir /path/to/pids -``` - -The above example stops the syncers corresponding to all pid files under `/path/to/pids` (**method 3**). `--pid_dir` can be used in combination with any of the three stopping methods. - -The default value is `SYNCER_OUTPUT_DIR/bin`. - -**--host && --port** - -Stop the syncer corresponding to `host:port` in the pid_dir path. - -```shell -bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is empty. That is, specifying the host alone will degrade **method 1** to **method 3**. **Method 1** will only take effect when neither the host nor the port is empty. - -**--files** - -Stop the syncer corresponding to the specified pid file name in the pid_dir path. 
- -```shell -bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" -``` - -The file names should be wrapped in `" "` and separated with spaces. - -### Syncer operations - -**Template for requests** - -```shell -curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator -``` - -json_body: send operation information in JSON format - -operator: different operations for syncer - -The interface returns JSON. If successful, the "success" field will be true. Conversely, if there is an error, it will be false, and then there will be an `ErrMsgs` field. - -```JSON -{"success":true} - -or - -{"success":false,"error_msg":"job ccr_test not exist"} -``` - -**Operators** - -- create_ccr - -Create CCR tasks - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "demo", - "table": "example_tbl" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "ccrt", - "table": "copy" - } -}' http://127.0.0.1:9190/create_ccr -``` - -- name: the name of the CCR synchronization task, should be unique -- host, port: correspond to the host and mysql (jdbc) port of the cluster's master -- thrift_port: corresponds to the rpc_port of the FE -- user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -- database, table: - - If it is a database-level synchronization, fill in the database name and leave the table name empty. - - If it is a table-level synchronization, specify both the database name and the table name. - -- get_lag - -View the synchronization progress. 
- -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/get_lag -``` - -The job_name is the name specified when create_ccr. - -- pause - -Pause synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/pause -``` - -- resume - -Resume synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/resume -``` - -- delete - -Delete synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/delete -``` - -- version - -View version information. - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/version - -# > return -{"version": "2.0.1"} -``` - -- job status - -View job status. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/job_status - -{ - "success": true, - "status": { - "name": "ccr_db_table_alias", - "state": "running", - "progress_state": "TableIncrementalSync" - } -} -``` - -- desync job - -No sync. Users can swap the source and target clusters. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/desync -``` - -- list_jobs - -List all created tasks. - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/list_jobs - -{"success":true,"jobs":["ccr_db_table_alias"]} -``` - -### Open binlog for all tables in the database - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. 
- log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Usage** - -```shell -bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db -``` - -## High availability of syncer - -The high availability of syncers relies on MySQL. If MySQL is used as the backend storage, the syncer can discover other syncers. If one syncer crashes, the others will take over its tasks. - -### Privilege requirements - -1. `select_priv`: read-only privileges for databases and tables -2. `load_priv`: write privileges for databases and tables, including load, insert, delete, etc. -3. `alter_priv`: privilege to modify databases and tables, including renaming databases/tables, adding/deleting/changing columns, adding/deleting partitions, etc. -4. `create_priv`: privilege to create databases, tables, and views -5. `drop_priv`: privilege to drop databases, tables, and views - -Admin privileges are required (We are planning on removing this in future versions). This is used to check the `enable binlog config`. - -## Usage restrictions - -### Network constraints - -- Syncer needs to have connectivity to both the upstream and downstream FEs and BEs. -- The downstream BE should have connectivity to the upstream BE. -- The external IP and Doris internal IP should be the same. In other words, the IP address visible in the output of `show frontends/backends` should be the same IP that can be directly connected to. It should not involve IP forwarding or NAT for direct connections. - -### ThriftPool constraints - -It is recommended to increase the size of the Thrift thread pool to a number greater than the number of buckets involved in a single commit operation. 
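
As a rough worked example of this sizing rule, the Python sketch below computes the smallest pool size that is strictly greater than the number of buckets touched by one commit. It assumes that the buckets involved are the sum of the bucket counts of the partitions written in that commit; that interpretation, the helper name `min_thrift_pool_size`, and the partition names are assumptions of this sketch. The document does not name the configuration parameter to change, so none is shown.

```python
from typing import Dict


def min_thrift_pool_size(buckets_per_partition: Dict[str, int]) -> int:
    """Smallest pool size strictly greater than the buckets in one commit.

    Assumes the commit touches every bucket of each listed partition.
    """
    total_buckets = sum(buckets_per_partition.values())
    return total_buckets + 1


# Two partitions with 16 buckets each are written in a single commit.
print(min_thrift_pool_size({"p20240101": 16, "p20240102": 16}))
```

In practice some headroom above this lower bound is likely sensible, since other requests share the same pool.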
- -### Version requirements - -Minimum required version: V2.0.3 - -### Unsupported operations - -- Rename table -- Operations such as table drop-recovery -- Operations related to rename table, replace partition -- Concurrent backup/restore within the same database - -## Feature - -### Rate limit - -BE-side configuration parameter - -```shell -download_binlog_rate_limit_kbs=1024 # Limits the download speed of Binlog (including Local Snapshot) from the source cluster to 1 MB/s in a single BE node -``` - -1. The `download_binlog_rate_limit_kbs` parameter is configured on the BE nodes of the source cluster. By setting this parameter, the data pull rate can be effectively limited. - -2. The `download_binlog_rate_limit_kbs` parameter primarily controls the speed of data transfer for each single BE node. To calculate the overall cluster rate, one would multiply the parameter value by the number of nodes in the cluster. - -## IS_BEING_SYNCED - -:::tip -Doris v2.0 "is_being_synced" = "true" -::: - -During data synchronization using CCR, replica tables (referred to as target tables) are created in the target cluster for the tables within the synchronization scope of the source cluster (referred to as source tables). However, certain functionalities and attributes need to be disabled or cleared when creating replica tables to ensure the correctness of the synchronization process. For example: - -- The source tables may contain information that is not synchronized to the target cluster, such as `storage_policy`, which may cause the creation of the target table to fail or result in abnormal behavior. -- The source tables may have dynamic functionalities, such as dynamic partitioning, which can lead to uncontrolled behavior in the target table and result in inconsistent partitions. 
- -The attributes that need to be cleared during replication are: - -- `storage_policy` -- `colocate_with` - -The functionalities that need to be disabled during synchronization are: - -- Automatic bucketing -- Dynamic partitioning - -### Implementation - -When creating the target table, the syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: - -1. During table synchronization, the syncer performs a full copy of the source table using backup/restore to obtain the target table. -2. During database synchronization, for existing tables, the syncer also uses backup/restore to obtain the target table. For incremental tables, the syncer creates the target table using the CreateTableRecord binlog. - -Therefore, there are two entry points for inserting the `is_being_synced` property: the restore process during full synchronization and the getDdlStmt during incremental synchronization. - -During the restoration process of full synchronization, the syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. - -During incremental synchronization, add the `boolean getDdlForSync` parameter to the getDdlStmt method to differentiate whether it is a controlled transformation to the target table DDL, and execute the relevant logic for isBeingSynced during the creation of the target table. - -Regarding the disabling of the functionalities mentioned above: - -- Automatic bucketing: Automatic bucketing is enabled when creating a table. It calculates the appropriate number of buckets. This may result in a mismatch in the number of buckets between the source and target tables. 
Therefore, during synchronization, obtain the number of buckets from the source table, as well as the information about whether the source table is an automatic bucketing table in order to restore the functionality after synchronization. The current recommended approach is to default the autobucket attribute to false when retrieving distribution information. During table restoration, check the `_auto_bucket` attribute to find out if the source table is an automatic bucketing table. If it is, set the target table's autobucket field to true to bypass the calculation of bucket numbers and directly apply the number of buckets from the source table to the target table. -- Dynamic partitioning: This is implemented by adding `olapTable.isBeingSynced()` to the condition for executing add/drop partition operations. This ensures that the target table does not perform periodic add/drop partition operations during synchronization. - -### Note - -The `is_being_synced` property should be fully controlled by the syncer, and users should not modify this property manually unless there are exceptional circumstances. diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/data-recovery.md b/versioned_docs/version-2.1/admin-manual/data-admin/data-recovery.md deleted file mode 100644 index c4439b1bd55d8..0000000000000 --- a/versioned_docs/version-2.1/admin-manual/data-admin/data-recovery.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -{ - "title": "Data Recovery", - "language": "en" -} ---- - - - -# Data Recovery - -For the Unique Key Merge on Write table, there are bugs in some Doris versions, which may cause errors when the system calculates the delete bitmap, resulting in duplicate primary keys. At this time, the full compaction function can be used to repair the data. This function is invalid for non-Unique Key Merge on Write tables. - -This feature requires Doris version 2.0+. 
- -To use this function, it is necessary to stop the import as much as possible, otherwise problems such as import timeout may occur. - -## Brief principle explanation - -After the full compaction is executed, the delete bitmap will be recalculated, and the wrong delete bitmap data will be deleted to complete the data restoration. - -## Instructions for use - -`POST /api/compaction/run?tablet_id={int}&compact_type=full` - -or - -`POST /api/compaction/run?table_id={int}&compact_type=full` - -Note that only one tablet_id and table_id can be specified, and cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table. - -## Example of use - -``` -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" -curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" -``` \ No newline at end of file diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/restore.md b/versioned_docs/version-2.1/admin-manual/data-admin/restore.md deleted file mode 100644 index 3a5ff88f1fdd5..0000000000000 --- a/versioned_docs/version-2.1/admin-manual/data-admin/restore.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -{ - "title": "Data Restore", - "language": "en" -} ---- - - - -# Data Recovery - -Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. - -This feature requires Doris version 0.8.2+ - -To use this function, you need to deploy the broker corresponding to the remote storage. Such as BOS, HDFS, etc. You can view the currently deployed broker through `SHOW BROKER;`. 
- -## Brief principle description - -The restore operation needs to specify an existing backup in the remote warehouse, and then restore the content of the backup to the local cluster. When the user submits the Restore request, the system will perform the following operations: - -1. Create the corresponding metadata locally - - This step will first create and restore the corresponding table partition and other structures in the local cluster. After creation, the table is visible, but not accessible. - -2. Local snapshot - - This step is to take a snapshot of the table created in the previous step. This is actually an empty snapshot (because the table just created has no data), and its purpose is to generate the corresponding snapshot directory on the Backend for later receiving the snapshot file downloaded from the remote warehouse. - -3. Download snapshot - - The snapshot files in the remote warehouse will be downloaded to the corresponding snapshot directory generated in the previous step. This step is done concurrently by each Backend. - -4. Make the snapshot take effect - - After the snapshot download is complete, each snapshot is mapped to the metadata of the current local table. These snapshots are then reloaded to take effect, completing the restore job. - -## Start Restore - -1. Restore the table backup_tbl in backup snapshot_1 from example_repo to database example_db1, with time version "2022-04-08-15-52-29". Restore with 1 replica: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. Restore partitions p1 and p2 of table backup_tbl in backup snapshot_2 from example_repo, and table backup_tbl2 to database example_db1, and rename it to new_tbl with time version "2022-04-08-15-55-43". 
The default reverts to 3 replicas: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. View the execution of the restore job: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -For detailed usage of RESTORE, please refer to [here](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE.md). - -## Related Commands - -The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client. - -1. CREATE REPOSITORY - - Create a remote repository path for backup or restore. This command needs to use the Broker process to access the remote storage. Different brokers need to provide different parameters. 
For details, please refer to the [Broker documentation](../../data-operate/import/broker-load-manual). Alternatively, you can back up directly to remote storage that supports the AWS S3 protocol, or directly to HDFS; please refer to the [Create Remote Warehouse Documentation](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY). - -2. RESTORE - - Perform a restore operation. - -3. SHOW RESTORE - - View the execution of the most recent restore job, including: - - - JobId: The id of the current recovery job. - - Label: The name (Label) of the backup in the warehouse specified by the user. - - Timestamp: The timestamp of the backup in the user-specified repository. - - DbName: Database corresponding to the restore job. - - State: The current stage of the recovery job: - - PENDING: The initial status of the job. - - SNAPSHOTING: The snapshot operation of the newly created table is in progress. - - DOWNLOAD: Sending download snapshot task. - - DOWNLOADING: Snapshot is downloading. - - COMMIT: Prepare the downloaded snapshot to take effect. - - COMMITTING: The downloaded snapshots are taking effect. - - FINISHED: Recovery is complete. - - CANCELLED: Recovery failed or was canceled. - - AllowLoad: Whether to allow import during restore. - - ReplicationNum: The number of replicas specified for the restore. - - RestoreObjs: List of tables and partitions involved in this restore. - - CreateTime: Job creation time. - - MetaPreparedTime: Local metadata generation completion time. - - SnapshotFinishedTime: The local snapshot completion time. - - DownloadFinishedTime: The time when the remote snapshot download is completed. - - FinishedTime: The completion time of this job. - - UnfinishedTasks: During `SNAPSHOTTING`, `DOWNLOADING`, `COMMITTING` and other stages, there will be multiple subtasks going on at the same time. This field shows the task ids of the subtasks that are still unfinished in the current stage. 
- - TaskErrMsg: If there is an error in the execution of a subtask, the error message of the corresponding subtask will be displayed here. - - Status: Used to record some status information that may appear during the entire job process. - - Timeout: The timeout period of the job, in seconds. - -4. CANCEL RESTORE - - Cancel the currently executing restore job. - -5. DROP REPOSITORY - - Delete the created remote repository. Deleting a repository only removes its mapping in Doris; it does not delete the data stored in the repository. - -## Common mistakes - -1. Restore reports an error: [20181: invalid md5 of downloaded file: /data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e] - - This happens when the replica count of the backed-up table differs from that of the restored table. Specify the number of replicas when executing the restore command. For specific commands, please refer to the [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual. - -2. Restore reports an error: [COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - The backup and the restore were performed on different Doris versions. Specify meta_version to read the metadata of the earlier backup. Note that this parameter is a temporary workaround and is only used to restore data backed up by an old version of Doris. Backups made by the latest version already contain the meta version, so there is no need to specify it. To resolve the error above, specify meta_version = 100. 
For specific commands, please refer to [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual - -## More Help - -For more detailed syntax and best practices used by RESTORE, please refer to the [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual, You can also type `HELP RESTORE` on the MySql client command line for more help. diff --git a/versioned_docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md b/versioned_docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md index 86bfe6abc5db8..86f544b2423fc 100644 --- a/versioned_docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md +++ b/versioned_docs/version-2.1/admin-manual/maint-monitor/disk-capacity.md @@ -162,6 +162,6 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o ```rm -rf data/0/12345/``` - * Delete tablet metadata refer to [Tablet metadata management tool](./tablet-meta-tool.md) + * Delete tablet metadata refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md) ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/versioned_docs/version-2.1/admin-manual/open-api/be-http/compaction-run.md b/versioned_docs/version-2.1/admin-manual/open-api/be-http/compaction-run.md index f2b3cb45f56b7..9331506d29c53 100644 --- a/versioned_docs/version-2.1/admin-manual/open-api/be-http/compaction-run.md +++ b/versioned_docs/version-2.1/admin-manual/open-api/be-http/compaction-run.md @@ -46,7 +46,7 @@ Used to manually trigger the comparison and show status. - ID of table. Note that table_id=xxx will take effect only when compact_type=full is specified, and only one tablet_id and table_id can be specified, and cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table. 
* `compact_type` - - The value is `base` or `cumulative` or `full`. For usage scenarios of full_compaction, please refer to [Data Recovery](../../data-admin/repairing-data.md). + - The value is `base` or `cumulative` or `full`. For usage scenarios of full_compaction, please refer to [Data Recovery](../../trouble-shooting/repairing-data). ## Request body diff --git a/versioned_docs/version-2.1/faq/install-faq.md b/versioned_docs/version-2.1/faq/install-faq.md index a9e0b69c6e5e9..456135b811fb7 100644 --- a/versioned_docs/version-2.1/faq/install-faq.md +++ b/versioned_docs/version-2.1/faq/install-faq.md @@ -253,7 +253,7 @@ There are usually two reasons for this problem: 1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because `priority_network` is not set correctly, which causes FE to match the wrong IP address when it starts. Restart FE after modifying `priority_network`. 2. Most Follower FE nodes in the cluster are not started. For example, there are 3 Followers, and only one is started. At this time, at least one other FE needs to be started, so that the FE electable group can elect the Master to provide services. -If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/maint-monitor/metadata-operation.md) in the Doris official website document. +If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/trouble-shooting/metadata-operation.md) in the Doris official website document. ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -263,7 +263,7 @@ If the following problems occur when using MySQL client to connect to Doris, thi Sometimes when FE is restarted, the above error will occur (usually only in the case of multiple Followers). And the two values in the error differ by 2. 
Causes FE to fail to start. -This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/maint-monitor/metadata-operation.md). +This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/trouble-shooting/metadata-operation.md). ### Q12. Doris compile and install JDK version incompatibility problem diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/backup.md b/versioned_docs/version-3.0/admin-manual/data-admin/backup.md deleted file mode 100644 index 192591536de70..0000000000000 --- a/versioned_docs/version-3.0/admin-manual/data-admin/backup.md +++ /dev/null @@ -1,250 +0,0 @@ ---- -{ - "title": "Data Backup", - "language": "en" -} ---- - - - -# Data Backup - -Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. - -This feature requires Doris version 0.8.2+ - -To use this function, you need to deploy the broker corresponding to the remote storage. Such as BOS, HDFS, etc. You can view the currently deployed broker through `SHOW BROKER;`. - -## A brief explanation of the principle - -The backup operation is to upload the data of the specified table or partition directly to the remote warehouse for storage in the form of files stored by Doris. When a user submits a Backup request, the system will perform the following operations: - -1. 
Snapshot and snapshot upload - - The snapshot phase takes a snapshot of the specified table or partition data file. After that, backups are all operations on snapshots. After the snapshot, changes, imports, etc. to the table no longer affect the results of the backup. Snapshots only generate a hard link to the current data file, which takes very little time. After the snapshot is completed, the snapshot files will be uploaded one by one. Snapshot uploads are done concurrently by each Backend. - -2. Metadata preparation and upload - - After the data file snapshot upload is complete, Frontend will first write the corresponding metadata to a local file, and then upload the local metadata file to the remote warehouse through the broker. Completing the final backup job - -3. Dynamic Partition Table Description - - If the table is a dynamic partition table, the dynamic partition attribute will be automatically disabled after backup. When restoring, you need to manually enable the dynamic partition attribute of the table. The command is as follows: - -```sql -ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true") -``` - -4. Backup and Restore operation will NOT keep the `colocate_with` property of a table. - -## Start Backup - -1. Create a hdfs remote warehouse example_repo: - - **WITH HDFS (Recommended)** - - ```sql - CREATE REPOSITORY `example_repo` - WITH HDFS - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "fs.defaultFS"="hdfs://hdfs_host:port", - "hadoop.username" = "hadoop" - ); - ``` - - **WITH BROKER** - - This requires starting a broker process first. - - ```sql - CREATE REPOSITORY `example_repo` - WITH BROKER `broker_name` - ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/" - PROPERTIES - ( - "username" = "user", - "password" = "password" - ); - ``` - -2. 
Create a remote repository for s3 : s3_repo - - ``` - CREATE REPOSITORY `s3_repo` - WITH S3 - ON LOCATION "s3://bucket_name/test" - PROPERTIES - ( - "AWS_ENDPOINT" = "http://xxxx.xxxx.com", - "AWS_ACCESS_KEY" = "xxxx", - "AWS_SECRET_KEY" = "xxx", - "AWS_REGION" = "xxx" - ); - ``` - - >Note that. - > - >ON LOCATION is followed by Bucket Name here - -1. Full backup of table example_tbl under example_db to warehouse example_repo: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label1 - TO example_repo - ON (example_tbl) - PROPERTIES ("type" = "full"); - ``` - -2. Under the full backup example_db, the p1, p2 partitions of the table example_tbl, and the table example_tbl2 to the warehouse example_repo: - - ```sql - BACKUP SNAPSHOT example_db.snapshot_label2 - TO example_repo - ON - ( - example_tbl PARTITION (p1,p2), - example_tbl2 - ); - ``` - -4. View the execution of the most recent backup job: - - ```sql - mysql> show BACKUP\G; - *************************** 1. row *************************** - JobId: 17891847 - SnapshotName: snapshot_label1 - DbName: example_db - State: FINISHED - BackupObjs: [default_cluster:example_db.example_tbl] - CreateTime: 2022-04-08 15:52:29 - SnapshotFinishedTime: 2022-04-08 15:52:32 - UploadFinishedTime: 2022-04-08 15:52:38 - FinishedTime: 2022-04-08 15:52:44 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -5. View existing backups in remote repositories: - - ```sql - mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; - +-----------------+---------------------+--------+ - | Snapshot | Timestamp | Status | - +-----------------+---------------------+--------+ - | snapshot_label1 | 2022-04-08-15-52-29 | OK | - +-----------------+---------------------+--------+ - 1 row in set (0.15 sec) - ``` - -For the detailed usage of BACKUP, please refer to [here](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md). 
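The two `BACKUP SNAPSHOT` statements above differ only in the table list and the optional `PROPERTIES` clause. A minimal sketch of generating such statements, for example from a scheduler that backs up one partition at a time, is shown below. The helper name `build_backup_stmt` and its parameters are hypothetical, and it does no quoting or validation of identifiers; it only reproduces the statement shape shown in this section.

```python
def build_backup_stmt(db, label, repo, tables, full=False):
    """Build a BACKUP SNAPSHOT statement in the shape shown in this document.

    `tables` maps a table name to a list of partition names; an empty list
    means the whole table. Hypothetical helper: identifiers are inserted
    as-is, so callers must only pass trusted names.
    """
    if not tables:
        raise ValueError("at least one table is required")

    items = []
    for table, partitions in tables.items():
        if partitions:
            items.append(f"{table} PARTITION ({', '.join(partitions)})")
        else:
            items.append(table)

    stmt = f"BACKUP SNAPSHOT {db}.{label}\nTO {repo}\nON ({', '.join(items)})"
    if full:
        # Same PROPERTIES clause as the full-backup example above.
        stmt += '\nPROPERTIES ("type" = "full")'
    return stmt + ";"


if __name__ == "__main__":
    print(build_backup_stmt(
        "example_db", "snapshot_label2", "example_repo",
        {"example_tbl": ["p1", "p2"], "example_tbl2": []},
    ))
```

The resulting string can be sent through any MySQL-protocol client connected to the FE; the client itself is left out of the sketch.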
- -## Best Practices - -### Backup - -Currently, we support full backup with the smallest partition (Partition) granularity (incremental backup may be supported in future versions). If you need to back up data regularly, you first need to plan the partitioning and bucketing of the table reasonably when building the table, such as partitioning by time. Then, in the subsequent running process, regular data backups are performed according to the partition granularity. - -### Data Migration - -Users can back up the data to the remote warehouse first, and then restore the data to another cluster through the remote warehouse to complete the data migration. Because data backup is done in the form of snapshots, new imported data after the snapshot phase of the backup job will not be backed up. Therefore, after the snapshot is completed and until the recovery job is completed, the data imported on the original cluster needs to be imported again on the new cluster. - -It is recommended to import the new and old clusters in parallel for a period of time after the migration is complete. After verifying the correctness of data and services, migrate services to a new cluster. - -## Highlights - -1. Operations related to backup and recovery are currently only allowed to be performed by users with ADMIN privileges. -2. Within a database, only one backup or restore job is allowed to be executed. -3. Both backup and recovery support operations at the minimum partition (Partition) level. When the amount of data in the table is large, it is recommended to perform operations by partition to reduce the cost of failed retry. -4. Because of the backup and restore operations, the operations are the actual data files. Therefore, when a table has too many shards, or a shard has too many small versions, it may take a long time to backup or restore even if the total amount of data is small. 
Users can use `SHOW PARTITIONS FROM table_name;` and `SHOW TABLETS FROM table_name;` to view the number of shards in each partition and the number of file versions in each shard to estimate job execution time. The number of files has a great impact on the execution time of the job. Therefore, it is recommended to plan partitions and buckets reasonably when creating tables to avoid excessive sharding. -5. When checking job status via the `SHOW BACKUP` or `SHOW RESTORE` command, you may see error messages in the `TaskErrMsg` column. As long as the `State` column is not `CANCELLED`, the job is still running, and these tasks may succeed on retry. Some task errors, however, will directly cause the job to fail. - Common `TaskErrMsg` errors are as follows: - Q1: Backup to HDFS, the status shows UPLOADING, TaskErrMsg error message: [13333: Close broker writer failed, broker:TNetworkAddress(hostname=10.10.0.0, port=8000) msg:errors while close file output stream, cause by: DataStreamer Exception : ] - This is generally a network communication problem. Check the broker log to see if a certain ip or port is blocked. If it is a cloud service, check whether it is accessed through the intranet. If so, add an hdfs-site.xml file to the broker/conf folder, set dfs.client.use.datanode.hostname=true in it, and configure the hostname mapping of the HADOOP cluster on the broker node. -6. If the recovery job is an overwrite operation (specifying the recovery data to an existing table or partition), then from the `COMMIT` phase of the recovery job, the overwritten data on the current cluster may no longer be restored. If the restore job fails or is canceled at this time, the previous data may be damaged and inaccessible. In this case, the only option is to perform the recovery operation again and wait for the job to complete. 
Therefore, we recommend that if unnecessary, try not to restore data by overwriting unless it is confirmed that the current data is no longer used. - -## Related Commands - -1. The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client. - - 1. CREATE REPOSITORY - - Create a remote repository path for backup or restore. This command needs to use the Broker process to access the remote storage. Different brokers need to provide different parameters. For details, please refer to [Broker documentation](../../data-operate/import/broker-load-manual), or you can directly back up to support through the S3 protocol For the remote storage of AWS S3 protocol, or directly back up to HDFS, please refer to [Create Remote Warehouse Documentation](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md ) - - 2. BACKUP - - Perform a backup operation. - - 3. SHOW BACKUP - - View the execution of the most recent backup job, including: - - - JobId: The id of this backup job. - - SnapshotName: The name (Label) of this backup job specified by the user. - - DbName: Database corresponding to the backup job. - - State: The current stage of the backup job: - - PENDING: The initial status of the job. - - SNAPSHOTING: A snapshot operation is in progress. - - UPLOAD_SNAPSHOT: The snapshot is over, ready to upload. - - UPLOADING: Uploading snapshot. - - SAVE_META: The metadata file is being generated locally. - - UPLOAD_INFO: Upload metadata files and information about this backup job. - - FINISHED: The backup is complete. - - CANCELLED: Backup failed or was canceled. - - BackupObjs: List of tables and partitions involved in this backup. - - CreateTime: Job creation time. - - SnapshotFinishedTime: Snapshot completion time. - - UploadFinishedTime: Snapshot upload completion time. - - FinishedTime: The completion time of this job. 
- - UnfinishedTasks: During `SNAPSHOTTING`, `UPLOADING` and other stages, there will be multiple subtasks going on at the same time. The current stage shown here is the task id of the unfinished subtasks. - - TaskErrMsg: If there is an error in the execution of a subtask, the error message of the corresponding subtask will be displayed here. - - Status: Used to record some status information that may appear during the entire job process. - - Timeout: The timeout period of the job, in seconds. - - 4. SHOW SNAPSHOT - - View existing backups in the remote repository. - - - Snapshot: The name (Label) of the backup specified during backup. - - Timestamp: Timestamp of the backup. - - Status: Whether the backup is normal. - - More detailed backup information can be displayed if a where clause is specified after `SHOW SNAPSHOT`. - - - Database: The database corresponding to the backup. - - Details: Shows the complete data directory structure of the backup. - - 5. CANCEL BACKUP - - Cancel the currently executing backup job. - - 6. DROP REPOSITORY - - Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data. - -## More Help - - For more detailed syntax and best practices used by BACKUP, please refer to the [BACKUP](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md) command manual, You can also type `HELP BACKUP` on the MySql client command line for more help. 
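The `State` values listed under `SHOW BACKUP`, together with the note that a non-empty `TaskErrMsg` does not by itself mean failure, can be turned into a small status check. The sketch below assumes a row from `SHOW BACKUP` has already been fetched into a dictionary keyed by column name; the function name `summarize_backup` and the returned strings are illustrative and not part of Doris.

```python
# States as spelled in the SHOW BACKUP description in this document.
BACKUP_STATES = (
    "PENDING", "SNAPSHOTING", "UPLOAD_SNAPSHOT", "UPLOADING",
    "SAVE_META", "UPLOAD_INFO", "FINISHED", "CANCELLED",
)


def summarize_backup(row):
    """Classify one SHOW BACKUP row (a dict keyed by column name).

    Hypothetical helper: only FINISHED and CANCELLED are treated as
    terminal; any other state means the job is still in progress, even
    if TaskErrMsg is non-empty, because subtasks may retry.
    """
    state = row["State"]
    if state not in BACKUP_STATES:
        raise ValueError(f"unexpected backup state: {state!r}")
    if state == "FINISHED":
        return "finished"
    if state == "CANCELLED":
        return "cancelled"
    if row.get("TaskErrMsg"):
        return "running, with subtask errors"
    return "running"


if __name__ == "__main__":
    print(summarize_backup({"State": "UPLOADING", "TaskErrMsg": ""}))
```

A polling loop would call this on each fresh `SHOW BACKUP` result and stop once it returns `finished` or `cancelled`.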
diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/ccr.md b/versioned_docs/version-3.0/admin-manual/data-admin/ccr.md deleted file mode 100644 index 7fc88db506550..0000000000000 --- a/versioned_docs/version-3.0/admin-manual/data-admin/ccr.md +++ /dev/null @@ -1,608 +0,0 @@ ---- -{ - "title": "CCR (Cross Cluster Replication)", - "language": "en" -} ---- - - - -# Cross Cluster Replication (CCR) -## Overview - -Cross Cluster Replication (CCR) enables the synchronization of data changes from a source cluster to a target cluster at the database/table level. This feature can be used to ensure data availability for online services, isolate offline and online workloads, and build multiple data centers across various sites. - -CCR is applicable to the following scenarios: - -- Disaster recovery: This involves backing up enterprise data to another cluster and data center. In the event of a sudden incident causing business interruption or data loss, companies can recover data from the backup or quickly switch to the backup cluster. Disaster recovery is typically a must-have feature in use cases with high SLA requirements, such as those in finance, healthcare, and e-commerce. -- Read/write separation: This is to isolate querying and writing operations to reduce their mutual impact and improve resource utilization. For example, in cases of high writing pressure or high concurrency, read/write separation can distribute read and write operations to read-only and write-only database instances in various regions. This helps ensure high database performance and stability. -- Data transfer between headquarters and branch offices: In order to have unified data control and analysis within a corporation, the headquarters usually requires timely data synchronization from branch offices located in different regions. This avoids management confusion and wrong decision-making based on inconsistent data. 
-- Isolated upgrades: During system cluster upgrades, there might be a need to roll back to a previous version. Many traditional upgrade methods do not allow rolling back due to incompatible metadata. CCR in Doris can address this issue by building a standby cluster for upgrade and conducting dual-running verification. Users can upgrade the clusters one by one. CCR is not dependent on specific versions, making version rollback feasible. - -## Design - -### Concepts - -- Source cluster: the cluster where business data is written and originates from, requiring Doris version 2.0 - -- Target cluster: the destination cluster for cross cluster replication, requiring version 2.0 - -- Binlog: the change log of the source cluster, including schema and data changes - -- Syncer: a lightweight process - -### Architecture description - -![ccr-architecture-description](/images/ccr-architecture-description.png) - -CCR relies on a lightweight process called syncer. Syncers retrieve binlogs from the source cluster, directly apply the metadata to the target cluster, and notify the target cluster to pull data from the source cluster. CCR allows both full and incremental data migration. - -### Usage - -The usage of CCR is straightforward. Simply start the syncer service and send a command, and the syncers will take care of the rest. - -1. Deploy the source Doris cluster. -2. Deploy the target Doris cluster. -3. Both the source and target clusters need to enable binlog. Configure the following information in the fe.conf and be.conf files of the source and target clusters: - -```SQL -enable_feature_binlog=true -``` - -4. 
Deploy syncers - -​Build CCR syncer - -```shell -git clone https://github.com/selectdb/ccr-syncer -cd ccr-syncer -bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR> -cd SYNCER_OUTPUT_DIR# Contact the Doris community for a free CCR binary package -``` - - -Start and stop syncer - - -```shell -# Start -cd bin && sh start_syncer.sh --daemon - -# Stop -sh stop_syncer.sh -``` - -5. Enable binlog in the source cluster. - -```shell --- If you want to synchronize the entire database, you can execute the following script: -vim shell/enable_db_binlog.sh -Modify host, port, user, password, and db in the source cluster -Or ./enable_db_binlog.sh --host $host --port $port --user $user --password $password --db $db - --- If you want to synchronize a single table, you can execute the following script and enable binlog for the target table: -ALTER TABLE enable_binlog SET ("binlog.enable" = "true"); -``` - -6. Launch a synchronization task to the syncer - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "your_db_name", - "table": "your_table_name" - } -}' http://127.0.0.1:9190/create_ccr -``` - -Parameter description: - -```shell -name: name of the CCR synchronization task, should be unique -host, port: host and mysql(jdbc) port for the master FE for the corresponding cluster -user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -If it is synchronization at the database level, specify your_db_name and leave your_table_name empty -If it is synchronization at the table level, specify both your_db_name and your_table_name -The synchronization task name can only be used once. 
-``` - -## Operation manual for syncer - -### Start syncer - -Start syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow `host_port.pid`. - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```SQL -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Start options** - -**--daemon** - -Run syncer in the background, set to false by default. - -```SQL -bash bin/start_syncer.sh --daemon -``` - -**--db_type** - -Syncer can currently use two databases to store its metadata, `sqlite3 `(for local storage) and `mysql `(for local or remote storage). - -```SQL -bash bin/start_syncer.sh --db_type mysql -``` - -The default value is sqlite3. - -When using MySQL to store metadata, syncer will use `CREATE IF NOT EXISTS `to create a database called `ccr`, where the metadata table related to CCR will be saved. - -**--db_dir** - -**This option only works when db uses** **`sqlite3`****.** - -It allows you to specify the name and path of the db file generated by sqlite3. - -```SQL -bash bin/start_syncer.sh --db_dir /path/to/ccr.db -``` - -The default path is `SYNCER_OUTPUT_DIR/db` and the default file name is `ccr.db`. - -**--db_host & db_port & db_user & db_password** - -**This option only works when db uses** **`mysql`****.** - -```SQL -bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456" -``` - -The default values of db_host and db_port are shown in the example. The default values of db_user and db_password are empty. 
- -**--log_dir** - -Output path of the logs: - -```SQL -bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log -``` - -The default path is`SYNCER_OUTPUT_DIR/log` and the default file name is `ccr_syncer.log`. - -**--log_level** - -Used to specify the output level of syncer logs. - -```SQL -bash bin/start_syncer.sh --log_level info -``` - -The format of the log is as follows, where the hook will only be printed when `log_level > info `: - -```SQL -# time level msg hooks -[2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx -[2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx -``` - -Under --daemon, the default value of log_level is `info`. - -When running in the foreground, log_level defaults to `trace`, and logs are saved to log_dir using the tee command. - -**--host && --port** - -Used to specify the host and port of syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of syncer, and the name of syncer in the cluster is `host: port`. - -```SQL -bash bin/start_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is 9190. - -**--pid_dir** - -Used to specify the storage path of the pid file - -The pid file is the credentials for closing the syncer. It is used in the stop_syncer.sh script. It saves the corresponding syncer process number. In order to facilitate management of syncer, you can specify the storage path of the pid file. - -```SQL -bash bin/start_syncer.sh --pid_dir /path/to/pids -``` - -The default value is `SYNCER_OUTPUT_DIR/bin`. 
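The start options above can be combined in a single invocation. As a rough illustration, the sketch below assembles the argument list for `bin/start_syncer.sh` from the options documented in this section and derives the `host_port.pid` file name that `stop_syncer.sh` later relies on. The function names are hypothetical, and the script is assumed to be invoked from the syncer output directory.

```python
# Options documented in this section that take a value.
VALUE_OPTIONS = {
    "db_type", "db_dir", "db_host", "db_port", "db_user", "db_password",
    "log_dir", "log_level", "host", "port", "pid_dir",
}


def start_syncer_args(daemon=False, **options):
    """Return the argv list for bin/start_syncer.sh (hypothetical helper).

    Only options named in this document are accepted; anything else
    raises ValueError instead of being passed through silently.
    """
    args = ["bash", "bin/start_syncer.sh"]
    if daemon:
        args.append("--daemon")
    for name, value in options.items():
        if name not in VALUE_OPTIONS:
            raise ValueError(f"unknown start_syncer.sh option: {name}")
        args += [f"--{name}", str(value)]
    return args


def pid_file_name(host="127.0.0.1", port=9190):
    """Name of the pid file, following the host_port.pid convention."""
    return f"{host}_{port}.pid"


if __name__ == "__main__":
    print(" ".join(start_syncer_args(daemon=True, db_type="mysql", log_level="info")))
    print(pid_file_name())
```

The returned list can be handed to `subprocess.run` unchanged, which avoids shell quoting problems with values such as the database password.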
- -### Stop syncer - -Stop the syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow `host_port.pid`. - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. - log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Stop options** - -Syncers can be stopped in three ways: - -1. Stop a single syncer in the directory - -Specify the host and port of the syncer to be stopped. Be sure to keep it consistent with the host specified when start_syncer - -2. Batch stop the specified syncers in the directory - -Specify the names of the pid files to be stopped, wrap the names in `""` and separate them with spaces. - -3. Stop all syncers in the directory - -Follow the default configurations. - -**--pid_dir** - -Specify the directory where the pid file is located. The above three stopping methods all depend on the directory where the pid file is located for execution. - -```shell -bash bin/stop_syncer.sh --pid_dir /path/to/pids -``` - -The effect of the above example is to close the syncers corresponding to all pid files under `/path/to/pids `( **method 3** ). `-- pid_dir `can be used in combination with the above three syncer stopping methods. - -The default value is `SYNCER_OUTPUT_DIR/bin`. - -**--host && --port** - -Stop the syncer corresponding to host: port in the pid_dir path. - -```shell -bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190 -``` - -The default value of host is 127.0.0.1, and the default value of port is empty. That is, specifying the host alone will degrade **method 1** to **method 3**. 
**Method 1** will only take effect when neither the host nor the port is empty. - -**--files** - -Stop the syncer corresponding to the specified pid file name in the pid_dir path. - -```shell -bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid" -``` - -The file names should be wrapped in `" "` and separated with spaces. - -### Syncer operations - -**Template for requests** - -```shell -curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator -``` - -json_body: send operation information in JSON format - -operator: different operations for syncer - -The interface returns JSON. If successful, the "success" field will be true. Conversely, if there is an error, it will be false, and then there will be an `ErrMsgs` field. - -```JSON -{"success":true} - -or - -{"success":false,"error_msg":"job ccr_test not exist"} -``` - -**Operators** - -- create_ccr - -Create CCR tasks - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "ccr_test", - "src": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "demo", - "table": "example_tbl" - }, - "dest": { - "host": "localhost", - "port": "9030", - "thrift_port": "9020", - "user": "root", - "password": "", - "database": "ccrt", - "table": "copy" - } -}' http://127.0.0.1:9190/create_ccr -``` - -- name: the name of the CCR synchronization task, should be unique -- host, port: correspond to the host and mysql (jdbc) port of the cluster's master -- thrift_port: corresponds to the rpc_port of the FE -- user, password: the credentials used by the syncer to initiate transactions, fetch data, etc. -- database, table: - - If it is a database-level synchronization, fill in the database name and leave the table name empty. - - If it is a table-level synchronization, specify both the database name and the table name. - -- get_lag - -View the synchronization progress. 
- -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/get_lag -``` - -The job_name is the name specified when create_ccr. - -- pause - -Pause synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/pause -``` - -- resume - -Resume synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/resume -``` - -- delete - -Delete synchronization task. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/delete -``` - -- version - -View version information. - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/version - -# > return -{"version": "2.0.1"} -``` - -- job status - -View job status. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/job_status - -{ - "success": true, - "status": { - "name": "ccr_db_table_alias", - "state": "running", - "progress_state": "TableIncrementalSync" - } -} -``` - -- desync job - -No sync. Users can swap the source and target clusters. - -```shell -curl -X POST -H "Content-Type: application/json" -d '{ - "name": "job_name" -}' http://ccr_syncer_host:ccr_syncer_port/desync -``` - -- list_jobs - -List all created tasks. - -```shell -curl http://ccr_syncer_host:ccr_syncer_port/list_jobs - -{"success":true,"jobs":["ccr_db_table_alias"]} -``` - -### Open binlog for all tables in the database - -**Output file structure** - -The file structure can be seen under the output path after compilation: - -```shell -output_dir - bin - ccr_syncer - enable_db_binlog.sh - start_syncer.sh - stop_syncer.sh - db - [ccr.db] # Generated after running with the default configurations. 
- log - [ccr_syncer.log] # Generated after running with the default configurations. -``` - -**The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.** - -**Usage** - -```shell -bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db -``` - -## High availability of syncer - -The high availability of syncers relies on MySQL. If MySQL is used as the backend storage, the syncer can discover other syncers. If one syncer crashes, the others will take over its tasks. - -### Privilege requirements - -1. `select_priv`: read-only privileges for databases and tables -2. `load_priv`: write privileges for databases and tables, including load, insert, delete, etc. -3. `alter_priv`: privilege to modify databases and tables, including renaming databases/tables, adding/deleting/changing columns, adding/deleting partitions, etc. -4. `create_priv`: privilege to create databases, tables, and views -5. `drop_priv`: privilege to drop databases, tables, and views - -Admin privileges are required (We are planning on removing this in future versions). This is used to check the `enable binlog config`. - -## Usage restrictions - -### Network constraints - -- Syncer needs to have connectivity to both the upstream and downstream FEs and BEs. -- The downstream BE should have connectivity to the upstream BE. -- The external IP and Doris internal IP should be the same. In other words, the IP address visible in the output of `show frontends/backends` should be the same IP that can be directly connected to. It should not involve IP forwarding or NAT for direct connections. - -### ThriftPool constraints - -It is recommended to increase the size of the Thrift thread pool to a number greater than the number of buckets involved in a single commit operation. 
- -### Version requirements - -Minimum required version: V2.0.3 - -### Unsupported operations - -- Rename table -- Operations such as table drop-recovery -- Operations related to rename table, replace partition -- Concurrent backup/restore within the same database - -## Feature - -### Rate limit - -BE-side configuration parameter - -```shell -download_binlog_rate_limit_kbs=1024 # Limits the download speed of Binlog (including Local Snapshot) from the source cluster to 1 MB/s in a single BE node -``` - -1. The `download_binlog_rate_limit_kbs` parameter is configured on the BE nodes of the source cluster. By setting this parameter, the data pull rate can be effectively limited. - -2. The `download_binlog_rate_limit_kbs` parameter primarily controls the speed of data transfer for each single BE node. To calculate the overall cluster rate, one would multiply the parameter value by the number of nodes in the cluster. - -## IS_BEING_SYNCED - -:::tip -Doris v2.0 "is_being_synced" = "true" -::: - -During data synchronization using CCR, replica tables (referred to as target tables) are created in the target cluster for the tables within the synchronization scope of the source cluster (referred to as source tables). However, certain functionalities and attributes need to be disabled or cleared when creating replica tables to ensure the correctness of the synchronization process. For example: - -- The source tables may contain information that is not synchronized to the target cluster, such as `storage_policy`, which may cause the creation of the target table to fail or result in abnormal behavior. -- The source tables may have dynamic functionalities, such as dynamic partitioning, which can lead to uncontrolled behavior in the target table and result in inconsistent partitions. 
- -The attributes that need to be cleared during replication are: - -- `storage_policy` -- `colocate_with` - -The functionalities that need to be disabled during synchronization are: - -- Automatic bucketing -- Dynamic partitioning - -### Implementation - -When creating the target table, the syncer controls the addition or deletion of the `is_being_synced` property. In CCR, there are two approaches to creating a target table: - -1. During table synchronization, the syncer performs a full copy of the source table using backup/restore to obtain the target table. -2. During database synchronization, for existing tables, the syncer also uses backup/restore to obtain the target table. For incremental tables, the syncer creates the target table using the CreateTableRecord binlog. - -Therefore, there are two entry points for inserting the `is_being_synced` property: the restore process during full synchronization and the getDdlStmt during incremental synchronization. - -During the restoration process of full synchronization, the syncer initiates a restore operation of the snapshot from the source cluster via RPC. During this process, the `is_being_synced` property is added to the RestoreStmt and takes effect in the final restoreJob, executing the relevant logic for `is_being_synced`. - -During incremental synchronization, add the `boolean getDdlForSync` parameter to the getDdlStmt method to differentiate whether it is a controlled transformation to the target table DDL, and execute the relevant logic for isBeingSynced during the creation of the target table. - -Regarding the disabling of the functionalities mentioned above: - -- Automatic bucketing: Automatic bucketing is enabled when creating a table. It calculates the appropriate number of buckets. This may result in a mismatch in the number of buckets between the source and target tables. 
Therefore, during synchronization, obtain the number of buckets from the source table, as well as the information about whether the source table is an automatic bucketing table in order to restore the functionality after synchronization. The current recommended approach is to default the autobucket attribute to false when retrieving distribution information. During table restoration, check the `_auto_bucket` attribute to find out if the source table is an automatic bucketing table. If it is, set the target table's autobucket field to true to bypass the calculation of bucket numbers and directly apply the number of buckets from the source table to the target table. -- Dynamic partitioning: This is implemented by adding `olapTable.isBeingSynced()` to the condition for executing add/drop partition operations. This ensures that the target table does not perform periodic add/drop partition operations during synchronization. - -### Note - -The `is_being_synced` property should be fully controlled by the syncer, and users should not modify this property manually unless there are exceptional circumstances. diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/repairing-data.md b/versioned_docs/version-3.0/admin-manual/data-admin/repairing-data.md deleted file mode 100644 index b1f98cf9b6288..0000000000000 --- a/versioned_docs/version-3.0/admin-manual/data-admin/repairing-data.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -{ - "title": "Repairing Data", - "language": "en" -} ---- - - - -# Repairing Data - -For the Unique Key Merge on Write table, there are bugs in some Doris versions, which may cause errors when the system calculates the delete bitmap, resulting in duplicate primary keys. At this time, the full compaction function can be used to repair the data. This function is invalid for non-Unique Key Merge on Write tables. - -This feature requires Doris version 2.0+. 
- -Before using this function, stop imports as far as possible; otherwise problems such as import timeouts may occur. - -## Brief principle explanation - -After the full compaction is executed, the delete bitmap will be recalculated, and the incorrect delete bitmap data will be removed, which completes the data repair. - -## Instructions for use - -`POST /api/compaction/run?tablet_id={int}&compact_type=full` - -or - -`POST /api/compaction/run?table_id={int}&compact_type=full` - -Note that only one of tablet_id and table_id can be specified; they cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table. - -## Example of use - -``` -curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" -curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" -``` \ No newline at end of file diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/restore.md b/versioned_docs/version-3.0/admin-manual/data-admin/restore.md deleted file mode 100644 index 3a5ff88f1fdd5..0000000000000 --- a/versioned_docs/version-3.0/admin-manual/data-admin/restore.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -{ - "title": "Data Restore", - "language": "en" -} ---- - - - -# Data Recovery - -Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards, you can restore data from the remote storage system to any Doris cluster through the restore command. Through this function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between different clusters. - -This feature requires Doris version 0.8.2+ - -To use this function, you need to deploy the broker corresponding to the remote storage, such as BOS or HDFS. You can view the currently deployed broker through `SHOW BROKER;`.
- -## Brief principle description - -A restore operation must specify an existing backup in the remote warehouse; the content of that backup is then restored to the local cluster. When the user submits the Restore request, the system will perform the following operations: - -1. Create the corresponding metadata locally - - This step first creates the corresponding tables, partitions, and other structures in the local cluster. After creation, the tables are visible, but not accessible. - -2. Local snapshot - - This step takes a snapshot of the table created in the previous step. This is actually an empty snapshot (because the table just created has no data), and its purpose is to generate the corresponding snapshot directory on the Backend for later receiving the snapshot file downloaded from the remote warehouse. - -3. Download snapshot - - The snapshot files in the remote warehouse will be downloaded to the corresponding snapshot directory generated in the previous step. This step is done concurrently by each Backend. - -4. Make the snapshot take effect - - After the snapshot download is complete, each snapshot is mapped to the metadata of the current local table. These snapshots are then reloaded to take effect, completing the final recovery job. - -## Start Restore - -1. Restore the table backup_tbl in backup snapshot_1 from example_repo to database example_db1, using time version "2022-04-08-15-52-29". Restore with 1 replica: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_1` - FROM `example_repo` - ON ( `backup_tbl` ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-52-29", - "replication_num" = "1" - ); - ``` - -2. Restore partitions p1 and p2 of table backup_tbl, and table backup_tbl2, in backup snapshot_2 from example_repo to database example_db1, and rename backup_tbl2 to new_tbl, using time version "2022-04-08-15-55-43".
The default reverts to 3 replicas: - - ```sql - RESTORE SNAPSHOT example_db1.`snapshot_2` - FROM `example_repo` - ON - ( - `backup_tbl` PARTITION (`p1`, `p2`), - `backup_tbl2` AS `new_tbl` - ) - PROPERTIES - ( - "backup_timestamp"="2022-04-08-15-55-43" - ); - ``` - -3. View the execution of the restore job: - - ```sql - mysql> SHOW RESTORE\G; - *************************** 1. row *************************** - JobId: 17891851 - Label: snapshot_label1 - Timestamp: 2022-04-08-15-52-29 - DbName: default_cluster:example_db1 - State: FINISHED - AllowLoad: false - ReplicationNum: 3 - RestoreObjs: { - "name": "snapshot_label1", - "database": "example_db", - "backup_time": 1649404349050, - "content": "ALL", - "olap_table_list": [ - { - "name": "backup_tbl", - "partition_names": [ - "p1", - "p2" - ] - } - ], - "view_list": [], - "odbc_table_list": [], - "odbc_resource_list": [] - } - CreateTime: 2022-04-08 15:59:01 - MetaPreparedTime: 2022-04-08 15:59:02 - SnapshotFinishedTime: 2022-04-08 15:59:05 - DownloadFinishedTime: 2022-04-08 15:59:12 - FinishedTime: 2022-04-08 15:59:18 - UnfinishedTasks: - Progress: - TaskErrMsg: - Status: [OK] - Timeout: 86400 - 1 row in set (0.01 sec) - ``` - -For detailed usage of RESTORE, please refer to [here](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE.md). - -## Related Commands - -The commands related to the backup and restore function are as follows. For the following commands, you can use `help cmd;` to view detailed help after connecting to Doris through mysql-client. - -1. CREATE REPOSITORY - - Create a remote repository path for backup or restore. This command needs to use the Broker process to access the remote storage. Different brokers need to provide different parameters. 
For details, please refer to [Broker documentation](../../data-operate/import/broker-load-manual). Alternatively, you can back up directly to remote storage that supports the AWS S3 protocol, or directly to HDFS; see [Create Remote Warehouse Documentation](./../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY). - -2. RESTORE - - Perform a restore operation. - -3. SHOW RESTORE - - View the execution of the most recent restore job, including: - - - JobId: The id of the current recovery job. - - Label: The name (Label) of the backup in the warehouse specified by the user. - - Timestamp: The timestamp of the backup in the user-specified repository. - - DbName: Database corresponding to the restore job. - - State: The current stage of the recovery job: - - PENDING: The initial status of the job. - - SNAPSHOTING: The snapshot operation of the newly created table is in progress. - - DOWNLOAD: Sending download snapshot task. - - DOWNLOADING: Snapshot is downloading. - - COMMIT: Prepare the downloaded snapshot to take effect. - - COMMITTING: The downloaded snapshots are taking effect. - - FINISHED: Recovery is complete. - - CANCELLED: Recovery failed or was canceled. - - AllowLoad: Whether to allow import during restore. - - ReplicationNum: The number of replicas specified for the restore. - - RestoreObjs: List of tables and partitions involved in this restore. - - CreateTime: Job creation time. - - MetaPreparedTime: Local metadata generation completion time. - - SnapshotFinishedTime: The local snapshot completion time. - - DownloadFinishedTime: The time when the remote snapshot download is completed. - - FinishedTime: The completion time of this job. - - UnfinishedTasks: During `SNAPSHOTING`, `DOWNLOADING`, `COMMITTING` and other stages, there will be multiple subtasks going on at the same time. The task ids of the unfinished subtasks in the current stage are shown here.
- - TaskErrMsg: If there is an error in the execution of a subtask, the error message of the corresponding subtask will be displayed here. - - Status: Used to record some status information that may appear during the entire job process. - - Timeout: The timeout period of the job, in seconds. - -4. CANCEL RESTORE - - Cancel the currently executing restore job. - -5. DROP REPOSITORY - - Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data. - -## Common mistakes - -1. Restore reports an error: [20181: invalid md5 of downloaded file: /data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected: f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e] - - If the number of replicas of the backed-up table differs from that of the restored table, you need to specify the number of replicas when executing the restore command. For specific commands, please refer to the [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual - -2. Restore reports an error: [COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum required version 100] - - This error occurs when the backup and the restore were not performed with the same version. Use the meta_version property to read the metadata of the earlier backup. Note that this parameter is a temporary solution and is only used to restore data backed up by an old version of Doris. Backup data from the latest version already contains the meta version, so there is no need to specify it. For the specific solution to the above error, specify meta_version = 100.
For specific commands, please refer to the [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual - -## More Help - -For more detailed syntax and best practices used by RESTORE, please refer to the [RESTORE](../../sql-manual/sql-statements/data-modification/backup-and-restore/RESTORE) command manual. You can also type `HELP RESTORE` on the MySQL client command line for more help. diff --git a/versioned_docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md b/versioned_docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md index 86bfe6abc5db8..86f544b2423fc 100644 --- a/versioned_docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md +++ b/versioned_docs/version-3.0/admin-manual/maint-monitor/disk-capacity.md @@ -162,6 +162,6 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o ```rm -rf data/0/12345/``` - * Delete tablet metadata refer to [Tablet metadata management tool](./tablet-meta-tool.md) + * Delete tablet metadata refer to [Tablet metadata management tool](../trouble-shooting/tablet-meta-tool.md) ```./lib/meta_tool --operation=delete_header --root_path=/path/to/root_path --tablet_id=12345 --schema_hash= 352781111``` diff --git a/versioned_docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md b/versioned_docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md index f2b3cb45f56b7..9331506d29c53 100644 --- a/versioned_docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md +++ b/versioned_docs/version-3.0/admin-manual/open-api/be-http/compaction-run.md @@ -46,7 +46,7 @@ Used to manually trigger the comparison and show status. - ID of table. Note that table_id=xxx will take effect only when compact_type=full is specified, and only one tablet_id and table_id can be specified, and cannot be specified at the same time. After specifying table_id, full_compaction will be automatically executed for all tablets under this table.
* `compact_type` - - The value is `base` or `cumulative` or `full`. For usage scenarios of full_compaction, please refer to [Data Recovery](../../data-admin/repairing-data.md). + - The value is `base` or `cumulative` or `full`. For usage scenarios of full_compaction, please refer to [Data Recovery](../../trouble-shooting/repairing-data). ## Request body diff --git a/versioned_docs/version-3.0/faq/install-faq.md b/versioned_docs/version-3.0/faq/install-faq.md index a9e0b69c6e5e9..456135b811fb7 100644 --- a/versioned_docs/version-3.0/faq/install-faq.md +++ b/versioned_docs/version-3.0/faq/install-faq.md @@ -253,7 +253,7 @@ There are usually two reasons for this problem: 1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because `priority_network` is not set correctly, which causes FE to match the wrong IP address when it starts. Restart FE after modifying `priority_network`. 2. Most Follower FE nodes in the cluster are not started. For example, there are 3 Followers, and only one is started. At this time, at least one other FE needs to be started, so that the FE electable group can elect the Master to provide services. -If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/maint-monitor/metadata-operation.md) in the Doris official website document. +If the above situation cannot be solved, you can restore it according to the [metadata operation and maintenance document] (../admin-manual/trouble-shooting/metadata-operation.md) in the Doris official website document. ### Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0 @@ -263,7 +263,7 @@ If the following problems occur when using MySQL client to connect to Doris, thi Sometimes when FE is restarted, the above error will occur (usually only in the case of multiple Followers). And the two values in the error differ by 2. 
Causes FE to fail to start. -This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/maint-monitor/metadata-operation.md). +This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the operation of failure recovery in [Metadata Operation and Maintenance Documentation](../admin-manual/trouble-shooting/metadata-operation.md). ### Q12. Doris compile and install JDK version incompatibility problem