alert: remove keep_alive_total metric (#18227) #18234

Merged
15 changes: 0 additions & 15 deletions alert-rules.md
@@ -63,21 +63,6 @@ summary: Detailed explanation of the alert rules for each component in a TiDB cluster.

See the solution for [`TiDB_schema_error`](#tidb_schema_error).

#### `TiDB_monitor_keep_alive`

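* Alert rule:

`increase(tidb_monitor_keep_alive_total[10m]) < 100`

* Description:

Indicates whether the TiDB process still exists. If `tidb_monitor_keep_alive_total` increases by less than 100 within 10 minutes, the TiDB process might have already exited, and an alert is triggered.

* Solution:

* Check whether the TiDB process is out of memory (OOM).
* Check whether the machine has restarted.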
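
For readers unfamiliar with the removed rule, the following is a minimal sketch of the check it expressed, assuming the counter is sampled at regular intervals within the 10-minute window. The function names and sample values are hypothetical, and the arithmetic is simplified: Prometheus's `increase()` also extrapolates to the window boundaries, which this sketch omits.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the increase across consecutive samples of a monotonic counter.

    A drop between two samples is treated as a counter reset (for example,
    a process restart), so the new value is counted as growth from zero.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def keep_alive_alert(samples: Sequence[float], threshold: float = 100) -> bool:
    """Return True when the increase over the window is below the threshold."""
    return counter_increase(samples) < threshold


# Hypothetical samples of tidb_monitor_keep_alive_total over one window.
healthy = [0, 30, 60, 90, 120, 150]  # counter keeps growing
stalled = [0, 30, 60, 60, 60, 60]    # counter stops growing mid-window

print(keep_alive_alert(healthy))  # False: increase is 150
print(keep_alive_alert(stalled))  # True: increase is 60
```

A stalled counter is only a hint that the process exited, which is why the solution steps point to OOM and machine restarts as the first things to check.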

### Critical-level alerts

#### `TiDB_server_panic_total`