Dealing with common alert names from multiple Prometheus instances in "alerts" #7680
Replies: 1 comment
-
Hey @anuragbhatia! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together! I found two similar open issues that might be relevant to your problem.
To deal with duplication of common local rules in a distributed Prometheus setup with Thanos, where multiple alerts with the same name appear in the Thanos frontend "alerts" section, you can rely on the deduplication built into the Thanos Querier's rules/alerts API. The deduplication logic is roughly as follows: the configured replica labels are stripped from each rule or alert, and entries that then have the same name and label set are collapsed into a single one.
Here is an example of how this deduplication logic is exercised in the Thanos test suite. The package name and import paths below are sketched for readability and may not match the exact layout of your Thanos version:

```go
package rules

import (
	"testing"

	"github.com/thanos-io/thanos/pkg/rules/rulespb"
	"github.com/thanos-io/thanos/pkg/testutil"
)

func TestDedupRules(t *testing.T) {
	for _, tc := range []struct {
		name          string
		rules, want   []*rulespb.Rule
		replicaLabels []string
	}{
		// Test cases here...
	} {
		t.Run(tc.name, func(t *testing.T) {
			// DedupRules strips the given replica labels and collapses rules
			// that then compare equal; `want` holds the deduplicated result.
			got := DedupRules(tc.rules, tc.replicaLabels)
			testutil.Equals(t, tc.want, got)
		})
	}
}
```

Additionally, the Thanos Querier can be configured with replica labels (the repeatable `--query.replica-label` flag) so that labels which only distinguish Prometheus replicas are ignored when deduplicating rules and alerts.
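For context, here is a minimal sketch of how this is typically wired up. The label name `replica`, the instance names, and the endpoints are illustrative assumptions, not values from this discussion: each Prometheus in a failure domain evaluates the same local rule files but carries a unique external label, and the Thanos Querier is told to treat that label as a replica label.

```yaml
# prometheus.yml (sketch) — identical rule files on every replica,
# but a unique "replica" external label per instance.
global:
  external_labels:
    cluster: eu-west        # shared across the failure domain
    replica: prometheus-01  # unique per Prometheus instance
```

```bash
# Thanos Querier (sketch): ignore the "replica" label when deduplicating,
# so rules/alerts that differ only in that label show up once in the UI.
thanos query \
  --query.replica-label=replica \
  --endpoint=sidecar-01:10901 \
  --endpoint=sidecar-02:10901
```

With this setup each Prometheus still fires its own copy of the alert, so alerting has no single point of failure, while the Querier's "alerts" view collapses the copies into one entry.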
-
I have a distributed setup with a bunch of Prometheus instances running in each failure domain (as often suggested), with Thanos sidecar + object storage offload. In this setup, alerting via the Thanos Ruler seems simple and quick, but it makes the Ruler a single point of failure. If I instead put local rules in each Prometheus (say, "alert if storage is at 80%"), alerting works without a single point of failure, but I see multiple alerts with the same name in the Thanos frontend "alerts" section.
Wondering how you all deal with this duplication of common local rules?