Is your feature request related to a problem? Please describe.
Being able to define multiple Prometheus consumers would be very useful. It would allow statistical collection on any set of filtered criteria. Right now it is impossible to get statistics on specific criteria: all Prometheus data is lumped together. Being able to separate out different subsets of data might even eliminate the need to centralize DNSTAP data and log it into a database for later queries.
Describe the solution you'd like
It would be ideal if many Prometheus logging consumers could be specified at the end of a chain of pipelines or filters.
Here's a fully-formed potential example file. It's essentially the same syntax as today, with the only addition being a new way of parsing the "prometheus-labels:" primitive to include arbitrary labels in the stored data for that particular instantiation of the Prometheus consumer.
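As a rough sketch of the idea (hypothetical only — the keys, names, and layout below are illustrative and not actual go-dnscollector configuration syntax), two Prometheus consumers fed by different filter stages might look something like:

```yaml
# Hypothetical sketch - keys and names are illustrative,
# not actual go-dnscollector configuration syntax.
pipelines:
  - name: dnstap-in            # ingest all DNSTAP traffic
    dnstap:
      listen-ip: 0.0.0.0
      listen-port: 6000
    routing-policy:
      forward: [ prom-all, filter-apple ]

  - name: filter-apple         # pass only apple.com qnames downstream
    dnsmessage:
      matching:
        include:
          dns.qname: "^.*\\.apple\\.com$"
    routing-policy:
      forward: [ prom-apple ]

  # Two independent Prometheus consumers, distinguished by labels.
  - name: prom-all
    prometheus:
      listen-port: 8081
      prometheus-labels: [ "stream_id", "job=all-queries" ]

  - name: prom-apple
    prometheus:
      listen-port: 8082
      prometheus-labels: [ "stream_id", "job=apple.com" ]
```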
Describe alternatives you've considered
DNSTAP data is collected for insights. If those insights can be produced by a TSDB (Prometheus) scrape of go-dnscollector with specific, pre-understood filters applied, that may minimize the need to forward much of the DNSTAP data to a central collector, or to run database queries against the historical data set. This may not cover the majority of cases, but there are certainly subsets of data of high interest to DNS system administrators which go-dnscollector could aggregate very efficiently if they could be segmented out into Prometheus metrics at the edge of the collection pipeline instead of at the very end. The alternative is the traditional database query run against the final set of DNSTAP data residing in a data pool somewhere — slow and batch-based, versus real-time ingestion into a TSDB.
Additional context
Each Prometheus instance would need to somehow distinguish itself from the others. The very good news is that Prometheus already does this today with the concept of labels. Alternatively, a separate port (or a separate URL endpoint) could be used, but that may add configuration complexity.
I see two possible methods to segment the multiple Prometheus data sets from each other, which do not necessarily conflict:
1. Enforce configuration parsing rules requiring that each Prometheus instance has its own port number or endpoint (/metrics, /metrics/apple, /metrics-apple, or similar configurable names). Each instance would build a set of Prometheus metrics based on the data it receives, plus the "go_" and "process_" outputs from the main code routines.
2. Enforce configuration parsing rules demanding that each Prometheus declaration stanza has at least one label that differs from every other instantiation ("prometheus-labels: ["stream_id", "job=all-queries"]" and "prometheus-labels: ["stream_id", "job=apple.com"]" as shown above would be sufficient even if port numbers were omitted, as an example), but allow multiple instances to be interleaved in the same query to "/metrics" — they would remain distinct due to their different labels. If this method is used, the "go_" and "process_" results need only be shown once.
I have no particular bias towards either of these methods, and can see uses for both. Perhaps both could be supported.
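To illustrate the second method, a single /metrics scrape could interleave series from both instances, distinguished only by their labels. The metric and label names below are purely hypothetical, not actual go-dnscollector output:

```text
# Illustrative /metrics output under the shared-endpoint method.
# Metric names here are hypothetical.
# HELP dnscollector_queries_total Number of DNS queries received
# TYPE dnscollector_queries_total counter
dnscollector_queries_total{stream_id="dnstap0",job="all-queries"} 184230
dnscollector_queries_total{stream_id="dnstap0",job="apple.com"} 912

# The go_* and process_* series would appear only once,
# since all instances share one process:
go_goroutines 42
```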
This is even more useful with the "TopN" model, just by itself. Given the flexibility of the pipeline and filter concepts, having separate Prometheus instances would solve a significant number of issues. With the "TopN" concept, it would be possible to divide interesting traffic into fairly large buckets ("Top 100 AAAA records that are SERVFAIL", "Top 3000 NOERROR responses with the DNSSEC AD bit set", etc.) without everything being combined into a single job, or metrics stepping on each other, if multiple Prometheus pipelines were possible. (Sorry for the enthusiasm here, but the more I think about this, the more useful it seems for snapshot understanding of system behaviors via Prometheus, in conjunction with the DNSTAP reporting back to a core analysis platform.)
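Buckets like those could then be queried directly in PromQL. As a sketch (the metric name and labels are hypothetical, assuming a per-qname counter exposed by a pipeline pre-filtered to AAAA/SERVFAIL traffic):

```promql
# Hypothetical metric and label names - illustrative only.
# Top 100 AAAA qnames answered SERVFAIL over the last 5 minutes:
topk(100,
  sum by (qname) (
    rate(dnscollector_responses_total{job="aaaa-servfail"}[5m])
  )
)
```

The heavy lifting (filtering to the interesting subset) happens once at the collection edge; the TSDB query only ranks the pre-aggregated series.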