Replies: 5 comments 1 reply
-
Thank you for this. To me it's clear now. I only wonder whether we have to wait for v2 or whether there is a way to add the required interface like |
-
The first idea I am trying is to add:
And make |
-
That ended up working, but the caller implementation is kind of confusing, so I am trying an iterator now:
|
-
Attempt #927 |
-
Relevant discussion: #927 (comment) BTW, I feel I am not using discussions the way they should be used XD I keep starting new threads... |
-
Mirroring existing non-Prometheus metrics into a metrics exposition in a Prometheus-compatible format ought to happen with `NewConstMetric` and friends. While syntactically a bit clunky (for historical reasons), it technically works just fine. The whole processing chain, from creating the `Metric` instance, collecting it through a `Collector`, gathering, validating, and sorting it upon a `Gather` call, to serving it via HTTP, involves quite a bit of heavy lifting. That's rarely a concern, though, because the resource usage is usually small (either in absolute terms, for an exporter that just exports a small number of metrics, or in relative terms, for a binary that might expose many metrics but also has an actual production workload that is much more demanding).

There are exceptions, though, the most (in)famous of which is KSM (kube-state-metrics). It's a binary with the sole task of exporting a gigantic amount of K8s metrics to Prometheus. In larger K8s clusters, KSM consumed multiple cores and a significant amount of RAM to serve those metrics in the way described above. By shedding all the layers of abstraction and creating the Prometheus text format directly, KSM was optimized to use only fractions of a core and much less RAM while serving the metrics much faster. What made this approach even more attractive is that the KSM metrics change relatively rarely compared to a typical scrape interval. With the `NewConstMetric` approach above, everything was instantiated, gathered, validated, sorted, encoded, … for each scrape from scratch. The current KSM, however, essentially holds a pre-rendered representation of all the metrics in memory and only differentially changes it when metric values change or a metric appears or disappears.

The problem with the KSM approach is that it sheds almost all layers of abstraction. For example, if a scraper would prefer OpenMetrics, there is no easy way of supporting that. The Prometheus text format is essentially hardcoded in the KSM code, and the changes required to support a different format like OpenMetrics would be sprinkled over many code locations. Perhaps that's still the best trade-off for an extreme case like KSM. However, the use case of a rarely changing set of metrics that are already validated and sorted and "only" need to be exposed also occurs in less extreme cases. My idea here is to offer support for this case at a higher level of abstraction so that concerns about the metric encodings are still abstracted away.
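
To make the "pre-rendered representation" idea concrete, here is a minimal sketch using only the Go standard library. It is not KSM's actual code: the `renderedCache` type, its methods, and the metric names are hypothetical, and it only handles label-less gauges. It does show the shape of the trade-off, though: the text format is baked directly into the cache, which is cheap to serve but hard to re-target at another format.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// renderedCache is a hypothetical store of pre-rendered Prometheus text
// format. Each entry is the complete block (TYPE line plus sample) of one
// metric family, keyed by the family name.
type renderedCache struct {
	mu     sync.RWMutex
	blocks map[string]string
}

func newRenderedCache() *renderedCache {
	return &renderedCache{blocks: map[string]string{}}
}

// SetGauge re-renders only the block of the given family. All other
// blocks stay untouched, which is the "differential" part.
func (c *renderedCache) SetGauge(name string, value float64) {
	block := fmt.Sprintf("# TYPE %s gauge\n%s %g\n", name, name, value)
	c.mu.Lock()
	c.blocks[name] = block
	c.mu.Unlock()
}

// Delete removes a family that has disappeared.
func (c *renderedCache) Delete(name string) {
	c.mu.Lock()
	delete(c.blocks, name)
	c.mu.Unlock()
}

// ServeHTTP writes the pre-rendered blocks in name order. For brevity the
// names are sorted per scrape; a real implementation would keep them sorted.
func (c *renderedCache) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.blocks))
	for name := range c.blocks {
		names = append(names, name)
	}
	sort.Strings(names)
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	for _, name := range names {
		io.WriteString(w, c.blocks[name])
	}
}

// scrape performs one in-process request and returns the response body.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	c := newRenderedCache()
	c.SetGauge("pods_running", 3) // hypothetical metric names
	c.SetGauge("nodes_ready", 5)
	fmt.Print(scrape(c))
}
```

Note that nothing in this sketch could serve OpenMetrics or protobuf without touching the rendering code itself, which is exactly the loss of abstraction described above.
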
The natural low-level format for client_golang is the internal data model, which is the Go protobuf types, usually aliased as `dto` in the code (for "data transfer object"). (This data model was chosen at a time when protobuf was the exposition format of choice. That's not the case right now, so this decision is currently more of a burden than a help, but it could be revised in v2 of client_golang.) Conveniently, the `Gatherer` interface has a `Gather` method returning a `[]*dto.MetricFamily`, which looks like precisely what we want. So I can easily create a static `[]*dto.MetricFamily` and serve it via a trivial `Gatherer` implementation:

If you ignore the clunkiness of the (old-style) Go protobuf code, this is fairly simple. It circumvents all the consistency checks, validations, and sorting, but it takes care of the encoding into the exposition format. The developer has to make sure that their `[]*dto.MetricFamily` slice is valid, but that's it. The program above can serve Prometheus protobuf format, Prometheus text format, and (thanks to the setting in the `HandlerOpts`) even (limited) OpenMetrics. Try it out by running the program and `curl`ing it:

Amazing, isn't it? :o)

(Note for OpenMetrics experts: The encoder decided to change the type from `counter` to `unknown` because the metric name `foo` would need to be rendered as `foo_total` for a counter in OpenMetrics, but it would then be ingested differently by Prometheus.)

Now of course this is a different trade-off from KSM. The encoding has a price, and while protobuf might be cheap (because the internal data model is what the protobuf encoder expects), other formats will have a certain encoding cost. However, this approach will still cut a lot of the cost and might be the sweet spot for other, less extreme use cases of "serving more or less static metrics".
So what's the problem? Why is nobody doing it that way?

Besides the clunkiness of instantiating the DTO `[]*dto.MetricFamily` explicitly (which could be mitigated by nicer generated protobuf code or by a different internal data model), there is a much more serious problem: Occasionally, we do want to change the metrics we serve. So we would add another element to the `[]*dto.MetricFamily` slice, or remove one, or change the value of a metric in place. However, a scrape can hit at any time, so any of those changes would run into a data race. We cannot simply add a mutex to our `Gatherer` because the `Gather` method returns the `[]*dto.MetricFamily` slice, and after that we have no control over what the caller does with it and when. We could create a deep copy of the `[]*dto.MetricFamily` and return it instead, but given that it is full of nested pointers, that would require a lot of allocations, defeating the purpose of the whole idea. The copying would be easier if the internal data model were "flatter", enabling (ideally) copying a contiguous piece of memory, but that would again require a completely new internal data model.

A less invasive approach would be to somehow control when the `[]*dto.MetricFamily` is read to serve metrics so that we can modify it in a concurrency-safe way. The current design of the `Gatherer` interface doesn't lend itself easily to that, which is why I had shelved this whole idea until the mythical v2 of client_golang happens. That version will most likely give us another internal data model and offers the opportunity to rethink the fundamental flow with the `Collector` and the `Gatherer` interfaces etc. (which has grown organically over time and is certainly less elegant and harder to grok than it could be). With the state as-is, we could perhaps do something hacky, like a special `promhttp.Handler` that exposes a mutex you can lock to ensure exclusive access to the `[]*dto.MetricFamily` slice…