-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add histogram metric type #386
feat: Add histogram metric type #386
Conversation
/// \param metric The metric object to update. | ||
/// \param value The amount to observe metric's value to. | ||
/// \return a TRITONSERVER_Error indicating success or failure. | ||
TRITONSERVER_DECLSPEC struct TRITONSERVER_Error* TRITONSERVER_MetricObserve( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reuse TRITONSERVER_MetricSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmccorm4 Any thought? Basically we need to merge two funcions below into one Metric::Set(double value)
. It works but may add confusion.
Lines 338 to 404 in fd5c44b
TRITONSERVER_Error* | |
Metric::Set(double value) | |
{ | |
if (metric_ == nullptr) { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_INTERNAL, | |
"Could not set metric value. Metric has been invalidated."); | |
} | |
switch (kind_) { | |
case TRITONSERVER_METRIC_KIND_COUNTER: { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"TRITONSERVER_METRIC_KIND_COUNTER does not support Set"); | |
} | |
case TRITONSERVER_METRIC_KIND_GAUGE: { | |
auto gauge_ptr = reinterpret_cast<prometheus::Gauge*>(metric_); | |
gauge_ptr->Set(value); | |
break; | |
} | |
case TRITONSERVER_METRIC_KIND_HISTOGRAM: { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"TRITONSERVER_METRIC_KIND_HISTOGRAM does not support Set"); | |
} | |
default: | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"Unsupported TRITONSERVER_MetricKind"); | |
} | |
return nullptr; // Success | |
} | |
TRITONSERVER_Error* | |
Metric::Observe(double value) | |
{ | |
if (metric_ == nullptr) { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_INTERNAL, | |
"Could not set metric value. Metric has been invalidated."); | |
} | |
switch (kind_) { | |
case TRITONSERVER_METRIC_KIND_COUNTER: { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"TRITONSERVER_METRIC_KIND_COUNTER does not support Observe"); | |
} | |
case TRITONSERVER_METRIC_KIND_GAUGE: { | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"TRITONSERVER_METRIC_KIND_GAUGE does not support Observe"); | |
} | |
case TRITONSERVER_METRIC_KIND_HISTOGRAM: { | |
auto histogram_ptr = reinterpret_cast<prometheus::Histogram*>(metric_); | |
histogram_ptr->Observe(value); | |
break; | |
} | |
default: | |
return TRITONSERVER_ErrorNew( | |
TRITONSERVER_ERROR_UNSUPPORTED, | |
"Unsupported TRITONSERVER_MetricKind"); | |
} | |
return nullptr; // Success | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to take a closer look, but my gut reaction is that Guan is probably right and we can probably just reuse MetricValue
and MetricSet
which will call Collect and Observe internally when kind == KIND_HISTOGRAM
if functionally equivalent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetricValue
may not work if Collect
returns multiple values (one per bucket?), but again will need to take a closer look. Let me know if you already know more details on this from your research.
But similar to the new C API for MetricV2, keep in mind how this would work if we added support for Summary metric and wanted to get the values for each quantile, which is basically same as values for each bucket. Ideally the same API would work for both or all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetricValue
cannot be reused for histogram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to revisit this change. The consensus is to keep C API and python_backend API 1:1 matched. I am inclined to add a new C API TRITONSERVER_MetricObserve
for histogram instead of reusing TRITONSERVER_MetricSet
for three reasons.
- Both
Histogram
andSummary
types callObserve
to record new value. We can reuse observe forSummary
type if we add it in the future. - Histogram also has
ObserveMultiple
API which may be added in the future. I don't like the idea thatHistogram.Set
andHistogram.ObserveMultiple
coexist. - Setting histogram to a value aka Histogram.set(val) is semantically wrong. It is confusing to users familiar with Prometheus APIs. The description of
TRITONSERVER_MetricSet
can be verbose as well in order to describe different behaviors for counter/gauge and histogram/summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well.. Triton metrics API is not supposed to be mirroring "Prometheus API", Prometheus is one of the "forms" we can exhibit the metrics as. So we should design the API to be using generic terms for statistics, the meaning of gauge
/counter
/histogram
(and summary
?) is not affected by the fact that Prometheus or other metrics libraries are used.
Thinking from this mindset, my question is if observe
is the generic verb for recording the statistics of histogram
. If that is the case, then I am fine to add XXXObserve
, otherwise, we should use the proper verb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either sample or observe. Voting for Observe for simplicity.
A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
d0fed63
to
fd5c44b
Compare
79566e7
to
edb0533
Compare
include/triton/core/tritonserver.h
Outdated
/// Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE and returns | ||
/// Set the current value of metric to value or observe the value to metric. | ||
/// Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE and | ||
/// TRITONSERVER_METRIC_KIND_HISTOGRAM. Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to explain what it does when TRITONSERVER_METRIC_KIND_HISTOGRAM
is histogram (i.e. increment the counter for the bucket that value matches)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "observe" mean? Can we add more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I still think we need a new C API TRITONSERVER_MetricObserve
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/metric_family.h
Outdated
buckets_.resize(bucket_count); | ||
std::memcpy(buckets_.data(), buckets, sizeof(double) * bucket_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buckets_.resize(bucket_count); | |
std::memcpy(buckets_.data(), buckets, sizeof(double) * bucket_count); | |
buckets_ = std::vector<double>(buckets, buckets + bucket_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/test/metrics_api_test.cc
Outdated
std::vector<std::uint64_t> cumulative_counts = {1, 1, 2, 2, 3, 3}; | ||
ASSERT_EQ(buckets.size() + 1, cumulative_counts.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cumulative_counts
is depending on the buckets
you split for the histogram, you should initialize cumulative_counts
according to the buckets
and data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
What does the PR do?
Support histogram metric type and add tests.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type
box here and add the label to the github PR.
Related PRs:
triton-inference-server/vllm_backend#56
triton-inference-server/python_backend#374
triton-inference-server/server#7525
Where should the reviewer start?
n/a
Test plan:
n/a
17487728
Caveats:
n/a
Background
Customer requested histogram metrics from vLLM.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a