Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ffresty] [metric] HTTP Response Time and Complete Gauge Support #160

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

onelapahead
Copy link
Contributor

@onelapahead onelapahead commented Dec 12, 2024

For Resty, we enhance the existing before/after hooks so that we record the elapsed time on responses. And use more labels and smarter context's for consistently deriving any request metadata like the host.

Then, within the metric manager itself, we didn't offer the ability to use gauge inc/dec which can be very useful rather than using set.

@onelapahead onelapahead changed the title [ffresty] [metric] Richer HTTP Client Metrics and Complete Gauge Support [ffresty] [metric] HTTP Response Time and Complete Gauge Support Dec 12, 2024
pkg/ffresty/ffresty.go Outdated Show resolved Hide resolved
firefly-common.iml Outdated Show resolved Hide resolved
@onelapahead onelapahead marked this pull request as ready for review December 16, 2024 12:53
metricsManager.NewCounterMetricWithLabels(ctx, "network_error", "Network error", []string{"host", "method"}, false)
metricsManager.NewCounterMetricWithLabels(ctx, metricsHTTPResponsesTotal, "HTTP response", []string{"status", "error", "host", "method"}, false)
metricsManager.NewCounterMetricWithLabels(ctx, metricsNetworkErrorsTotal, "Network error", []string{"host", "method"}, false)
metricsManager.NewSummaryMetricWithLabels(ctx, metricsHTTPResponseTime, "HTTP response time", []string{"status", "host", "method"}, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onelapahead what's your consideration between Histogram and Summary for this timing metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histogram is more costly in terms of cardinality bc of all the buckets you make. So this felt light enough to only calculate average response time across a few basic dimensions.

Would love to toggle btwn histogram and summaries based on certain settings eventually, but that was a bigger change than I had an appetite for.

const (
metricsHTTPResponsesTotal = "http_responses_total"
metricsNetworkErrorsTotal = "network_errors_total"
metricsHTTPResponseTime = "http_response_time_seconds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some size metrics for data transmitted would be useful to add as well.

Copy link
Contributor

@Chengxuan Chengxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great step forward, thanks @onelapahead .

I asked a question about the metrics type for the response time. Switching to histogram will have the following benefits, which could be compelling:

  1. reduce observation costs on the client
  2. enable aggregation across instances

^^ from https://prometheus.io/docs/practices/histograms/

@onelapahead
Copy link
Contributor Author

enable aggregation across instances

To clarify - this is still entirely possible with summaries. They are similar to a histogram but less counters:

# summary example
http_response_time_sum{}
http_response_time_count{}

# histogram example
http_response_time_buckets{le="1.0"}
http_response_time_buckets{le="0.5"}
http_response_time_buckets{le="0.1"}
http_response_time_sum{}
http_response_time_count{}

Histograms are great/preferred for aggregate percentiles and therefore performance engineering. Summaries are great for aggregated averages and therefore reliability engineering.

Quantiles are a special version of summary, which to your point are basically worthless at any real scale bc they cannot be aggregated. However, they are less expensive cardinality than histograms. So if you care about individual "thing's" performance they are useful.

Do hope ff-common can offer histograms as an option, but not by default since they are quite expensive. But will save that for a future contributor.

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Copy link
Contributor

@EnriqueL8 EnriqueL8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @onelapahead

Getting Error: pkg/ffresty/ffresty.go:201:1: cyclomatic complexity 37 of func NewWithConfig is high (> 30) (gocyclo) from build, given this function is doing a lot of just configuration we can probably ignore this complexity warning by adding the ignore comment. Or if you are up for it we can break it down.

We could split adding the onBeforeRequest to another function but it might be messy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants