Metrics and Monitoring
Metrics are captured in rolling windows using the HystrixRollingNumber and HystrixRollingPercentile classes. These low-latency moving windows of recent metrics are what circuit breaker health checks and operational monitoring rely on.
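To illustrate the idea of a bucketed rolling window, here is a minimal sketch. It is not HystrixRollingNumber's actual implementation. The window is split into equal buckets, and each event increments the bucket for the current time. A bucket slot is reset when it is reused for a later period, and only buckets that started within the last window are summed. The class name `RollingCounterSketch` is hypothetical. The explicit `nowMs` parameter, used instead of reading a clock, is an assumption made so the behavior is deterministic.

```java
import java.util.Arrays;

/**
 * Illustrative bucketed rolling counter (not Hystrix's implementation).
 * The window is split into numBuckets equal buckets; only buckets that
 * started within the last window contribute to sum().
 */
class RollingCounterSketch {
    private final int numBuckets;
    private final long bucketSizeMs;
    private final long[] counts;
    private final long[] bucketStarts; // start time of the period each slot currently holds

    RollingCounterSketch(long windowMs, int numBuckets) {
        if (numBuckets <= 0 || windowMs % numBuckets != 0) {
            throw new IllegalArgumentException("windowMs must be divisible by numBuckets");
        }
        this.numBuckets = numBuckets;
        this.bucketSizeMs = windowMs / numBuckets;
        this.counts = new long[numBuckets];
        this.bucketStarts = new long[numBuckets];
        Arrays.fill(bucketStarts, Long.MIN_VALUE); // no slot holds any period yet
    }

    private long bucketStart(long nowMs) {
        return nowMs - Math.floorMod(nowMs, bucketSizeMs);
    }

    /** Records one event at time nowMs, recycling the slot if it still holds an older period. */
    synchronized void increment(long nowMs) {
        long start = bucketStart(nowMs);
        int slot = (int) Math.floorMod(start / bucketSizeMs, (long) numBuckets);
        if (bucketStarts[slot] != start) {
            bucketStarts[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Sums the events in the current bucket and the numBuckets - 1 buckets before it. */
    synchronized long sum(long nowMs) {
        long windowStart = bucketStart(nowMs) - (numBuckets - 1) * bucketSizeMs;
        long total = 0;
        for (int i = 0; i < numBuckets; i++) {
            if (bucketStarts[i] >= windowStart) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

Take a 10-second window of 10 buckets, with events at 0 ms, 500 ms and 9,500 ms. All three are counted at 9,500 ms. At 10,500 ms the first bucket has rolled out, and only the 9,500 ms event remains.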
You can get direct programmatic access to metrics like this:
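```java
Collection<HystrixCommandMetrics> commandMetrics = HystrixCommandMetrics.getInstances();          // one per HystrixCommandKey
Collection<HystrixThreadPoolMetrics> threadPoolMetrics = HystrixThreadPoolMetrics.getInstances(); // one per HystrixThreadPoolKey
```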
The hystrix-metrics-event-stream module can be used to power the dashboard, real-time alerting, and similar use cases.
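As far as we know, the stream is served as Server-Sent Events (content type `text/event-stream`). Each metrics snapshot is a JSON object on a `data:` line, and periodic `ping:` keep-alive lines are interleaved. A consumer other than the dashboard, such as an alerting job, therefore has to pick out the `data:` payloads. The sketch below does only that, using the JDK. It assumes one JSON object per `data:` line and leaves JSON parsing out. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Minimal Server-Sent Events reader that keeps only "data:" payloads (illustrative). */
class EventStreamSketch {
    static List<String> dataPayloads(Reader stream) {
        List<String> payloads = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(stream)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("data:")) {
                    continue; // blank event separators, keep-alive "ping:" lines, comments
                }
                String value = line.substring("data:".length());
                if (value.startsWith(" ")) {
                    value = value.substring(1); // SSE drops a single leading space after the colon
                }
                payloads.add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payloads;
    }
}
```

This version collects payloads until the stream ends. A live consumer would handle each payload as it arrives instead, because the metrics stream stays open.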
Metrics can be published by an implementation of HystrixMetricsPublisher.
Implementations can be registered using [HystrixPlugins.registerMetricsPublisher(HystrixMetricsPublisher impl)](http://netflix.github.com/Hystrix/javadoc/index.html?com/netflix/hystrix/strategy/HystrixPlugins.html#registerMetricsPublisher(com.netflix.hystrix.strategy.metrics.HystrixMetricsPublisher\)). This is an instance method, called on the singleton returned by HystrixPlugins.getInstance().
Implementations included with the project are:
- Netflix Servo: hystrix-servo-metrics-publisher
- Yammer Metrics: hystrix-yammer-metrics-publisher
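Our understanding is that HystrixPlugins accepts only one metrics publisher. Registration is a compare-and-set, and a second registration throws IllegalStateException. The first time Hystrix asks for a publisher, it pins the default if none was registered. In practice that means registering at application startup, before any HystrixCommand runs. The sketch below models only that register-once behavior, using hypothetical types (`MetricsPublisherSketch`, `PluginRegistrySketch`). It is not the HystrixPlugins source.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a metrics publisher plugin. */
interface MetricsPublisherSketch {
    String name();
}

/** Illustrative register-once plugin holder (assumed semantics, see lead-in). */
class PluginRegistrySketch {
    private final AtomicReference<MetricsPublisherSketch> publisher = new AtomicReference<>();

    void registerMetricsPublisher(MetricsPublisherSketch impl) {
        // Only the first registration wins; later attempts fail loudly instead of silently swapping.
        if (!publisher.compareAndSet(null, impl)) {
            throw new IllegalStateException("Another metrics publisher was already registered.");
        }
    }

    MetricsPublisherSketch getMetricsPublisher(MetricsPublisherSketch defaultImpl) {
        // On first use with nothing registered, pin the default so later registrations fail.
        publisher.compareAndSet(null, defaultImpl);
        return publisher.get();
    }
}
```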
The following are details of the metrics published by these implementations:
Each HystrixCommand publishes metrics with the following tags:
- Servo Tag: "instance" Value: HystrixCommandKey.name()
- Servo Tag: "type" Value: "HystrixCommand"
Informational and Status
- Boolean isCircuitBreakerOpen
- Number errorPercentage
- Number executionSemaphorePermitsInUse
- String commandGroup
- Number currentTime
Cumulative Counts (Counter)
The following are cumulative counts since the start of the application.
- Long countCollapsedRequests
- Long countExceptionsThrown
- Long countFailure
- Long countFallbackFailure
- Long countFallbackRejection
- Long countFallbackSuccess
- Long countResponsesFromCache
- Long countSemaphoreRejected
- Long countShortCircuited
- Long countSuccess
- Long countThreadPoolRejected
- Long countTimeout
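These counts only grow while the application runs, so they are most useful as rates. A monitoring system samples a counter such as countSuccess periodically and divides the increase by the elapsed time. The sketch below shows that arithmetic. The class and method names are hypothetical, and the restart handling is one simple choice among several.

```java
/** Turns two samples of a cumulative counter into an average rate (illustrative). */
class CounterRateSketch {
    /** @return average events per second between the two samples */
    static double perSecond(long earlierCount, long earlierMillis, long laterCount, long laterMillis) {
        if (laterMillis <= earlierMillis) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        long delta = laterCount - earlierCount;
        if (delta < 0) {
            // The counter went down, most likely because the application restarted and counting
            // began again from zero; approximate the increase by the later sample alone.
            delta = laterCount;
        }
        return delta * 1000.0 / (laterMillis - earlierMillis);
    }
}
```

For example, countSuccess rising from 100 to 400 over one minute is 5 requests per second.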
Rolling Counts (Gauge)
The following are rolling counts as configured by [[metrics.rollingStats.* properties|Configuration]].
These are "point in time" counts representing the last X seconds (for example 10 seconds).
- Number rollingCountCollapsedRequests
- Number rollingCountExceptionsThrown
- Number rollingCountFailure
- Number rollingCountFallbackFailure
- Number rollingCountFallbackRejection
- Number rollingCountFallbackSuccess
- Number rollingCountResponsesFromCache
- Number rollingCountSemaphoreRejected
- Number rollingCountShortCircuited
- Number rollingCountSuccess
- Number rollingCountThreadPoolRejected
- Number rollingCountTimeout
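The errorPercentage gauge and the isCircuitBreakerOpen decision are both derived from rolling counts like these. The sketch below assumes the formula used by older Hystrix releases, so check HystrixCommandMetrics.getHealthCounts() in the version you run. Under that assumption:

- Failures, timeouts, thread-pool and semaphore rejections, and short-circuits count as errors.
- Successes plus errors form the total.
- The circuit trips only when the total reaches circuitBreakerRequestVolumeThreshold and the error percentage reaches circuitBreakerErrorThresholdPercentage.

Class and method names are hypothetical.

```java
/** Illustrative health check built from rolling counts (assumed formula, see lead-in). */
class HealthCheckSketch {
    /** Percentage of requests in the window that ended in an error, truncated to an int. */
    static int errorPercentage(long success, long failure, long timeout,
                               long threadPoolRejected, long semaphoreRejected, long shortCircuited) {
        long errors = failure + timeout + threadPoolRejected + semaphoreRejected + shortCircuited;
        long total = success + errors;
        if (total == 0) {
            return 0; // no traffic in the window: report 0% instead of dividing by zero
        }
        return (int) (errors * 100 / total);
    }

    /** Trip only when there is enough traffic AND the error percentage reaches the threshold. */
    static boolean shouldOpenCircuit(long totalRequests, int errorPercentage,
                                     int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests < requestVolumeThreshold) {
            return false; // too few requests in the window to judge health
        }
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

Suppose the window holds 90 successes, 5 failures, 3 timeouts, 1 thread-pool rejection and 1 short-circuit. errorPercentage then returns 10.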
Latency Percentiles: HystrixCommand.run() Execution (Gauge)
Percentiles of execution times for the [HystrixCommand.run()](http://netflix.github.com/Hystrix/javadoc/index.html?com/netflix/hystrix/HystrixCommand.html#run(\)) method (on the child thread if using thread isolation).
These are rolling percentiles as configured by [[metrics.rollingPercentile.* properties|Configuration]].
- Number latencyExecute_mean
- Number latencyExecute_percentile_5
- Number latencyExecute_percentile_25
- Number latencyExecute_percentile_50
- Number latencyExecute_percentile_75
- Number latencyExecute_percentile_90
- Number latencyExecute_percentile_99
- Number latencyExecute_percentile_995
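The gauge suffixes name the percentile: _5 is the 5th, _50 the median, and _995 the 99.5th percentile of the samples in the rolling window, reported alongside their mean. The sketch computes the same kind of statistics from raw samples using the nearest-rank method. HystrixRollingPercentile works over bucketed rolling data and may interpolate between samples, so its numbers can differ slightly. The names here are hypothetical.

```java
import java.util.Arrays;

/** Nearest-rank percentile and mean over latency samples (illustrative stand-in). */
class PercentileSketch {
    /**
     * @param samplesMs latency samples in milliseconds
     * @param p         percentile in (0, 100], e.g. 99.5 for a *_percentile_995 gauge
     */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest-rank: smallest sample with at least p% of samples at or below it (1-based rank).
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0.0);
    }
}
```

With the ten samples 1 to 10 ms, this gives 5 ms at the 50th percentile, 9 ms at the 90th and 10 ms at the 99.5th. With few samples, the high percentiles simply collapse to the maximum.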
Latency Percentiles: End-to-End Execution (Gauge)
Percentiles of execution times for the end-to-end execution of [HystrixCommand.execute()](http://netflix.github.com/Hystrix/javadoc/index.html?com/netflix/hystrix/HystrixCommand.html#execute(\)) or [HystrixCommand.queue()](http://netflix.github.com/Hystrix/javadoc/index.html?com/netflix/hystrix/HystrixCommand.html#queue(\)). They are measured until a response is returned or, in the case of queue(), is ready to be returned.
Compared with the latencyExecute* percentiles, these measure the cost of thread queuing/scheduling/execution, semaphores, circuit breaker logic, and other overhead (including metrics capture itself).
These are rolling percentiles as configured by [[metrics.rollingPercentile.* properties|Configuration]].
- Number latencyTotal_mean
- Number latencyTotal_percentile_5
- Number latencyTotal_percentile_25
- Number latencyTotal_percentile_50
- Number latencyTotal_percentile_75
- Number latencyTotal_percentile_90
- Number latencyTotal_percentile_99
- Number latencyTotal_percentile_995
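The published gauges are aggregates, so only the means can be subtracted directly. If both means cover the same requests, latencyTotal_mean minus latencyExecute_mean equals the mean per-request overhead. That may not hold exactly, for example if requests that never reach run() (short-circuited or rejected) are included in the end-to-end figures. Percentiles do not subtract this way at all. The sketch below uses hypothetical names. It computes the overhead per request and relies on the identity that the mean of the differences equals the difference of the means.

```java
/** Relates per-request overhead to the latencyTotal_* and latencyExecute_* aggregates (illustrative). */
class OverheadSketch {
    /** Time each request spent outside run(): end-to-end minus run() time for the same request. */
    static long[] perRequestOverhead(long[] totalMs, long[] executeMs) {
        if (totalMs.length != executeMs.length) {
            throw new IllegalArgumentException("need one total and one execute time per request");
        }
        long[] overhead = new long[totalMs.length];
        for (int i = 0; i < totalMs.length; i++) {
            overhead[i] = totalMs[i] - executeMs[i];
        }
        return overhead;
    }

    static double mean(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : (double) sum / values.length;
    }
}
```

Consider two requests with (total, execute) times of (10 ms, 9 ms) and (30 ms, 5 ms). Both mean-based routes give 13 ms of overhead. However, the maximum total minus the maximum execute time is 30 - 9 = 21 ms, while the largest per-request overhead is 25 ms.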
Property Values (Informational)
These informational metrics report the property values actually in use by the HystrixCommand. They are useful for seeing when a dynamic property takes effect and for confirming that a property is set as expected.
- Number propertyValue_rollingStatisticalWindowInMilliseconds
- Number propertyValue_circuitBreakerRequestVolumeThreshold
- Number propertyValue_circuitBreakerSleepWindowInMilliseconds
- Number propertyValue_circuitBreakerErrorThresholdPercentage
- Boolean propertyValue_circuitBreakerForceOpen
- Boolean propertyValue_circuitBreakerForceClosed
- Number propertyValue_executionIsolationThreadTimeoutInMilliseconds
- String propertyValue_executionIsolationStrategy
- Boolean propertyValue_metricsRollingPercentileEnabled
- Boolean propertyValue_requestCacheEnabled
- Boolean propertyValue_requestLogEnabled
- Number propertyValue_executionIsolationSemaphoreMaxConcurrentRequests
- Number propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests
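One way to see when a dynamic property takes effect is to compare two snapshots of a command's published values and report the propertyValue_* entries that changed. The sketch assumes each snapshot is available as a Map from metric name to value. How you collect those snapshots (for example, from Servo or the event stream) is outside the sketch, and its names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Reports which propertyValue_* metrics changed between two snapshots (illustrative). */
class PropertyChangeSketch {
    static final String PREFIX = "propertyValue_";

    /** @return changed property metric name mapped to "old -> new", sorted by name */
    static Map<String, String> changedProperties(Map<String, Object> before, Map<String, Object> after) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, Object> entry : after.entrySet()) {
            String name = entry.getKey();
            if (!name.startsWith(PREFIX)) {
                continue; // counts, latencies, and status values change constantly; skip them
            }
            Object old = before.get(name);
            if (!Objects.equals(old, entry.getValue())) {
                changes.put(name, old + " -> " + entry.getValue());
            }
        }
        return changes;
    }
}
```

A property missing from the earlier snapshot is reported as `null -> value`.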
Each HystrixThreadPool publishes metrics with the following tags:
- Servo Tag: "instance" Value: HystrixThreadPoolKey.name()
- Servo Tag: "type" Value: "HystrixThreadPool"
Informational and Status
- String name
- Number currentTime
Rolling Counts (Gauge)
- Number rollingMaxActiveThreads
- Number rollingCountThreadsExecuted
Cumulative Counts (Counter)
- Long countThreadsExecuted
ThreadPool State (Gauge)
- Number threadActiveCount
- Number completedTaskCount
- Number largestPoolSize
- Number totalTaskCount
- Number queueSize
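These gauges appear to mirror methods of the JDK's java.util.concurrent.ThreadPoolExecutor, which Hystrix thread pools are built on: getActiveCount(), getCompletedTaskCount(), getLargestPoolSize(), getTaskCount() and getQueue().size(). That mapping is our reading of the names rather than something this page states. The sketch reads the same values from a plain executor so their meaning can be checked locally. The helper names are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Reads the JDK equivalents of the ThreadPool State gauges (illustrative). */
class ThreadPoolStateSketch {
    static ThreadPoolExecutor newPool(int coreSize, int maxQueueSize) {
        return new ThreadPoolExecutor(coreSize, coreSize, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(maxQueueSize));
    }

    /** Runs n no-op tasks, then shuts the pool down and waits so the counts are final. */
    static void runAndDrain(ThreadPoolExecutor pool, int n) {
        for (int i = 0; i < n; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static String describe(ThreadPoolExecutor pool) {
        return "threadActiveCount=" + pool.getActiveCount()
                + " completedTaskCount=" + pool.getCompletedTaskCount()
                + " largestPoolSize=" + pool.getLargestPoolSize()
                + " totalTaskCount=" + pool.getTaskCount()
                + " queueSize=" + pool.getQueue().size();
    }
}
```

Run five no-op tasks on a two-thread pool and wait for it to terminate. completedTaskCount and totalTaskCount are then both 5, threadActiveCount and queueSize are 0, and largestPoolSize is 2.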
Property Values (Informational)
- Number propertyValue_corePoolSize
- Number propertyValue_keepAliveTimeInMinutes
- Number propertyValue_queueSizeRejectionThreshold
- Number propertyValue_maxQueueSize
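This sketch assumes the following semantics for Hystrix's thread pool properties:

- maxQueueSize fixes the capacity of the pool's queue when the pool is created. A non-positive value means no queue, so work is handed directly to a thread.
- queueSizeRejectionThreshold is an additional cap that can be changed at runtime.
- A task is rejected once the queue reaches whichever of the two limits is lower.

The sketch models that check with plain ints and a hypothetical method name. It is not Hystrix's code.

```java
/** Illustrative queue admission check (assumed semantics, see lead-in). */
class QueueRejectionSketch {
    /**
     * @param currentQueueSize            tasks waiting right now (the queueSize gauge)
     * @param maxQueueSize                queue capacity fixed at pool creation; <= 0 means no queue
     * @param queueSizeRejectionThreshold dynamic cap on the queue length
     * @return false if the queue limits would reject one more task
     */
    static boolean withinQueueLimits(int currentQueueSize, int maxQueueSize, int queueSizeRejectionThreshold) {
        if (maxQueueSize <= 0) {
            return true; // hand-off pool: no queue limit applies; the executor decides if a thread is free
        }
        // The bounded queue itself refuses work at maxQueueSize, so the effective limit is the lower one.
        return currentQueueSize < Math.min(maxQueueSize, queueSizeRejectionThreshold);
    }
}
```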