rbirkby edited this page Nov 5, 2014 · 3 revisions

A Histogram measures the distribution of values in a stream of data. From the Java library documentation:

Histogram metrics allow you to measure not just easy things like the min, mean, max, and standard deviation of values, but also quantiles like the median or 95th percentile.

Traditionally, the way the median (or any other quantile) is calculated is to take the entire data set, sort it, and take the value in the middle (or 1% from the end, for the 99th percentile). This works for small data sets, or batch processing systems, but not for high-throughput, low-latency services.
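To make the cost of the traditional approach concrete, here is a minimal sketch of an exact quantile computed by sorting the whole data set. The `ExactQuantile` class and its nearest-rank rule are illustrative assumptions, not part of the Metrics.NET API. Every call sorts a full copy of the data, which is the cost that becomes impractical for high-throughput services.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of Metrics.NET.
public static class ExactQuantile
{
    // Returns the nearest-rank quantile (0 < quantile <= 1) of the full data set.
    public static double Compute(double[] values, double quantile)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("values must not be empty", nameof(values));
        if (quantile <= 0.0 || quantile > 1.0)
            throw new ArgumentOutOfRangeException(nameof(quantile));

        // Sorting a copy of the entire data set costs O(n log n) time and O(n) memory.
        var sorted = values.OrderBy(v => v).ToArray();

        // Nearest rank: the smallest 1-based rank r such that r / n >= quantile.
        int rank = (int)Math.Ceiling(quantile * sorted.Length);
        return sorted[rank - 1];
    }
}
```

For example, `ExactQuantile.Compute(data, 0.5)` returns the median under this rule.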

The solution for this is to sample the data as it goes through. By maintaining a small, manageable reservoir which is statistically representative of the data stream as a whole, we can quickly and easily calculate quantiles which are valid approximations of the actual quantiles. This technique is called reservoir sampling.
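The idea of a small, statistically representative reservoir can be sketched with Algorithm R, a classic uniform reservoir sampling scheme. This is an illustration under stated assumptions: the `UniformReservoirSketch` class, its members, and the seed parameter are hypothetical names, and the code is not necessarily how the library implements its reservoirs. It is also not thread-safe.

```csharp
using System;

// Hypothetical sketch of uniform reservoir sampling (Algorithm R); not the library's code.
public sealed class UniformReservoirSketch
{
    private readonly long[] samples;
    private readonly Random random;
    private long count;

    public UniformReservoirSketch(int capacity, int seed)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        samples = new long[capacity];
        random = new Random(seed);
    }

    // Total number of values seen so far.
    public long Count => count;

    // Number of values currently held in the reservoir.
    public int Size => (int)Math.Min(count, samples.Length);

    public void Update(long value)
    {
        count++;
        if (count <= samples.Length)
        {
            // The reservoir is not full yet: keep every value.
            samples[count - 1] = value;
            return;
        }

        // Pick a slot in [0, count); the new value is kept only if the slot
        // falls inside the reservoir, i.e. with probability capacity / count.
        long slot = (long)(random.NextDouble() * count);
        if (slot < samples.Length) samples[slot] = value;
    }

    // Returns a sorted copy of the sampled values, ready for quantile lookups.
    public long[] Snapshot()
    {
        var copy = new long[Size];
        Array.Copy(samples, copy, copy.Length);
        Array.Sort(copy);
        return copy;
    }
}
```

Quantiles are then computed on the sorted snapshot, which has at most `capacity` elements no matter how many values have passed through.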

Histogram sample
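    // Tracks the distribution of the number of items returned per search.
    private readonly Histogram histogram = Metric.Histogram("Search Results", Unit.Items);

    public void Search(string keyword)
    {
        var results = ActualSearch(keyword);
        // Record the size of this result set as one histogram sample.
        histogram.Update(results.Length);
    }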

Out of the box, three sampling types are provided:

  • Exponentially Decaying Reservoir - produces quantiles which are representative of (roughly) the last five minutes of data
  • Uniform Reservoir - produces quantiles which are valid for the entirety of the histogram’s lifetime
  • Sliding Window Reservoir - produces quantiles which are representative of the past N measurements

More information about the reservoir types can be found in the Java library documentation.
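As an illustration of the simplest of the three sampling types, a sliding window over the past N measurements can be sketched with a circular buffer. The `SlidingWindowSketch` class and its members are hypothetical names chosen for this example. The sketch is not thread-safe and may differ from the library's own implementation.

```csharp
using System;

// Hypothetical sketch of a sliding window over the last N measurements.
public sealed class SlidingWindowSketch
{
    private readonly long[] window;
    private long count;

    public SlidingWindowSketch(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        window = new long[size];
    }

    public void Update(long value)
    {
        // Once the buffer is full, each new value overwrites the oldest one.
        window[count % window.Length] = value;
        count++;
    }

    // Returns a sorted copy of the values currently in the window.
    public long[] Snapshot()
    {
        var copy = new long[(int)Math.Min(count, window.Length)];
        Array.Copy(window, copy, copy.Length);
        Array.Sort(copy);
        return copy;
    }
}
```

With a window of size 3, recording 1, 2, 3, 4, 5 leaves only 3, 4 and 5 in the snapshot, so quantiles reflect just the most recent measurements.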

Keeping track of user values

A histogram can also track which user value was supplied with the Min, Max, and Last recorded values. The user value can be any string (documentId, operationId, etc.).

For example, the following histogram records which documentId the operation returned the most items for:

public class UserValueHistogramSample
{
    private readonly Histogram histogram =
        Metric.Histogram("Results", Unit.Items);

    public void Process(string documentId)
    {
        var results = GetResultsForDocument(documentId);
        this.histogram.Update(results.Length, documentId);
    }
}

After running a few requests, the output of the histogram in text format looks like this:

    Results
             Count = 90 Items
              Last = 46.00 Items
   Last User Value = document-3
               Min = 2.00 Items
    Min User Value = document-7
               Max = 98.00 Items
    Max User Value = document-4
              Mean = 51.52 Items
            StdDev = 30.55 Items
            Median = 50.00 Items
              75% <= 80.00 Items
              95% <= 97.00 Items
              98% <= 98.00 Items
              99% <= 98.00 Items
            99.9% <= 98.00 Items

As you can see, the histogram recorded that the min value (2 Items) was returned for document-7, the max value (98 Items) for document-4, and the last value (46 Items) for document-3.
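The bookkeeping behind this output can be sketched as follows: each update carries an optional user value, which is stored whenever the update sets a new Min, Max, or Last. The `UserValueTrackerSketch` class and its property names are assumptions made for illustration. It mirrors the shape of the `Update(value, userValue)` call shown earlier, but it is not the library's implementation and is not thread-safe.

```csharp
using System;

// Hypothetical sketch of Min/Max/Last user value tracking; not the library's code.
public sealed class UserValueTrackerSketch
{
    public long Count { get; private set; }
    public long Last { get; private set; }
    public long Min { get; private set; }
    public long Max { get; private set; }
    public string LastUserValue { get; private set; }
    public string MinUserValue { get; private set; }
    public string MaxUserValue { get; private set; }

    public void Update(long value, string userValue = null)
    {
        // The most recent update always becomes the Last value.
        Last = value;
        LastUserValue = userValue;

        // The first update initializes Min and Max; later ones replace them
        // only when they are strictly smaller or larger.
        if (Count == 0 || value < Min)
        {
            Min = value;
            MinUserValue = userValue;
        }
        if (Count == 0 || value > Max)
        {
            Max = value;
            MaxUserValue = userValue;
        }

        Count++;
    }
}
```

Feeding it `Update(98, "document-4")`, `Update(2, "document-7")` and `Update(46, "document-3")` yields a Max of 98 for document-4, a Min of 2 for document-7, and a Last of 46 for document-3, matching the pattern in the output above.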