The working principle of histogram and the calculation method of quantile

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Prometheus provides four metric types (see the Prometheus documentation on metric types), of which Histogram and Summary are the most complex and the hardest to understand. This article aims to deepen your understanding of the Histogram type.

1. What is a Histogram?

According to the Prometheus documentation, a Histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. That sentence alone is still hard to picture, so the examples below make it concrete.

Suppose we want to monitor an application's response times over a period of time, and the observed samples fall in the range 0s~10s. We divide that range into intervals, that is, buckets, each 0.2s wide. The first bucket then counts the requests with a response time less than or equal to 0.2s, the second bucket counts the requests with a response time greater than 0.2s and less than or equal to 0.4s, and so on.

A Prometheus histogram is a cumulative histogram, which divides the range differently. Again assuming 0.2s-wide buckets, the first bucket counts the requests with a response time less than or equal to 0.2s, the second bucket counts the requests with a response time less than or equal to 0.4s, and so on. In other words, each bucket includes the samples of all previous buckets, which is why it is called a cumulative histogram.
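The relationship between the two layouts can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical, chosen to match the 0.2s-wide buckets used above.

```python
from itertools import accumulate


def to_cumulative(per_interval_counts):
    """Turn per-interval counts into cumulative counts.

    Each resulting bucket includes the samples of all earlier buckets.
    """
    return list(accumulate(per_interval_counts))


def to_per_interval(cumulative_counts):
    """Recover per-interval counts by differencing adjacent cumulative counts."""
    previous = [0] + list(cumulative_counts[:-1])
    return [c - p for c, p in zip(cumulative_counts, previous)]


# Hypothetical counts for the intervals (0, 0.2], (0.2, 0.4], (0.4, 0.6].
upper_bounds = [0.2, 0.4, 0.6]
per_interval = [5, 3, 2]

cumulative = to_cumulative(per_interval)
for le, count in zip(upper_bounds, cumulative):
    print(f"le={le}: {count}")
```

Because the cumulative form can always be differenced back into per-interval counts, no information is lost by storing buckets this way.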

2. Why a cumulative histogram?

The previous section showed that histograms in Prometheus are cumulative, which may seem strange, because non-cumulative histograms are generally easier to understand. Why does Prometheus do it this way?

Imagine that extra labels are added to a histogram metric, or that the range is divided into more buckets: analyzing the sample data becomes more and more complex. Because the histogram is cumulative, some buckets can be discarded as needed when the metrics are scraped. This reduces the maintenance cost of Prometheus while still allowing a rough estimate of quantiles, and it lets users dynamically reduce the number of samples scraped without modifying the application code.

Suppose the sample data of a histogram type metric is as follows:

Now we want Prometheus to discard the bucket whose response time is below 100ms when fetching metrics, which can be achieved through the following relabel configuration:

Here example_latency_seconds_bucket matches the value of the label __name__, and a pattern of the form '0\.0.*' matches the value of the label le, that is, le values of the form 0.0x. The matched samples are then dropped.
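The effect of such a drop rule can be simulated with a minimal Python sketch. It assumes the rule joins the values of __name__ and le with Prometheus's default separator ";" and drops a sample when the fully anchored regular expression matches the joined string. The regular expression is reconstructed from the description above, and the sample list and the helper name is_dropped are hypothetical.

```python
import re


def is_dropped(labels, source_labels, regex, separator=";"):
    """Return True if a 'drop' relabel rule would discard this sample.

    The source label values are joined with the separator (a missing label
    counts as an empty string) and the regex must match the whole string.
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


# Assumed rule: drop buckets of example_latency_seconds whose le is 0.0x.
DROP_REGEX = r"example_latency_seconds_bucket;(0\.0.*)"

# Hypothetical scraped series (label sets only, values omitted).
scraped = [
    {"__name__": "example_latency_seconds_bucket", "le": "0.005"},
    {"__name__": "example_latency_seconds_bucket", "le": "0.05"},
    {"__name__": "example_latency_seconds_bucket", "le": "0.1"},
    {"__name__": "example_latency_seconds_bucket", "le": "1.0"},
    {"__name__": "example_latency_seconds_bucket", "le": "+Inf"},
    {"__name__": "example_latency_seconds_count"},
]

kept = [s for s in scraped if not is_dropped(s, ["__name__", "le"], DROP_REGEX)]
kept_le = [s.get("le") for s in kept]
print(kept_le)
```

Only the two buckets below 100ms are discarded. The _count series has no le label, so the joined string ends in an empty value and does not match.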

In this way you can discard any bucket except the one with le="+Inf", because the histogram_quantile function needs it.

In addition, a histogram also exposes the _sum and _count series. Even if you discard every bucket, you can still use these two values to calculate the average response time of requests.
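As a rough sketch of that calculation: _sum and _count are counters, so the average over a time window is the increase in _sum divided by the increase in _count. The function name and the scrape values below are hypothetical.

```python
def average_over_window(sum_start, count_start, sum_end, count_end):
    """Average observation over a window, from two readings of _sum and _count.

    Returns NaN when no observations were recorded in the window.
    """
    observed = count_end - count_start
    if observed <= 0:
        return float("nan")
    return (sum_end - sum_start) / observed


# Hypothetical readings at the start and end of a window:
# _sum went from 120.0s to 150.0s, _count from 400 to 500 requests.
avg = average_over_window(120.0, 400, 150.0, 500)
print(f"average response time: {avg:.3f}s")
```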

A cumulative histogram also makes it easy to calculate the proportion of all samples that fall at or below a bucket's upper bound. For example, to find the fraction of all requests with a response time less than or equal to 1s, you can use the following expression:

example_latency_seconds_bucket{le="1.0"} / ignoring(le) example_latency_seconds_bucket{le="+Inf"}

3. Quantile calculation

Prometheus calculates quantiles with the histogram_quantile function. The result is an estimate rather than an exact value, because the function assumes that samples are evenly distributed within each bucket and interpolates linearly between the bucket's bounds. The accuracy of the estimate depends on how finely the buckets are divided: the wider the buckets, the lower the accuracy. The following figure is an example:

Suppose there are 10000 samples, and the 9501st sample falls into the eighth bucket. The eighth bucket holds 368 samples in total, and the 9501st sample is the 93rd sample within that bucket.

According to the formula on line 108 of the Prometheus source code file promql/quantile.go:

return bucketStart + (bucketEnd-bucketStart)*float64(rank/count)

We can calculate the sample value of (quantile=0.95) as follows:

This value is very close to the exact quantile. For more information on how to use the histogram_quantile function, refer to the documentation on PromQL built-in functions.
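The interpolation can be sketched in Python as follows. This is a simplified illustration, not the Prometheus implementation: it omits edge cases the real function handles, and the function names are hypothetical. The bucket bounds of 1.4s and 1.6s are an assumption (the eighth bucket if buckets are 0.2s wide, as in section 1), because the bounds from the figure are not given in the text.

```python
import math


def interpolate(bucket_start, bucket_end, rank, count):
    """Linear interpolation inside one bucket, mirroring the formula above."""
    return bucket_start + (bucket_end - bucket_start) * (rank / count)


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, and the last upper bound must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, c) in enumerate(buckets) if c >= rank)
    if index == len(buckets) - 1:
        # The quantile lies in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0]
    bucket_end, cumulative = buckets[index]
    # The lowest bucket is assumed to start at 0.
    bucket_start, below = buckets[index - 1] if index > 0 else (0.0, 0)
    return interpolate(bucket_start, bucket_end, rank - below, cumulative - below)


# The article's numbers (rank 93 of 368) with the assumed bounds 1.4s and 1.6s.
article_estimate = interpolate(1.4, 1.6, 93, 368)
print(f"{article_estimate:.4f}")

# A small hypothetical histogram: (le, cumulative count).
example = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
median = estimate_quantile(0.5, example)
p95 = estimate_quantile(0.95, example)
print(median, p95)
```

In the hypothetical histogram the 95th percentile falls in the +Inf bucket, so the sketch can only report the highest finite bound. This is one reason the choice of bucket boundaries matters for accuracy.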
