What do prometheus's summary and histogram indicators mean? 04/19 Update SLTechnology News&Howtos


2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains what Prometheus's summary and histogram metrics mean.

Prometheus client and server

The client is the side that exposes metrics data (for example, an exporter you write). Prometheus provides client libraries for many languages, and you instrument the code of the monitored service with one of them. When the Prometheus server scrapes the client's HTTP endpoint, the client library returns the current values of all tracked metrics. For more information, see the Prometheus client library documentation.

The server is the Prometheus server, which pulls (scrapes), stores and queries all kinds of metrics data.
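To make the pull model concrete, here is a minimal sketch of the client side using only the Python standard library (it is not an official Prometheus client library). It keeps one counter and serves it on /metrics in the plain-text exposition format, for the server to scrape. The metric name `demo_requests_total` and the port are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler

# Hypothetical counter tracked by the instrumented service.
_lock = threading.Lock()
_requests_total = 0


def inc_requests():
    """Called by the service each time it handles a request."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics():
    """Render current metric values in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP demo_requests_total Total requests handled (hypothetical).\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers the server's scrape requests on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, run something like `HTTPServer(("", 8000), MetricsHandler).serve_forever()` (with `HTTPServer` from `http.server`) and point a Prometheus scrape job at `http://<host>:8000/metrics`. The server decides when to pull; the client only answers.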

Histogram

A histogram samples observations and counts them in buckets, like a bar chart. In Prometheus it provides three things:

It counts every observation (cumulatively, not over a sliding time window) into the bucket it falls into

The cumulative sum of all observed values (sum)

The cumulative number of observations (count)

For a histogram with base metric name [basename], these three are exposed as the following series:

[basename]_bucket{le="<upper bound>"}: the number of observations less than or equal to the upper bound

[basename]_sum: the sum of all observed values

[basename]_count: the number of observations
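The three series can be illustrated with a small, single-threaded sketch of a cumulative histogram. It is an illustration, not the official client library; the class name `Histogram` and the example bounds are assumptions. As in Prometheus, a `+Inf` bucket is added implicitly.

```python
import math
from bisect import bisect_left


class Histogram:
    """Cumulative histogram: each le bucket counts observations <= its bound."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]  # implicit +Inf bucket
        self.counts = [0] * len(self.bounds)       # per-bucket, non-cumulative
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # First bucket whose upper bound is >= value (the "le" semantics).
        self.counts[bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def expose(self):
        """Emit _bucket (cumulative), _sum and _count lines."""
        lines, cumulative = [], 0
        for bound, n in zip(self.bounds, self.counts):
            cumulative += n
            le = "+Inf" if bound == math.inf else str(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.name}_sum {self.sum!r}")
        lines.append(f"{self.name}_count {self.count}")
        return lines
```

Only one counter per bucket plus the sum and count are stored; the observed values themselves are discarded.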

Histogram example

As shown in the above table, set buckets = [1, 5, 10]. For the sampled data shown as sampling points, Observe is the number of samples that fall into each bucket: 2 samples fall in (-∞, 1], 1 in (1, 5] and 3 in (5, 10]. Write is the final exposed result (histogram bucket counts are cumulative: each bucket also includes all samples counted in the buckets below it):

[basename]_bucket{le="1"} = 2
[basename]_bucket{le="5"} = 3
[basename]_bucket{le="10"} = 6
[basename]_bucket{le="+Inf"} = 6
[basename]_count = 6
[basename]_sum = 18.8378745
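Because bucket counts are cumulative, the number of samples in each individual interval can be recovered by subtracting neighbouring buckets, and re-accumulating gives back the exposed values. A short sketch using the cumulative values from the example (the helper names are hypothetical):

```python
from itertools import accumulate

# Cumulative values from the example: le="1", le="5", le="10", le="+Inf".
bounds = ["1", "5", "10", "+Inf"]
cumulative = [2, 3, 6, 6]


def to_per_interval(cum):
    """Undo the running total: samples in (previous bound, bound]."""
    return [c - p for c, p in zip(cum, [0] + cum[:-1])]


def to_cumulative(per_interval):
    """Running total, as exposed in the *_bucket series."""
    return list(accumulate(per_interval))


per_interval = to_per_interval(cumulative)
for le, n in zip(bounds, per_interval):
    print(f"le={le}: {n} sample(s) in this interval only")
```

The `+Inf` bucket always equals `_count`, since every observation is less than or equal to infinity.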

A histogram does not store the sampled values themselves. Each bucket keeps only a counter of the samples that fall into it; in other words, a histogram stores per-interval sample counts. Its client-side overhead is therefore not much higher than that of a Counter or Gauge, which makes it suitable for collecting data under high concurrency.

On the server side, the histogram_quantile() function computes quantiles from a histogram.

Histograms are usually analyzed with histogram_quantile. The function estimates quantiles with a piecewise linear approximation over the bucket boundaries (figure below), so the error can be relatively large. In the figure, the red curve is the actual distribution of the samples (a normal distribution), and the solid dots mark the histogram's bucket boundaries, which sit at the 0.01, 0.25, 0.50, 0.75 and 0.95 quantiles; all estimates are derived from the cumulative bucket counts. To estimate the 0.9 quantile, the function interpolates linearly between the two adjacent buckets at the 0.75 and 0.95 quantiles.
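The interpolation can be sketched as follows. This is a simplified re-implementation modeled on the documented behaviour of PromQL's histogram_quantile, not its actual source: the lowest bucket is assumed to start at 0 when its upper bound is positive, a rank that lands in the `+Inf` bucket returns the highest finite bound, and edge cases such as non-monotonic buckets are not handled.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper bound
    and ending with (math.inf, total_count), like the *_bucket series.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank lies in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    # The lowest bucket is assumed to start at 0.
    lower = buckets[b - 1][0] if b > 0 else 0.0
    below = buckets[b - 1][1] if b > 0 else 0
    upper = buckets[b][0]
    in_bucket = buckets[b][1] - below
    if in_bucket == 0:
        return lower  # only possible when rank == 0
    # Linear interpolation, assuming samples are spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With the example buckets [(1, 2), (5, 3), (10, 6), (inf, 6)], the estimated median is 5.0 and the estimated 0.9 quantile is 9.0, whatever the real samples inside those buckets were, which shows how coarse the estimate is when buckets are wide. In PromQL the input usually comes from a query such as histogram_quantile(0.9, rate([basename]_bucket[5m])).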

However, if you know how the data is distributed, choosing appropriate buckets still yields a reasonably accurate quantile estimate.
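The Go client library offers helpers such as `prometheus.LinearBuckets` and `prometheus.ExponentialBuckets` for generating bucket boundaries. The sketch below mirrors that idea in Python; the function names are hypothetical ports, not a Python library API. Exponential buckets give fine resolution for small values and coarse resolution for large ones, which often suits latency-like data.

```python
def linear_buckets(start, width, count):
    """count upper bounds: start, start + width, start + 2*width, ..."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [start + i * width for i in range(count)]


def exponential_buckets(start, factor, count):
    """count upper bounds: start, start*factor, start*factor**2, ..."""
    if count < 1 or start <= 0 or factor <= 1:
        raise ValueError("need count >= 1, start > 0 and factor > 1")
    return [start * factor ** i for i in range(count)]
```

For example, `exponential_buckets(1, 2, 5)` yields the bounds 1, 2, 4, 8 and 16; the implicit `+Inf` bucket is added by the histogram itself.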

Summary

Because a histogram keeps only buckets and bucket counts on the client side, and the Prometheus server estimates percentiles from this limited data, the result is not very accurate. Summaries address the need for accurate percentiles: a summary stores quantile values directly instead of estimating them from interval counts.

Prometheus calls these values quantiles, although it is arguably more accurate to call them percentiles. A percentile is the value below which a given percentage of the samples fall.
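The definition fits in a few lines of code. This sketch uses the nearest-rank method, one of several common conventions; the choice is an assumption, since the article does not fix one.

```python
import math


def nearest_rank_quantile(samples, phi):
    """Smallest sample x such that at least a fraction phi of samples are <= x."""
    if not samples or not 0 < phi <= 1:
        raise ValueError("need at least one sample and 0 < phi <= 1")
    ordered = sorted(samples)
    rank = math.ceil(phi * len(ordered))  # 1-based rank
    return ordered[rank - 1]
```

For the values 1 to 100, the 0.9 quantile (90th percentile) is 90: 90% of the samples are less than or equal to it.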

A summary computes quantile statistics over the observed samples. Like a histogram, it provides three things:

Quantiles that the client computes over the samples observed in a sliding time window (10 minutes by default). As an analogy, for the exam scores of a class, which roughly follow a normal distribution, a quantile tells you the score below which, say, 50% or 90% of the students fall.

The sum of all observed values, like the total score of all students in the class (sum)

The number of observations, like the number of exams taken in the class (count)

When a summary with base metric name [basename] is scraped, it is exposed as the following time series:

[basename]{quantile="<φ>"}: the φ-quantiles (0 ≤ φ ≤ 1) of the observed values

[basename]_sum: the sum of all observed values

[basename]_count: the number of events that have been observed

In the Go client library (client_golang), summary quantiles are computed with the third-party package perks:

github.com/beorn7/perks/quantile
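To show what a summary maintains on the client side, here is a deliberately naive sketch: it stores every observation from the last `max_age` seconds and computes exact nearest-rank quantiles on each scrape. This is not the perks algorithm, which uses a streaming approximation with bounded memory; the class and parameter names are hypothetical.

```python
import math
import time
from collections import deque


class NaiveSummary:
    def __init__(self, name, quantiles=(0.5, 0.9, 0.99), max_age=600.0,
                 clock=time.monotonic):
        self.name = name
        self.quantiles = quantiles
        self.max_age = max_age      # sliding window, 10 minutes by default
        self.clock = clock
        self.window = deque()       # (timestamp, value) pairs in the window
        self.sum = 0.0              # _sum and _count are cumulative
        self.count = 0

    def observe(self, value):
        self.window.append((self.clock(), value))
        self.sum += value
        self.count += 1

    def expose(self):
        cutoff = self.clock() - self.max_age
        while self.window and self.window[0][0] < cutoff:
            self.window.popleft()   # drop observations older than max_age
        values = sorted(v for _, v in self.window)
        lines = []
        for q in self.quantiles:
            if values:
                rank = max(1, math.ceil(q * len(values)))  # nearest rank
                value = values[rank - 1]
            else:
                value = math.nan
            lines.append(f'{self.name}{{quantile="{q}"}} {value}')
        lines.append(f"{self.name}_sum {self.sum}")
        lines.append(f"{self.name}_count {self.count}")
        return lines
```

Note the asymmetry: the quantiles only reflect the recent window, while `_sum` and `_count` keep growing for the life of the process. Storing every sample is exactly the memory cost that a streaming algorithm such as perks avoids.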

Summary example

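Set the quantile objectives to {0.5: 0.05, 0.9: 0.01, 0.99: 0.001} (in client_golang, such objectives are passed as the Objectives map of SummaryOpts).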

# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.012352463
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.014458005
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.017316173
prometheus_tsdb_wal_fsync_duration_seconds_sum 2.888716127000002
prometheus_tsdb_wal_fsync_duration_seconds_count 216


From this sample we can see that the Prometheus server has performed 216 WAL fsync operations, taking 2.888716127000002 s in total. The median duration (quantile=0.5) is 0.012352463 s, and the 0.9 quantile is 0.014458005 s, which means that, within the summary's time window, 90% of the fsync operations took 0.014458005 s or less.
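The `_sum` and `_count` series also give the mean duration. A quick check with the numbers above (plain arithmetic, not a Prometheus API):

```python
fsync_sum = 2.888716127000002  # prometheus_tsdb_wal_fsync_duration_seconds_sum
fsync_count = 216              # prometheus_tsdb_wal_fsync_duration_seconds_count

mean = fsync_sum / fsync_count
print(f"mean fsync duration: {mean * 1000:.3f} ms")
```

This mean covers the whole lifetime of the process. In PromQL the same idea is usually applied to rates, for example rate(prometheus_tsdb_wal_fsync_duration_seconds_sum[5m]) / rate(prometheus_tsdb_wal_fsync_duration_seconds_count[5m]), which gives the average over the last 5 minutes.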

Each quantile in the configuration is followed by a second number: 0.05 after 0.5, 0.01 after 0.9, and 0.001 after 0.99. These are the tolerated errors we set. For example, 0.5: 0.05 means the final error must not exceed 0.05: if the reported 0.5-quantile is 120, then, because the allowed error is 0.05, 120 is the value of some quantile between 0.45 and 0.55. Note that even when the quantile (rank) error is very small, the error in the reported value itself can be large.
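The tolerance bounds the rank, not the value. A small sketch, assuming the error semantics described above (the helper name is hypothetical): given the sorted observations, any value whose rank lies within [(φ - ε)n, (φ + ε)n] is an acceptable answer for the φ-quantile.

```python
import math


def acceptable_range(sorted_values, phi, eps):
    """Smallest and largest values a summary may report for the phi-quantile
    when the rank error is at most eps."""
    n = len(sorted_values)
    lo_rank = max(1, math.ceil((phi - eps) * n))
    hi_rank = min(n, math.floor((phi + eps) * n))
    return sorted_values[lo_rank - 1], sorted_values[hi_rank - 1]


# Hypothetical data with a jump near the median: half fast, half slow.
values = [10] * 50 + [500] * 50
print(acceptable_range(values, 0.5, 0.05))
```

For evenly spread values the acceptable range is narrow, but with the hypothetical bimodal data above any value from 10 to 500 satisfies the 0.05 tolerance: a small quantile error, a large value error.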

Choosing between summary and histogram for quantiles

Be aware of a few restrictions:

A summary takes locks frequently, which affects the performance of highly concurrent programs. A histogram only increments an atomic counter for the matching bucket, whereas a summary needs concurrency protection every time its algorithm updates the latest quantile estimates, which costs client CPU and memory.

You cannot aggregate (sum, avg, and so on) the quantile values produced by summaries. For example, suppose two instances serve traffic at the same time and each tracks its own response times, and their computed 0.5-quantiles are 60 and 80. It would be wrong to take the simple average (60 + 80) / 2 = 70 and treat it as the 0.5-quantile of the combined population.
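A small numeric sketch of this pitfall. The response times are hypothetical, chosen so that the per-instance medians are 60 and 80, and nearest-rank quantiles are an assumption of the sketch. The average of the medians is 70, but the median of the combined data is 61. Histogram bucket counts, by contrast, can simply be added across instances, which is what makes histograms aggregatable.

```python
import math
from bisect import bisect_right


def nearest_rank_quantile(samples, phi):
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(phi * len(ordered))) - 1]


def cumulative_buckets(samples, bounds):
    """Cumulative le-bucket counts, plus the implicit +Inf bucket."""
    ordered = sorted(samples)
    return [bisect_right(ordered, b) for b in bounds] + [len(ordered)]


# Hypothetical response times (ms) from two instances of the same service.
instance_a = [55, 56, 57, 58, 60, 61, 62, 63, 64]
instance_b = [79, 80, 81]

median_a = nearest_rank_quantile(instance_a, 0.5)   # 60
median_b = nearest_rank_quantile(instance_b, 0.5)   # 80
naive = (median_a + median_b) / 2                   # 70.0, not the true median
true_median = nearest_rank_quantile(instance_a + instance_b, 0.5)  # 61

# Adding bucket counts across instances gives the buckets of the combined data.
bounds = [60, 70]
merged = [a + b for a, b in zip(cumulative_buckets(instance_a, bounds),
                                cumulative_buckets(instance_b, bounds))]
assert merged == cumulative_buckets(instance_a + instance_b, bounds)
```

The naive average ignores that instance A handled three times as many requests; bucket counts carry that weight automatically.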

A summary's quantiles are fixed in advance in the client; quantiles that were not configured cannot be obtained when the server queries the metric data. A histogram, by contrast, lets you choose any quantile at query time in PromQL; the result is less accurate than a summary's, but it brings flexibility.

A histogram cannot compute quantiles exactly, and if the buckets are poorly chosen the error can be very large. Computing quantiles from histograms also consumes computing resources on the server.

Two rules of thumb

If you need to aggregate, choose a histogram.

If you know the range and distribution of the values to be observed, choose a histogram. If you need an accurate quantile, choose a summary.

