How to use Prometheus metric types

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the Prometheus metric types and how to use them. In day-to-day operation, many people are unsure which metric type fits which situation, so the sections below collect simple, practical ways of working with each type.

In terms of storage, all monitoring metrics are the same, but metrics behave slightly differently in different scenarios. For example, the metric node_load1 returned by Node Exporter reflects the current load of the system, so its sample values go up and down over time. The samples of the metric node_cpu behave differently: the value only ever increases, because it records the cumulative CPU time used. In theory, as long as the system does not shut down, this value keeps growing without bound.

To help users understand and distinguish these kinds of metrics, Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary.

The sample data returned by an Exporter also states the type of each metric in its comments. For example:

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
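As a minimal sketch of how a consumer could read that type information, the Python below scans exposition text for "# TYPE" comments. The function name parse_metric_types is hypothetical, and the sketch handles only this one comment form, not the full exposition format.

```python
def parse_metric_types(exposition_text):
    """Map each metric name to the type declared in its '# TYPE' comment."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.strip().split()
        # A type declaration has the shape: # TYPE <metric_name> <type>
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


sample = """\
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
"""

metric_types = parse_metric_types(sample)
print(metric_types)
```

HELP lines and sample lines are skipped because they do not match the four-token TYPE pattern.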

Counter: a counter that only increases

A Counter-type metric works like a counter: it only increases and never decreases, unless the system is reset. Common metrics such as http_requests_total and node_cpu are Counter-type metrics. It is generally recommended to use _total as the suffix when naming a Counter-type metric.

Counter is a simple but powerful tool. For example, we can record the number of times a certain event occurs in an application. Because this data is stored as a time series, we can easily see how the rate of that event changes. PromQL's built-in aggregation operators and functions allow users to analyze this data further.

For example, get the growth rate of HTTP requests with the rate() function:

rate(http_requests_total[5m])

Query the 10 most visited HTTP addresses in the current system:

topk(10, http_requests_total)
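To illustrate why a per-second rate is the natural way to read a Counter, here is a simplified Python sketch of the idea behind rate(). It treats any drop in the value as a counter reset, but unlike PromQL it does not extrapolate to the window boundaries. The function name simple_rate and the sample values are assumptions made for illustration.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: handles counter resets, but does not extrapolate to the
    edges of the time window the way PromQL's rate() does.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so it restarted from zero.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical http_requests_total samples: (seconds, counter value).
# The drop from 220 to 30 simulates a process restart.
samples = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 30.0), (240, 90.0)]
print(simple_rate(samples))
```

The reset handling is what makes the raw, ever-growing value usable: the total increase is accumulated step by step rather than taken as last minus first.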

Gauge: a gauge that can go up and down

Unlike Counter, Gauge-type metrics reflect the current state of the system, so their sample values can both increase and decrease. Common metrics such as node_memory_MemFree (the amount of free memory on the host) and node_memory_MemAvailable (the amount of available memory) are Gauge-type metrics.

Through Gauge metrics, users can directly view the current status of the system:

node_memory_MemFree

For Gauge-type metrics, the PromQL built-in function delta() returns how much the samples changed over a period of time. For example, calculate the difference in CPU temperature over two hours:

delta(cpu_temp_celsius{host="zeus"}[2h])
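The core of that calculation can be sketched in a few lines of Python. This is a simplification: it takes the last sample minus the first and ignores the extrapolation that PromQL's delta() applies to cover the whole range. The function name simple_delta and the temperature readings are hypothetical.

```python
def simple_delta(samples):
    """Difference between the last and first gauge value in a window.

    Simplified: PromQL's delta() also extrapolates to the window edges.
    """
    if len(samples) < 2:
        return None
    return samples[-1][1] - samples[0][1]


# Hypothetical cpu_temp_celsius samples over two hours: (seconds, value)
samples = [(0, 61.5), (3600, 64.0), (7200, 59.0)]
print(simple_delta(samples))
```

A negative result is meaningful for a Gauge, since the value can fall. The same subtraction on a Counter would be misleading after a reset.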

You can also use deriv() to calculate the per-second rate of change of the samples with simple linear regression, or use predict_linear() directly to predict the trend of the data. For example, predict the remaining disk space of the system 4 hours from now:

predict_linear(node_filesystem_free{job="node"}[1h], 4 * 3600)
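The regression behind both functions can be sketched with an ordinary least-squares fit. The Python below is an illustration under stated assumptions, not Prometheus's implementation: the names linear_fit and predict are hypothetical, the sample values are invented, and "now" is taken to be the timestamp of the last sample rather than the query evaluation time.

```python
def linear_fit(samples):
    """Least-squares slope and intercept for (timestamp, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = cov / var  # change per second, the quantity deriv() describes
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict(samples, seconds_ahead):
    """Value of the fitted line seconds_ahead after the last sample."""
    slope, intercept = linear_fit(samples)
    return slope * (samples[-1][0] + seconds_ahead) + intercept


# Hypothetical free-space samples over one hour: (seconds, free units)
samples = [(0, 5000.0), (900, 4910.0), (1800, 4820.0),
           (2700, 4730.0), (3600, 4640.0)]
slope, _ = linear_fit(samples)
print(slope)
print(predict(samples, 4 * 3600))
```

With these invented samples the free space shrinks by 0.1 units per second, so the fitted line reaches about 3200 units four hours after the last sample.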

Using Histogram and Summary to analyze data distribution

In addition to Counter and Gauge, Prometheus also defines the Histogram and Summary metric types, which are mainly used to count samples and analyze their distribution.

In most cases, people tend to look at the average of a quantitative metric, such as average CPU usage or average page response time. The problem with this approach is obvious. Take the average response time of system API calls as an example: if most API requests respond within about 100ms while a few individual requests take 5 seconds, the average hides those few very slow requests, and the response time of some web pages ends up far from the median. This phenomenon is called the long tail problem.

To distinguish between "slow on average" and "slow in the long tail", the easiest way is to group requests by latency range. For example, count the requests with a latency between 0 and 10ms and the requests between 10 and 20ms. In this way, the reason why the system is slow can be analyzed quickly. Both Histogram and Summary are designed to solve this problem: through Histogram and Summary metrics, we can quickly understand the distribution of the monitored samples.
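The grouping idea can be shown with a short Python sketch. The latencies and bucket bounds below are invented for illustration, and bucket_counts is a hypothetical helper, not part of any Prometheus client library.

```python
from bisect import bisect_left


def bucket_counts(latencies_ms, upper_bounds_ms):
    """Count observations per latency interval.

    Slot i holds values in (previous bound, upper_bounds_ms[i]];
    the final slot holds values above every bound.
    """
    counts = [0] * (len(upper_bounds_ms) + 1)
    for latency in latencies_ms:
        # bisect_left returns the index of the first bound >= latency.
        counts[bisect_left(upper_bounds_ms, latency)] += 1
    return counts


# Hypothetical data: 98 fast requests and 2 very slow ones.
latencies_ms = [8.0] * 60 + [15.0] * 38 + [5000.0] * 2
bounds_ms = [10, 20, 100, 1000]

average = sum(latencies_ms) / len(latencies_ms)
counts = bucket_counts(latencies_ms, bounds_ms)
print(average)
print(counts)
```

Here the average is 110.5ms, a latency that no request actually had. The per-interval counts show the real picture: 98 requests finished within 20ms and 2 took more than a second.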

For example, the metric prometheus_tsdb_wal_fsync_duration_seconds is of type Summary. It records the time Prometheus Server spends on wal_fsync operations. By accessing the /metrics address of Prometheus Server, the following sample data can be obtained:

# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.012352463
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.014458005
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.017316173
prometheus_tsdb_wal_fsync_duration_seconds_sum 2.888716127000002
prometheus_tsdb_wal_fsync_duration_seconds_count 216

From the sample above, we can see that Prometheus Server has performed 216 wal_fsync operations in total, taking 2.888716127000002s altogether. The median duration (quantile=0.5) is 0.012352463s, and the 90th percentile (quantile=0.9) is 0.014458005s.

In the sample data returned by Prometheus Server itself, we can also find a metric of type Histogram, prometheus_tsdb_compaction_chunk_range, which is exposed through series such as prometheus_tsdb_compaction_chunk_range_bucket:
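The _sum and _count series also give the average duration, which can be compared with the median. A small Python check using the numbers from the sample (the variable names are chosen for illustration):

```python
# Values copied from the Summary sample above.
wal_fsync_sum_seconds = 2.888716127000002
wal_fsync_count = 216
median_seconds = 0.012352463

# Average duration of one wal_fsync operation.
average_seconds = wal_fsync_sum_seconds / wal_fsync_count
print(f"average: {average_seconds * 1000:.2f} ms")
print(f"median:  {median_seconds * 1000:.2f} ms")
```

The average (about 13.4ms) is slightly higher than the median (about 12.4ms), which is what a small number of slower operations in the tail would produce.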

# HELP prometheus_tsdb_compaction_chunk_range Final time range of chunks on their first compaction
# TYPE prometheus_tsdb_compaction_chunk_range histogram
prometheus_tsdb_compaction_chunk_range_bucket{le="100"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="6400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="25600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="102400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="409600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1.6384e+06"} 260
prometheus_tsdb_compaction_chunk_range_bucket{le="6.5536e+06"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="2.62144e+07"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="+Inf"} 780
prometheus_tsdb_compaction_chunk_range_sum 1.1540798e+09
prometheus_tsdb_compaction_chunk_range_count 780

Like a Summary metric, a Histogram metric also exposes the total number of recorded observations (with the suffix _count) and the sum of their values (with the suffix _sum). The difference is that a Histogram directly reflects the number of samples in different intervals, and each interval is defined by the label le. The bucket counts are cumulative: each bucket counts every sample whose value is less than or equal to its le bound.
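Because the bucket values are cumulative, the number of samples inside a single interval is the difference between neighbouring buckets. The Python sketch below applies that to the bucket values from the sample. The helper name per_interval_counts is hypothetical, and the lower bound of the first interval is assumed to be 0.

```python
def per_interval_counts(cumulative_buckets):
    """Convert cumulative (le, count) pairs into (low, high, count) triples."""
    intervals = []
    prev_bound, prev_count = 0.0, 0  # assumes observations are non-negative
    for upper_bound, cumulative in cumulative_buckets:
        intervals.append((prev_bound, upper_bound, cumulative - prev_count))
        prev_bound, prev_count = upper_bound, cumulative
    return intervals


# Bucket values copied from the Histogram sample above.
buckets = [
    (100.0, 0), (400.0, 0), (1600.0, 0), (6400.0, 0), (25600.0, 0),
    (102400.0, 0), (409600.0, 0), (1.6384e+06, 260), (6.5536e+06, 780),
    (2.62144e+07, 780), (float("inf"), 780),
]

non_empty = [iv for iv in per_interval_counts(buckets) if iv[2] > 0]
for low, high, count in non_empty:
    print(f"({low:g}, {high:g}]: {count}")
```

Only two intervals are non-empty: 260 samples fall between 409600 and 1.6384e+06, and the remaining 520 fall between 1.6384e+06 and 6.5536e+06.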

For a Histogram metric, we can also calculate quantiles of its values with the histogram_quantile() function. The difference lies in where the calculation happens: Histogram quantiles are calculated on the server side by histogram_quantile(), while Summary quantiles are calculated directly on the client side. Therefore, Summary performs better when quantiles are queried through PromQL, whereas Histogram consumes more server resources at query time. On the other hand, Histogram consumes fewer resources on the client. Users should choose between the two according to their actual scenario.
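The server-side calculation can be approximated as follows. This Python sketch shows the general idea of estimating a quantile from cumulative buckets with linear interpolation inside the matching bucket. It is a simplified illustration, not the actual PromQL implementation. It assumes the buckets are sorted, end with +Inf, and start from a lower bound of 0.

```python
import math


def histogram_quantile(q, cumulative_buckets):
    """Estimate the q-quantile from cumulative (le, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = cumulative_buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # position of the wanted observation
    prev_bound, prev_count = 0.0, 0
    for upper_bound, cumulative in cumulative_buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Rank lies in the +Inf bucket: fall back to the
                # highest finite bound.
                return prev_bound
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return prev_bound
            # Interpolate linearly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper_bound - prev_bound) * fraction
        prev_bound, prev_count = upper_bound, cumulative
    return math.nan


# Bucket values copied from the Histogram sample above.
buckets = [
    (100.0, 0), (400.0, 0), (1600.0, 0), (6400.0, 0), (25600.0, 0),
    (102400.0, 0), (409600.0, 0), (1.6384e+06, 260), (6.5536e+06, 780),
    (2.62144e+07, 780), (float("inf"), 780),
]

print(histogram_quantile(0.5, buckets))
```

The result is an estimate whose accuracy depends on the bucket boundaries. A Summary instead reports quantiles that the client has already computed from the observed values.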

https://blog.csdn.net/polo2044/article/details/83277299

This concludes the introduction to the use of Prometheus metric types. Combining theory with practice is the best way to learn, so go and try it!
