The essence of Prometheus and how to implement it

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the essentials of rust-prometheus and how it is implemented. It mainly covers the basic concepts and the internal workings of the most basic metric types.

rust-prometheus is the Rust client library for the Prometheus monitoring system, implemented by the TiKV team. TiKV uses rust-prometheus to collect various metrics and feed them into Prometheus, so that they can later be displayed as dashboards with visualization tools such as Grafana. These metrics play a key role in understanding the current and historical state of TiKV. TiKV exposes a wealth of metrics, and its code is interspersed with snippets that collect them, so it is worth understanding rust-prometheus.

Interested readers can also watch our colleague's talk on rust-prometheus at FOSDEM 2019.

Basics

Metric types

Prometheus supports four metric types: Counter, Gauge, Histogram, and Summary. The rust-prometheus library currently implements only the first three. Most TiKV metrics are Counters and Histograms; a few are Gauges.

Counter

Counter is the simplest and most commonly used metric type. It suits all kinds of counts and running totals, and its value must increase monotonically. A Counter provides the basic inc() and inc_by(x) interfaces, which increase the count.

When visualized, such a metric is generally shown as the amount it increased over each time interval rather than as the raw counter value at each point in time. For example, the number of requests received by TiKV is a Counter, and on the dashboard it appears as a chart of requests received per second (QPS).
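As a rough illustration only (this is not rust-prometheus code), a counter can be sketched with the standard library alone. The names SketchCounter and rate_per_second are hypothetical, and the sketch uses an integer value for simplicity. The helper mirrors what a dashboard does when it turns two samples of a counter into a per-second rate, assuming the counter was not reset between the samples.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a Counter: a value that can only go up.
pub struct SketchCounter {
    value: AtomicU64,
}

impl SketchCounter {
    pub fn new() -> Self {
        SketchCounter { value: AtomicU64::new(0) }
    }

    /// Increase the counter by one.
    pub fn inc(&self) {
        self.inc_by(1);
    }

    /// Increase the counter by `delta`. There is deliberately no way to decrease it.
    pub fn inc_by(&self, delta: u64) {
        self.value.fetch_add(delta, Ordering::Relaxed);
    }

    /// Read the current total.
    pub fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }
}

/// Per-second rate between two samples of a counter taken `elapsed_secs` apart.
/// Assumes `later >= earlier`; a smaller `later` is treated as no increase.
pub fn rate_per_second(earlier: u64, later: u64, elapsed_secs: f64) -> f64 {
    later.saturating_sub(earlier) as f64 / elapsed_secs
}
```

With samples of 100 and 160 taken 30 seconds apart, the helper yields 2 requests per second, which is the kind of value a QPS chart plots.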

Gauge

Gauge suits metrics whose values go up and down. A Gauge provides the inc(), dec(), add(x), sub(x), and set(x) interfaces, all of which update the metric value.

When visualized, this kind of metric is generally plotted directly against time, showing how the value changes. For example, the CPU usage of TiKV is a Gauge, and the dashboard shows it directly as a chart of CPU usage fluctuating over time.
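The interface listed above can be sketched in a few lines of standard-library Rust. SketchGauge is a hypothetical name, and it stores an integer for brevity, whereas real Prometheus gauge values are floating-point numbers.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical stand-in for a Gauge: a value that may move in either direction.
pub struct SketchGauge {
    value: AtomicI64,
}

impl SketchGauge {
    pub fn new() -> Self {
        SketchGauge { value: AtomicI64::new(0) }
    }

    pub fn inc(&self) {
        self.add(1);
    }

    pub fn dec(&self) {
        self.sub(1);
    }

    pub fn add(&self, x: i64) {
        self.value.fetch_add(x, Ordering::Relaxed);
    }

    pub fn sub(&self, x: i64) {
        self.value.fetch_sub(x, Ordering::Relaxed);
    }

    /// Overwrite the current value, for example with a freshly sampled reading.
    pub fn set(&self, x: i64) {
        self.value.store(x, Ordering::Relaxed);
    }

    pub fn get(&self) -> i64 {
        self.value.load(Ordering::Relaxed)
    }
}
```

Unlike a counter, the value read back is meaningful on its own, which is why dashboards plot it directly.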

Histogram

Histogram is a relatively complex but powerful metric type. In addition to basic counting, a Histogram can be used to calculate quantiles. A Histogram provides the observe(x) interface, which records that a value has been observed.

For example, the time TiKV takes to process a request after receiving it is a Histogram metric. With a Histogram, the 99th percentile, 99.9th percentile, and average request duration can all be shown on the dashboard. Obviously, a single Counter cannot store this duration metric: it would only show how much time TiKV spent processing requests in total during each interval, not the time spent on a single request. Of course, you might think of adding a second Counter for the number of requests, so that the accumulated processing time divided by the number of requests gives the average request duration for each interval.

In fact, this is how Histogram works inside Prometheus. A Histogram metric ultimately exposes a set of time series:

The cumulative number of observations in each bucket, that is, the number of observations falling in each of the intervals (-∞, 0.1], (-∞, 0.2], (-∞, 0.4], (-∞, 0.8], (-∞, 1.6], (-∞, +∞).

The cumulative sum of observations.

The number of observations.

Buckets are how Prometheus simplifies the handling of Histogram observations. Prometheus does not record each observation individually; it only records how many observations fall into each configured bucket interval, which greatly improves efficiency at the cost of some accuracy.
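To make the three kinds of series concrete, here is a minimal sketch that derives them from a list of raw observations. HistogramSeries, summarize, and average are hypothetical names for illustration; a real client keeps running totals instead of storing the observations.

```rust
/// The series a histogram exposes: cumulative bucket counts, sum, and count.
pub struct HistogramSeries {
    /// `cumulative[i]` counts observations `<= upper_bounds[i]`;
    /// the extra last entry is the (-inf, +inf) bucket.
    pub cumulative: Vec<u64>,
    pub sum: f64,
    pub count: u64,
}

/// Build the series for the given finite bucket bounds (sorted ascending).
pub fn summarize(upper_bounds: &[f64], observations: &[f64]) -> HistogramSeries {
    let mut cumulative = vec![0u64; upper_bounds.len() + 1];
    let mut sum = 0.0;
    for &v in observations {
        // A cumulative bucket counts every observation at or below its bound.
        for (i, &bound) in upper_bounds.iter().enumerate() {
            if v <= bound {
                cumulative[i] += 1;
            }
        }
        // The (-inf, +inf) bucket counts everything.
        *cumulative.last_mut().unwrap() += 1;
        sum += v;
    }
    HistogramSeries { cumulative, sum, count: observations.len() as u64 }
}

/// Average observation: the "sum divided by count" idea described above.
pub fn average(series: &HistogramSeries) -> f64 {
    series.sum / series.count as f64
}
```

For bounds 0.1, 0.2, 0.4, 0.8, 1.6 and observations 0.05, 0.3, 0.4, 2.0, the cumulative counts are 1, 1, 3, 3, 3, 4. The observation 2.0 is only visible in the last bucket, which is the accuracy trade-off mentioned above.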

Summary

Summary is similar to Histogram in that it also samples observations, but its quantiles are calculated on the client side. This metric type is not yet implemented in rust-prometheus, so it is not covered further here. See the introduction in the official Prometheus documentation for more information. Interested readers can also study the implementations in client libraries for other languages and contribute one to rust-prometheus.

Label

Every Prometheus metric supports defining and specifying a set of labels. Each combination of label values is tracked independently and represents a different dimension of the metric. For example, for a Histogram that measures the duration of HTTP requests, you can define labels such as the HTTP method (GET / POST / PUT / ...), the service URL, the client IP, and so on. This makes the following kinds of queries easy:

Query the 99.9th percentile duration of POST, PUT, and GET requests separately (using a single label).

Query the average duration of POST /api requests (using a combination of labels).

Ordinary queries, such as the 99.9th percentile duration across all requests, still work as usual.

Note that each distinct combination of label values is an independent time series, so avoid having too many labels or too many values per label. Otherwise the client ends up sending a large number of series to the Prometheus server, which hurts efficiency.

As in the Prometheus Golang client, a metric with labels is called a Metric Vector in rust-prometheus. For example, the data type of a Histogram metric is Histogram, while the data type of a labeled Histogram metric is HistogramVec. Given a value for each label, a HistogramVec returns a Histogram instance. Different label values yield different Histogram instances, and each instance is tracked independently.
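A back-of-the-envelope calculation shows how quickly the number of series grows. The helper names below are hypothetical. The sketch assumes every combination of label values actually occurs, and that each histogram child exposes one series per finite bucket plus the +∞ bucket, the sum, and the count.

```rust
/// Upper bound on label combinations: the product of distinct values per label.
pub fn label_combinations(values_per_label: &[u64]) -> u64 {
    values_per_label.iter().product()
}

/// Series exposed by a labeled histogram: each combination carries its finite
/// buckets plus three more series (+inf bucket, sum, count).
pub fn histogram_series(combinations: u64, finite_buckets: u64) -> u64 {
    combinations * (finite_buckets + 3)
}
```

For instance, 4 methods, 50 URLs, and 1000 client IPs give up to 200,000 combinations, and with 20 finite buckets that is up to 4.6 million series for a single metric.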
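The idea behind a Metric Vector can be sketched with a map from label values to independently counted children. SketchCounterVec is a hypothetical type built only on the standard library, and it uses a plain atomic counter as the child; it is not how rust-prometheus itself is written.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical labeled counter: one independent child per combination of label values.
pub struct SketchCounterVec {
    label_names: Vec<String>,
    children: Mutex<HashMap<Vec<String>, Arc<AtomicU64>>>,
}

impl SketchCounterVec {
    pub fn new(label_names: &[&str]) -> Self {
        SketchCounterVec {
            label_names: label_names.iter().map(|s| s.to_string()).collect(),
            children: Mutex::new(HashMap::new()),
        }
    }

    /// Return the child for these label values, creating it on first use.
    pub fn with_label_values(&self, values: &[&str]) -> Arc<AtomicU64> {
        assert_eq!(values.len(), self.label_names.len(), "one value per label");
        let key: Vec<String> = values.iter().map(|s| s.to_string()).collect();
        let mut children = self.children.lock().unwrap();
        let child = children
            .entry(key)
            .or_insert_with(|| Arc::new(AtomicU64::new(0)));
        child.clone()
    }

    /// Number of distinct children (time series) created so far.
    pub fn series_count(&self) -> usize {
        self.children.lock().unwrap().len()
    }
}

/// Small helper so callers do not need to import `Ordering` themselves.
pub fn bump(child: &AtomicU64, delta: u64) -> u64 {
    child.fetch_add(delta, Ordering::Relaxed) + delta
}
```

Asking for the same label values twice returns the same child, while different values return different children, so each combination accumulates on its own.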

Basic usage

This section describes how to use rust-prometheus in a project to collect metrics. There are essentially three steps:

Define the metrics you want to collect.

Call the interfaces provided by the metrics at the appropriate places in the code to record metric values.

Implement an HTTP pull service so that Prometheus can periodically scrape the collected metrics, or use the push feature provided by rust-prometheus to periodically upload the collected metrics to Pushgateway.

Note that the sample code below is based on the rust-prometheus 0.5 API, the latest version at the time this article was published. We are currently designing and implementing version 1.0, which will be simpler to use. The sample code below may become outdated and stop working once 1.0 is released, so please refer to the latest documentation.

Defining metrics

For ease of use, metrics are generally declared as globally accessible variables so that they can be updated from anywhere in the code. All metrics provided by rust-prometheus (including Metric Vectors) are Send + Sync and can safely be shared globally.

The following sample code uses the lazy_static library to define a global Histogram metric that represents HTTP request duration and has one label named method:

    #[macro_use]
    extern crate lazy_static;
    #[macro_use]
    extern crate prometheus;

    use prometheus::{exponential_buckets, HistogramVec};

    lazy_static! {
        // Global histogram of request durations, labeled by HTTP method.
        static ref REQUEST_DURATION: HistogramVec = register_histogram_vec!(
            "http_requests_duration",
            "Histogram of HTTP request duration in seconds",
            &["method"],
            exponential_buckets(0.005, 2.0, 20).unwrap()
        )
        .unwrap();
    }

Recording metric values

Once you have a globally accessible metric variable, you can record values in your code through the interfaces it provides. As described in "Basics", the main interface of Histogram is observe(x), which records one observation. For the other interfaces of Histogram, or the interfaces of other metric types, see the rust-prometheus documentation.

The following example shows how to record metric values, building on the previous code. It generates random values and pretends they were produced by users. In a real program these would of course be replaced with real data :)

    use rand::seq::SliceRandom;
    use rand::Rng;

    fn thread_simulate_requests() {
        let mut rng = rand::thread_rng();
        loop {
            // Simulate duration 0s ~ 2s
            let duration = rng.gen_range(0f64, 2f64);

            // Simulate HTTP method
            let method = ["GET", "POST", "PUT", "DELETE"]
                .choose(&mut rng)
                .unwrap();

            // Record metrics
            REQUEST_DURATION
                .with_label_values(&[method])
                .observe(duration);

            // One request per second
            std::thread::sleep(std::time::Duration::from_secs(1));
        }
    }

Push / Pull

So far, the code has only recorded the metrics. The last step is to let the Prometheus server obtain the recorded data. There are generally two ways to do this: Push and Pull.

Pull is the standard way for Prometheus to obtain metrics. The Prometheus server periodically requests an HTTP endpoint provided by the application to fetch the metrics data.

Push is an alternative based on the Prometheus Pushgateway service. The application periodically pushes its metrics data to Pushgateway, and Prometheus then periodically fetches it from Pushgateway. This approach mainly suits scenarios where the application cannot conveniently expose a port or where the application is short-lived.

The following sample code uses the hyper HTTP library to implement an endpoint from which the Prometheus server can pull metrics. The core is to use the TextEncoder provided by rust-prometheus to serialize all metrics into text that Prometheus can parse:

    use hyper::{Body, Request, Response};
    use prometheus::{Encoder, TextEncoder};

    fn metric_service(_req: Request<Body>) -> Response<Body> {
        let encoder = TextEncoder::new();
        let mut buffer = vec![];
        // Collect every metric registered in the default registry.
        let mf = prometheus::gather();
        encoder.encode(&mf, &mut buffer).unwrap();
        Response::builder()
            .header(hyper::header::CONTENT_TYPE, encoder.format_type())
            .body(Body::from(buffer))
            .unwrap()
    }
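For a sense of what the serialized output looks like, the following sketch renders one counter family in the Prometheus text exposition format by hand. The encode_counter function is hypothetical and far simpler than a real encoder: it handles a single label, emits only counters, and does not escape special characters in help text or label values.

```rust
/// Render one counter family with a single label in the text exposition format.
/// `samples` pairs each label value with its current count.
pub fn encode_counter(name: &str, help: &str, label: &str, samples: &[(&str, u64)]) -> String {
    let mut out = String::new();
    // Metadata lines describing the family.
    out.push_str(&format!("# HELP {} {}\n", name, help));
    out.push_str(&format!("# TYPE {} counter\n", name));
    // One line per time series: name{label="value"} count
    for (value, count) in samples {
        out.push_str(&format!("{}{{{}=\"{}\"}} {}\n", name, label, value, count));
    }
    out
}
```

Each distinct label value produces its own line, which matches the earlier point that every combination of label values is a separate time series.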

Readers interested in Push can refer to the push example provided in the rust-prometheus repository; it is not covered in detail here for reasons of space.

The complete code for the above three examples can be found here.

Internal implementation

The following description of the internals is based on rust-prometheus 0.5, the latest version when this article was published. The core API of this version was designed and implemented as a port of the Prometheus Golang client, with some adjustments for Rust usage conventions, so the interface closely resembles the Golang client.

We are currently developing version 1.0. Its API design no longer primarily follows the Golang client, and instead aims to give Rust users the friendliest and most concise API. For efficiency, its implementation will also differ slightly from what is explained here, and it will drop some deprecated features and simplify the implementation, so please keep this in mind when reading.

Counter / Gauge

Counter and Gauge are very simple metrics: all they need is thread-safe updates of a number. Readers can simply think of the core of both Counter and Gauge as an Arc around an atomic value. However, since Prometheus officially requires metric values to support floating-point numbers, we implemented AtomicF64 on top of std::sync::atomic::AtomicU64 and CAS operations; it is located in src/atomic64/nightly.rs. The core snippet is as follows:

    impl Atomic for AtomicF64 {
        type T = f64;

        // Some functions are omitted.

        fn inc_by(&self, delta: Self::T) {
            loop {
                let current = self.inner.load(Ordering::Acquire);
                let new = u64_to_f64(current) + delta;
                let swapped = self
                    .inner
                    .compare_and_swap(current, f64_to_u64(new), Ordering::Release);
                // The swap succeeded only if nobody changed the value in between.
                if swapped == current {
                    return;
                }
            }
        }
    }

In addition, because AtomicU64 was still a nightly-only feature when version 0.5 was released, we also provide a fallback implementation of AtomicF64 based on spin locks, located in src/atomic64/fallback.rs, to support stable Rust.

Note: the integer_atomics feature required by AtomicU64 was recently stabilized in rustc 1.34.0. After rustc 1.34.0 is released, we will also use native atomic operations on stable Rust to improve efficiency.
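On a current stable toolchain, the same CAS idea can be written with the standard library alone. The sketch below is not the rust-prometheus source: SketchAtomicF64 is a hypothetical name, it uses f64::to_bits and f64::from_bits for the conversion, and it calls compare_exchange_weak, the present-day replacement for the deprecated compare_and_swap.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical f64 cell stored as raw bits inside an AtomicU64.
pub struct SketchAtomicF64 {
    bits: AtomicU64,
}

impl SketchAtomicF64 {
    pub fn new(value: f64) -> Self {
        SketchAtomicF64 { bits: AtomicU64::new(value.to_bits()) }
    }

    pub fn get(&self) -> f64 {
        f64::from_bits(self.bits.load(Ordering::Relaxed))
    }

    /// Add `delta` using a compare-and-swap retry loop.
    pub fn inc_by(&self, delta: f64) {
        let mut current = self.bits.load(Ordering::Relaxed);
        loop {
            let new = (f64::from_bits(current) + delta).to_bits();
            match self
                .bits
                .compare_exchange_weak(current, new, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return,
                // Another thread updated the value first (or the weak CAS failed
                // spuriously): retry starting from the value actually stored.
                Err(actual) => current = actual,
            }
        }
    }
}
```

Because a failed exchange reports the value that is actually stored, the loop never loses an update even when several threads increment concurrently.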

Histogram

As Prometheus requires, when a Histogram receives an observation it must increment the count of the bucket that the observation falls into. In addition, the sum of all observations and the number of observations must be accumulated.

Note that a Histogram in Prometheus is a cumulative histogram: each bucket covers (-∞, x], so a single observation may require updating several consecutive buckets. For example, suppose a user defines five bucket boundaries, 0.1, 0.2, 0.4, 0.8, and 1.6. The buckets then cover (-∞, 0.1], (-∞, 0.2], (-∞, 0.4], (-∞, 0.8], (-∞, 1.6], and (-∞, +∞), and an observation of 0.4 requires updating four of them: (-∞, 0.4], (-∞, 0.8], (-∞, 1.6], and (-∞, +∞).

In general, observe(x) is called frequently, while reporting the collected data to Prometheus is a relatively infrequent operation. So when implementing the buckets as an array, we do not map array elements directly to cumulative buckets. Instead, each array element is a non-cumulative bucket, such as (-∞, 0.1], (0.1, 0.2], (0.2, 0.4], (0.4, 0.8], (0.8, 1.6], (1.6, +∞), which greatly reduces the amount of data updated on the hot path. When the data is reported to Prometheus, the array elements are summed up to produce the cumulative histogram, yielding the bucket data Prometheus needs.

Of course, this also means that if an observation lies beyond the range of the configured buckets, the maximum that ends up being recorded is only the upper bound of the largest bucket, not the actual maximum, so take care when interpreting it.
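The trade-off can be sketched as follows. SketchHistogram is a hypothetical standard-library type that leaves out the sum and count fields for brevity: observe touches exactly one non-cumulative slot, and the cumulative counts are only computed when the data is read out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical histogram core storing non-cumulative bucket counts.
pub struct SketchHistogram {
    /// Finite bucket bounds, sorted ascending.
    upper_bounds: Vec<f64>,
    /// One non-cumulative counter per bound, plus a final overflow slot.
    counts: Vec<AtomicU64>,
}

impl SketchHistogram {
    pub fn new(upper_bounds: Vec<f64>) -> Self {
        let counts: Vec<AtomicU64> = (0..=upper_bounds.len())
            .map(|_| AtomicU64::new(0))
            .collect();
        SketchHistogram { upper_bounds, counts }
    }

    /// Hot path: find the first bound that is >= v and bump only that slot.
    pub fn observe(&self, v: f64) {
        let slot = self
            .upper_bounds
            .iter()
            .position(|&bound| v <= bound)
            .unwrap_or(self.upper_bounds.len());
        self.counts[slot].fetch_add(1, Ordering::Relaxed);
    }

    /// Cold path: a running sum turns the slots into cumulative bucket counts.
    pub fn cumulative_counts(&self) -> Vec<u64> {
        let mut running = 0u64;
        self.counts
            .iter()
            .map(|c| {
                running += c.load(Ordering::Relaxed);
                running
            })
            .collect()
    }
}
```

With bounds 0.1, 0.2, 0.4, 0.8, 1.6, observing 0.4 performs a single atomic increment instead of four, and the cumulative view is rebuilt only on export.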
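The capping effect shows up as soon as a quantile is estimated from bucket counts. The function below is a hypothetical sketch of linear interpolation inside a bucket, written for illustration rather than copied from Prometheus or rust-prometheus. It assumes the first bucket starts at 0 and that cumulative has one more entry than upper_bounds (the final entry being the +∞ bucket).

```rust
/// Estimate quantile `q` (between 0.0 and 1.0) from cumulative bucket counts.
/// Returns None when there are no observations or no finite buckets.
pub fn estimate_quantile(q: f64, upper_bounds: &[f64], cumulative: &[u64]) -> Option<f64> {
    let total = *cumulative.last()?;
    if total == 0 || upper_bounds.is_empty() {
        return None;
    }
    let rank = q * total as f64;
    // First bucket whose cumulative count reaches the requested rank.
    let idx = cumulative.iter().position(|&c| (c as f64) >= rank)?;
    if idx >= upper_bounds.len() {
        // The rank lies in the overflow bucket: the best available answer
        // is the largest finite bound, not the true value.
        return Some(upper_bounds[upper_bounds.len() - 1]);
    }
    let lower = if idx == 0 { 0.0 } else { upper_bounds[idx - 1] };
    let below = if idx == 0 { 0 } else { cumulative[idx - 1] };
    let in_bucket = cumulative[idx] - below;
    if in_bucket == 0 {
        return Some(upper_bounds[idx]);
    }
    // Assume observations are spread evenly inside the bucket.
    let fraction = (rank - below as f64) / in_bucket as f64;
    Some(lower + (upper_bounds[idx] - lower) * fraction)
}
```

If every observation exceeds the largest finite bound, any quantile comes back as that bound, however large the real values were. Choosing bounds that comfortably cover the expected range avoids this.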

For the core implementation of Histogram, please see src/histogram.rs:

    pub struct HistogramCore {
        // Some fields are omitted.
        sum: AtomicF64,
        count: AtomicU64,

        upper_bounds: Vec<f64>,
        counts: Vec<AtomicU64>,
    }

    impl HistogramCore {
        // Some functions are omitted.

        pub fn observe(&self, v: f64) {
            // Try find the bucket.
            let mut iter = self
                .upper_bounds
                .iter()
                .enumerate()
                .filter(|&(_, f)| v ...
