Prometheus is an open-source monitoring system that originated at SoundCloud and was the second project, after Kubernetes, to join the CNCF. Prometheus is an excellent monitoring system. Walker has developed a number of components around Prometheus, including a basic alerting component, a service discovery component, and various collection Exporters. Together with Prometheus, these components support most of Walker's monitoring business. This article introduces Prometheus starting from its origin, its architecture, and a concrete example, and then describes what Walker has built around it.
Origin
SoundCloud's earlier application architecture was a monolith: all functionality lived in one large module, with no clear boundaries between features. A monolithic architecture has two main problems. On the one hand, it is hard to scale horizontally; only vertical scaling is possible, and the capacity of a single machine is limited. On the other hand, the features are tightly coupled, so every new feature has to be developed on the existing technology stack while guaranteeing that it does not break existing functionality. SoundCloud therefore moved to a microservice architecture, splitting the original functionality into hundreds of independent services, with the whole system running thousands of instances. Migrating to microservices brought new challenges for monitoring: you now need to know not only how an individual component behaves, but also how the service behaves as a whole. Their monitoring stack at the time was StatsD + Graphite + Nagios. StatsD and Graphite were used to build monitoring charts: each service pushed sample data to StatsD, StatsD aggregated the pushed samples and periodically forwarded them to Graphite, and Graphite stored the samples in its time series database. Users then built monitoring charts against the API provided by Graphite according to their own monitoring needs, and analyzed service metrics (for example latency, requests per second, and errors per second) from those charts.
Can such a solution meet the monitoring requirements of a microservice architecture? The requirement is to know the overall behavior of a service while keeping enough granularity to understand the behavior of an individual component. The answer is: only with great difficulty. Why? Suppose we want to count the number of error responses to POST /tracks requests of the api-server service, measured by the HTTP status code, and we name the metric api-server.tracks.post.500. A Graphite metric name is hierarchical: api-server identifies the service, tracks identifies the handler, post identifies the request method, and 500 identifies the status code of the response. Each api-server instance pushes this metric to StatsD, StatsD aggregates the metrics pushed by all instances, and then periodically pushes the result to Graphite. If we query the api-server.tracks.post.500 metric, we get the number of error responses of the service as a whole; but if the api-server service runs multiple instances and we want to know the number of error responses of one particular instance, how do we query that? The problem is that with this architecture the metrics sent by the individual service instances are aggregated together, the instance dimension is lost, and it becomes impossible to obtain the metrics of a specific instance.
While StatsD and Graphite were used to build the monitoring charts, alerting was handled by a separate system, Nagios, which runs check scripts to determine whether a host or service is healthy and sends an alert if it is not. The biggest problem with Nagios is that alerting is host-oriented: every check revolves around an individual host. In a distributed environment, hosts going down is normal, and services are designed to tolerate the loss of a node; in that scenario, however, Nagios still fires an alert.
If you have read the material at https://landing.google.com/sre that introduces Google's Borgmon and compare it with Prometheus, you will find that the two systems are very similar. In fact, Prometheus was deeply influenced by Borgmon, and employees who had been involved in building Google's monitoring systems later joined SoundCloud. In short, a combination of factors led to the birth of Prometheus.
Prometheus's solution
So how does Prometheus solve these problems? In the previous scheme, alerting and chart building depended on two different systems. Prometheus adopts a new model that puts the collection of time series data at the core of the whole system: both alerting and chart building are done by manipulating time series data. Prometheus identifies a time series by a metric name plus a set of labels (key/value pairs). Each label represents a dimension, and by adding or removing labels you control which time series are selected. As mentioned earlier, the monitoring requirement under a microservice architecture is to know the overall behavior of the service while keeping enough granularity to understand an individual component; this goal is easy to achieve with such a multi-dimensional data model. Let's return to the example of counting HTTP error responses. Assume the api_server service has three running instances, and Prometheus collects sample data in the following format (the instance label is added automatically by Prometheus):
api_server_http_requests_total{method="POST", handler="/tracks", status="500", instance="sample1"} -> 34
api_server_http_requests_total{method="POST", handler="/tracks", status="500", instance="sample2"} -> 28
api_server_http_requests_total{method="POST", handler="/tracks", status="500", instance="sample3"} -> 31
If we only care about the number of errors of a specific instance, we simply add the instance label. For example, to view the number of error requests of the instance named sample1, we can use the expression api_server_http_requests_total{method="POST", handler="/tracks", status="500", instance="sample1"} to select the time series. The selected data is as follows:
api_server_http_requests_total{method="POST", handler="/tracks", status="500", instance="sample1"} -> 34
If we care about the number of errors of the entire service, we simply drop the instance label and aggregate the results together, for example with the expression

sum without (instance) (api_server_http_requests_total{method="POST", handler="/tracks", status="500"})

which computes the following series:

api_server_http_requests_total{method="POST", handler="/tracks", status="500"} -> 93
Alerts are likewise implemented by manipulating time series data rather than by running custom check scripts, so as long as a service or host exposes metric data that can be collected, alerts can be defined on it.
Architecture
Let's briefly walk through the architecture of Prometheus: what each component does and how the components interact.
Prometheus Server is the core of the whole system. It periodically pulls metrics from the APIs exposed by the monitoring targets (Exporters) and saves the data in its time series database. If the monitoring targets are dynamic, they can be added dynamically with the help of a service discovery mechanism. Prometheus Server also exposes an API for executing PromQL (the language used to manipulate time series data); other components, such as the Prometheus web UI and Grafana, query the corresponding time series data through this API. In addition, Prometheus Server periodically evaluates alerting rules. An alerting rule is a PromQL expression whose value is true or false; when it is true, the resulting alert data is pushed to Alertmanager. The aggregation, grouping, sending, silencing, and resolving of alert notifications is not done by Prometheus Server but by Alertmanager: Prometheus Server only pushes the triggered alerts to Alertmanager, and Alertmanager then aggregates the alerts according to its configuration and sends them to the corresponding receivers.
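To make the query API concrete, here is a minimal, illustrative Go sketch that sends a PromQL query to the standard /api/v1/query endpoint of a Prometheus Server, the same HTTP API that Grafana and other components use; the server address and the query string are assumptions for this example, not values from the original setup.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// PromQL expression to evaluate (illustrative query, reusing the earlier example metric).
	query := `sum without (instance) (api_server_http_requests_total{status="500"})`

	// /api/v1/query is Prometheus Server's instant-query endpoint (assumed to run on localhost:9090).
	endpoint := "http://localhost:9090/api/v1/query?query=" + url.QueryEscape(query)

	resp, err := http.Get(endpoint)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response is a JSON document containing the selected time series.
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}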
What if we want to monitor scheduled tasks, for example to instrument a task's execution time and whether it succeeded or failed? How do we expose these metrics to Prometheus Server? Suppose we run a database backup every other day and want to know how long each backup takes and whether it succeeds. The backup task only runs for a short period of time; once it has finished, how can Prometheus Server still pull its metric data? This problem is solved with the Pushgateway component of Prometheus: each backup task pushes its metrics to Pushgateway, Pushgateway caches the pushed metrics, and Prometheus Server pulls the metrics from Pushgateway.
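As a rough sketch of what such a backup job could look like (the Pushgateway address, job name, and metric names below are illustrative assumptions, not details from the original setup), the client_golang push package can push metrics to a Pushgateway:

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// Records how long the backup took, in seconds.
	duration := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "db_backup_duration_seconds",
		Help: "Duration of the last database backup.",
	})
	// Records when the backup last completed successfully.
	lastSuccess := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "db_backup_last_success_timestamp_seconds",
		Help: "Unix timestamp of the last successful database backup.",
	})

	start := time.Now()
	// ... run the actual backup here ...
	duration.Set(time.Since(start).Seconds())
	lastSuccess.SetToCurrentTime()

	// Push both metrics to the Pushgateway under the job name "db_backup";
	// Prometheus Server will later scrape them from the Pushgateway.
	if err := push.New("http://pushgateway:9091", "db_backup").
		Collector(duration).
		Collector(lastSuccess).
		Push(); err != nil {
		panic(err)
	}
}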
Examples
So far we have introduced Prometheus at a high level: its background and its architecture. Now let's look at how to use Prometheus to build monitoring charts, analyze system performance, and alert.
We have a service that exposes four APIs, each of which returns some simple text data. We want to monitor this service so that we can view and analyze its request rate, average latency, and latency distribution, and trigger an alert when the application's latency is too high or the application is unreachable. The code is as follows:
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	Latency = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Help:    "latency of sample app",
		Name:    "sample_app_latency_milliseconds",
		Buckets: prometheus.ExponentialBuckets(10, 2, 9),
	}, []string{"handler", "method"})
)

func instrumentationFilter(f http.HandlerFunc) http.HandlerFunc {
	return func(writer http.ResponseWriter, request *http.Request) {
		now := time.Now()
		f(writer, request)
		duration := time.Now().Sub(now)
		Latency.With(prometheus.Labels{"handler": request.URL.Path, "method": request.Method}).
			Observe(float64(duration.Nanoseconds()) / 1e6)
	}
}

// jitterLatencyFilter makes request latency fall between d and d*maxFactor
func jitterLatencyFilter(d time.Duration, maxFactor float64, f http.HandlerFunc) http.HandlerFunc {
	return func(writer http.ResponseWriter, request *http.Request) {
		time.Sleep(d + time.Duration(rand.Float64()*maxFactor*float64(d)))
		f(writer, request)
	}
}

func main() {
	rand.Seed(time.Now().UnixNano())
	http.Handle("/metrics", promhttp.Handler())
	http.Handle("/a", instrumentationFilter(jitterLatencyFilter(10*time.Millisecond, 256, func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("success"))
	})))
	http.Handle("/b", instrumentationFilter(jitterLatencyFilter(10*time.Millisecond, 128, func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("success"))
	})))
	http.Handle("/c", instrumentationFilter(jitterLatencyFilter(10*time.Millisecond, 64, func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("success"))
	})))
	http.Handle("/d", instrumentationFilter(jitterLatencyFilter(10*time.Millisecond, 32, func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("success"))
	})))
	http.ListenAndServe(":5001", nil)
}
We build the monitoring system following the steps of instrumentation, exposition, collection, and query. Instrumentation is about which indicators of the application to measure and how to measure them; exposition is about how to expose those indicators over HTTP; collection is about how to collect the exposed indicators; and query is about how to construct PromQL expressions to query the resulting time series data. Starting with instrumentation, there are four indicators we care about:
the request rate, the average latency, the latency distribution, and the access status of the service.

var (
	Latency = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Help:    "latency of sample app",
		Name:    "sample_app_latency_milliseconds",
		Buckets: prometheus.ExponentialBuckets(10, 2, 9),
	}, []string{"handler", "method"})
)
We first register the metrics and then track and record their values. With the Go client library provided by Prometheus it is easy to do both. We put the instrumentation code into the application code, and the corresponding metric values are recorded on every request.
client_golang provides four metric types: Counter, Gauge, Histogram, and Summary. A Counter measures a value that only ever increases, such as the number of requests served. A Gauge measures a state value that can go up or down, such as the latency of a request. Histogram and Summary are similar: both sample observations, record the distribution of the observed values, and track the number and cumulative sum of the observations, so they can be used to analyze the distribution of the sample data. To collect the request rate, the average latency, and the latency distribution, it is convenient to track each request with a Histogram. The difference between a Histogram and the simpler types (Counter, Gauge) is that it produces multiple series of sample data: one for the total number of observations, one for the cumulative sum of the observed values, and a series of cumulative bucket counts recording the distribution of the observations. The access status can be expressed with the up metric: every time Prometheus scrapes a target, it records the health of that scrape in up.
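For comparison, here is a minimal, self-contained sketch showing how the two simpler types might be declared and updated with the same client library; the metric names and the /hello endpoint are illustrative assumptions, not part of the sample application above.

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Counter: a value that only increases, e.g. total requests served.
	requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "sample_app_requests_total",
		Help: "Total number of handled requests.",
	})
	// Gauge: a value that can go up and down, e.g. requests currently in flight.
	inFlightRequests = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "sample_app_in_flight_requests",
		Help: "Number of requests currently being served.",
	})
)

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		inFlightRequests.Inc()
		defer inFlightRequests.Dec()
		requestsTotal.Inc()
		w.Write([]byte("hello"))
	})
	http.ListenAndServe(":5002", nil)
}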
Http.Handle ("/ metrics", promhttp.Handler ())
After instrumentation, the next step is exposition. Simply registering the Prometheus HTTP handler, as above, is enough to expose the metrics. Accessing this handler returns sample data such as the following (unrelated samples are omitted):
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="10"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="20"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="40"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="80"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="160"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="320"} 0
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="640"} 1
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="1280"} 1
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="2560"} 1
sample_app_latency_milliseconds_bucket{handler="/d", method="GET", le="+Inf"} 1
sample_app_latency_milliseconds_sum{handler="/d", method="GET"} 326.308075
sample_app_latency_milliseconds_count{handler="/d", method="GET"} 1
Exposing metrics alone does not make Prometheus Server collect them; the third step is collection. We configure Prometheus Server so that it discovers our service and then scrapes the samples the service exposes. Let's take a quick look at the Prometheus Server configuration. global holds the global scrape settings: scrape_interval sets the scrape interval, evaluation_interval sets how often alerting rules are evaluated (an alerting rule is a PromQL expression with a Boolean value; when it is true, the related alert notification is pushed to Alertmanager), and scrape_timeout sets the scrape timeout. alerting specifies the address of the Alertmanager service. scrape_configs specifies how to discover the monitoring targets: job_name specifies which category the discovered service belongs to, and static_configs specifies static service addresses. As mentioned earlier, Prometheus also supports dynamic service discovery, such as the file-based and Kubernetes service discovery mechanisms; here we use the simplest static configuration.
# my global config
global:
  scrape_interval: 2s # Set the scrape interval. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

rule_files:
  - rule.yaml

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: sample-app
    scrape_interval: 3s
    static_configs:
      - targets:
          - sample:5001
Once the metrics have been collected, we can use the PromQL language provided by Prometheus to manipulate the collected time series data. For example, to compute the average latency of requests we can use the expression

irate(sample_app_latency_milliseconds_sum[1m]) / irate(sample_app_latency_milliseconds_count[1m])
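For the other chart goals mentioned earlier, the request rate and the latency distribution, expressions along the following lines could be used; these are illustrative PromQL sketches, not queries taken from the original dashboards:

# request rate per handler over the last minute
sum by (handler) (rate(sample_app_latency_milliseconds_count[1m]))

# 95th percentile latency per handler over the last minute
histogram_quantile(0.95, sum by (handler, le) (rate(sample_app_latency_milliseconds_bucket[1m])))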
Once we have the time series data, we can use Grafana to build monitoring charts. We will not go into how to configure Grafana here; the key point is that PromQL expressions are used to select and compute over the time series data.
Prometheus alerting is implemented by evaluating alerting rules. An alerting rule is a PromQL expression, and the rules are kept in a configuration file. We want to alert on the latency and the availability of the application: an alert should fire when the application's latency is too high or when it is unreachable. The rules can be defined as follows:
groups:
- name: sample-up
  rules:
  - alert: UP
    expr: up{instance="sample:5001"} == 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: Service health
  - alert: 95th-latency
    expr: histogram_quantile(0.95, rate(sample_app_latency_milliseconds_bucket[1m])) > 1000
    for: 1m
    labels:
      severity: page
    annotations:
      summary: 95th service latency
The UP rule covers the availability of the service, and 95th-latency fires when the 95th percentile of request latency exceeds 1000 milliseconds. Prometheus evaluates these rules periodically; when a rule's condition is met, an alert notification is sent to Alertmanager, and Alertmanager aggregates the alerts and routes them to the configured receivers according to its own routing configuration. If we want to receive alerts by email, Alertmanager can be configured as follows:
global:
  smtp_smarthost:
  smtp_auth_username:
  smtp_from:
  smtp_auth_password:
  smtp_require_tls: false
  resolve_timeout: 5m
route:
  receiver: me
receivers:
- name: me
  email_configs:
  - to: example@domain.com
templates:
- '*.tmpl'
With this configuration, we receive alert notifications by email.
Related work
Both the chart-related business and the alerting-related business depend on collecting the relevant metrics. Walker is a database product company, and we spend a great deal of effort collecting database metrics: from Oracle to MySQL to SQL Server, the mainstream relational databases are all covered. For more general metrics, such as operating system metrics, we mainly rely on open-source Exporters. Walker's products are delivered as both software and hardware, so there are also a large number of hardware metrics to collect, and we maintain Exporters dedicated to collecting hardware metrics.
In most scenarios the services to be monitored are dynamic. For example, when a user requests a database from the platform, the corresponding monitoring targets must be added; when the user deletes the database resources, the targets must be removed. The set of database services to be monitored changes constantly. The infrastructure also differs between product lines: some database services run on Oracle RAC, some on ZStack, and some on Kubernetes. For applications running on Kubernetes, there is no need to worry about how Prometheus discovers the services to be monitored; you only need to configure the relevant service discovery mechanism. For the other types, we mainly rely on Prometheus's file_sd service discovery mechanism. File-based service discovery is the most general mechanism: we write the monitored targets into a file, and Prometheus watches the file for changes and maintains the monitored targets dynamically. On top of file_sd we built a dedicated component responsible for updating the targets; other applications call the API exposed by this component to maintain the objects they want to monitor.
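As an illustration of what the file_sd mechanism looks like (the file paths, job name, ports, and labels below are assumptions, not Walker's actual configuration), the scrape configuration points Prometheus at one or more target files, and the component that manages the targets simply rewrites those files:

scrape_configs:
- job_name: databases
  file_sd_configs:
  - files:
      - '/etc/prometheus/targets/*.json'
    refresh_interval: 30s

A target file, for example /etc/prometheus/targets/mysql.json, then lists the targets and any extra labels:

[
  {
    "targets": ["10.0.0.12:9104"],
    "labels": {"product": "mysql", "cluster": "demo"}
  }
]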
Prometheus's built-in mechanisms cannot meet all of the alerting requirements of our business. On the one hand, we need to keep statistics on alert notifications, but Alertmanager does not persist them, so notifications are lost when the service restarts; on the other hand, users configure alerts through a web page, so alerting rules and the routing of alert notifications have to be generated dynamically from the user's configuration. To solve these two problems, we packaged the related functionality into a basic alerting component that each product line can use. To work around the fact that Alertmanager does not persist alert notifications, the basic alerting component uses Alertmanager's webhook mechanism to receive notifications and then saves them to a database. In addition, because users' alert configuration has to be generated dynamically, we defined a new model to describe the alerts in our business.
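For illustration only, here is a minimal sketch of what such a webhook receiver might look like, assuming Alertmanager is configured with a webhook_config pointing at it; the struct models only a small subset of Alertmanager's webhook payload, and the database write is replaced by a log statement placeholder.

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// A subset of the JSON payload Alertmanager sends to webhook receivers.
type webhookMessage struct {
	Receiver string  `json:"receiver"`
	Status   string  `json:"status"`
	Alerts   []alert `json:"alerts"`
}

type alert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
	EndsAt      time.Time         `json:"endsAt"`
}

func main() {
	http.HandleFunc("/alertmanager/webhook", func(w http.ResponseWriter, r *http.Request) {
		var msg webhookMessage
		if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Persist each notification so it survives Alertmanager restarts;
		// here we only log it as a placeholder for a real database write.
		for _, a := range msg.Alerts {
			log.Printf("alert %s status=%s starts=%s", a.Labels["alertname"], a.Status, a.StartsAt)
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}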
Summary
Prometheus puts the collection of time series data at the core of the whole system: both monitoring charts and alerts are built by manipulating time series data. With its multi-dimensional data model and powerful query language, Prometheus meets the monitoring requirements of a microservice architecture: it lets you understand the overall behavior of a service while keeping enough granularity to understand an individual component. Walker stands on the shoulders of this giant and has built its own monitoring system around Prometheus; from Exporters to service discovery to the basic alerting component, these pieces together with Prometheus form the core of Walker's monitoring system.
Author: Guo Zhen, development engineer at Walker Technology, with years of development experience in Python, Golang, and other languages; familiar with cloud-native technologies such as Kubernetes and Prometheus, and responsible for the development of the QFusion RDS platform and the basic alerting platform.