How to configure Prometheus file list

Updated 2025-07-06. Source: SLTechnology News&Howtos (Shulou.com), report dated 05/31.

This article introduces how to configure the list of targets that Prometheus monitors. Many people have questions about this in day-to-day operations, so the methods collected here are meant to be simple and easy to apply.

Prometheus was the second open source project to graduate from the CNCF, after Kubernetes (K8s), and it is modeled on Google's Borgmon. Starting from the topic of monitoring itself, this article looks at the core design points of Prometheus: architecture, target discovery, the metric model, aggregation and querying, and so on.

I. Preface

I have worked with many kinds of monitoring systems, including the open source CAT, Zipkin, and Pinpoint, some of which I customized extensively. I have also used the commercial APM product Tingyun, so I have a good understanding of the strengths and limitations of each.

In October last year, we quickly launched an easy-to-use, flexible business monitoring platform built on Prometheus. Since the technology selection stage, we have been impressed by Prometheus and its ecosystem. Today we will talk about monitoring design and Prometheus.

A monitoring system usually covers collection (information sources: logs, metrics), reporting (protocols: HTTP, TCP), aggregation, storage, visualization, and alerting. Collection and reporting are mainly client-side functions. They generally take one of three forms: periodic external probing (early Nagios, Zabbix), manual instrumentation woven into the code with AOP, or automatic bytecode instrumentation that needs no manual code changes.

II. What is monitoring

Monitoring is a productized service system or solution for managing technology or business quantitatively.

It mainly solves two problems (its product value):

Technology: the functions, states, and other technical behavior of a system are quantified and visualized to keep the technical system stable and secure.

Business: all kinds of business performance are quantified and visualized for analysis and timely intervention, to keep the business developing efficiently.

III. The basic principles of monitoring

Monitor early: monitoring must be considered during the architecture design phase, not left until after deployment.

What to monitor: take a global view and work from the top (business) down. For most businesses, it is best to start with the parts closest to the user. A good user experience is what drives the business forward, so this is also the most sensitive and important place to watch.

User-friendly: monitoring services should be easy to use, easy to integrate, and as automated as possible.

Information source: monitoring should give technical and business staff the information they need to locate and resolve faults.

Visualization: display all kinds of data clearly (with suitable charts), along with records of alerts and other events.

Alerting:

What needs a notification? (e.g. problems that are meaningful and require human intervention)

Who should be notified? (e.g. the front-line owner of the system)

How should they be notified? (e.g. SMS, phone call, or other messaging tools; the message must be clear, accurate, and actionable)

How often? (e.g. every 5 minutes)

When should notifications stop, and when should they escalate to someone else? (e.g. stop once the system is back to normal; if the problem is still unresolved after two hours, escalate to the next level of management)
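The alerting questions above can be read as a small decision rule. The following is a minimal sketch, not part of Prometheus or Alertmanager: the `Alert` class and `next_action` function are hypothetical names, and the 5-minute and 2-hour values are simply the examples given in the list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REPEAT_INTERVAL = timedelta(minutes=5)   # "How often": every 5 minutes
ESCALATE_AFTER = timedelta(hours=2)      # "When to escalate": unresolved for 2 hours


@dataclass
class Alert:
    name: str
    fired_at: datetime
    resolved: bool = False
    last_notified: Optional[datetime] = None


def next_action(alert: Alert, now: datetime) -> str:
    """Return 'stop', 'escalate', 'notify', or 'wait' for the alert at time `now`."""
    if alert.resolved:
        return "stop"        # back to normal: stop notifying
    if now - alert.fired_at >= ESCALATE_AFTER:
        return "escalate"    # unresolved for too long: notify the next level up
    if alert.last_notified is None or now - alert.last_notified >= REPEAT_INTERVAL:
        return "notify"      # first notification, or the repeat interval has elapsed
    return "wait"
```

A caller would evaluate `next_action` on a timer and record `last_notified` after each notification is sent.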

IV. Analysis of Prometheus design

Prometheus focuses on data about what is happening right now rather than tracking data from weeks ago, because its designers believe that most monitoring queries and alerts concern data from within the last day. Facebook's paper supports this: 85% of time series queries are for data from the last 26 hours.

In a nutshell, Prometheus is a near-real-time monitoring system with its own built-in time series storage.

1. Overall architecture

Prometheus architecture diagram (quoted from Prometheus's official website)

A simplified view of the architecture is as follows:

Prometheus mainly obtains time series data by pulling it from the endpoints exposed by the monitored programs (targets/exporters). A Pushgateway service is also provided, so a small amount of data can be sent in push mode when needed.
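To make the pull model concrete, here is a minimal sketch of a target that exposes metrics over HTTP in the Prometheus text exposition format, using only the Python standard library. The metric names and values are made up for illustration, and the `/prometheus/metrics` path is taken from the configuration examples in this article; a real application would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric values; a real application would update these.
METRICS = {
    ("http_requests_total", "counter", "Total HTTP requests handled."): 1027,
    ("memory_usage_bytes", "gauge", "Current memory usage in bytes."): 52428800,
}


def render_metrics(metrics: dict) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for (name, metric_type, help_text), value in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve metrics only on the path that Prometheus is configured to scrape.
        if self.path != "/prometheus/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, serving metrics for Prometheus to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus then fetches it on every scrape interval, which is the "pull" described above.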

2. Target discovery

Prometheus obtains metrics data for services through pull, so how does it discover these services?

The discovery of target resources can be handled in a number of ways:

2.1 Manually configured list

Manually add a static configuration that specifies the services to monitor, as shown in the targets entries:

prometheus.yml

    scrape_configs:
      ...
      # Monitoring: activities
      - job_name: 'xxxxxxactivity-wap'
        metrics_path: /prometheus/metrics
        static_configs:
          - targets: ['10.xx.xx.xx:8080', ...]
      # Monitoring: coupons
      - job_name: 'xxxxxxshop-coupon'
        metrics_path: /prometheus/metrics
        static_configs:
          - targets: ['10.xx.xx.xx:8080', ...]
      # Monitoring: marketing
      - job_name: 'xxxxxx-sales-api'
        metrics_path: /prometheus/metrics
        static_configs:
          - targets: ['10.xx.xx.xx:8080', ...]
      ...

This approach is very simple, but maintaining a long list of service hosts by hand during busy work is neither scalable nor elegant. Once the environment becomes dynamic or large, it is no longer sustainable.

2.2 File-based automatic discovery

Specify the files to load. Changes to these files are detected through disk watches, and Prometheus applies them immediately. As a fallback, Prometheus also re-reads the files periodically at a configured refresh interval (refresh_interval) and applies any changes it finds.

For example:

prometheus.yml

    ...
    # Monitoring: order center OMS-API
    scrape_configs:
      - job_name: 'oms-api'
        metrics_path: /prometheus/metrics
        file_sd_configs:
          - files:
              - 'conf/oms-targets.json'
            # defaults to 5 minutes
            refresh_interval: 5m
    ...

The conf/oms-targets.json file (changes to this file are watched; it is usually generated by another program, for example from a CMDB):

oms-targets.json

    [
      {
        "labels": {"job": "oms-api"},
        "targets": ["ip1:8080", "ip2:8080", ...]
      }
    ]
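Since the target file is usually generated by another program, a generator can be sketched as follows. This is an assumption-laden illustration: `write_targets` is a hypothetical helper, the host list stands in for whatever a CMDB query would return, and the write-then-rename step is a common precaution so that Prometheus never reads a half-written file.

```python
import json
import os
import tempfile


def write_targets(path: str, job: str, hosts: list, port: int = 8080) -> None:
    """Write a file_sd target list atomically so readers never see a partial file."""
    groups = [{
        "labels": {"job": job},
        "targets": [f"{host}:{port}" for host in hosts],
    }]
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(groups, tmp, indent=2)
    os.replace(tmp_path, path)
```

For example, `write_targets("conf/oms-targets.json", "oms-api", hosts)` would be run whenever the host inventory changes.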

2.3 API-based automatic discovery

Native service discovery plug-ins are currently available for Amazon EC2, Azure, Consul, Kubernetes, and so on.

Take Consul as an example. When an instance starts successfully, a script (or some other mechanism) registers the node's information with Consul, much like writing the node's information to ZooKeeper or Redis after startup. Prometheus notices changes in Consul's data in near real time and reloads its targets automatically.

prometheus.yml

    # Monitoring: order center OMS-API
    - job_name: 'oms-api'
      consul_sd_configs:
        # Consul address; by default all services are discovered
        - server: 'xxxxxx'
          services: []

Note: Consul is an open source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration and discovery, health checks, key/value storage, multi-datacenter support, and distributed consistency guarantees.
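The registration step can be sketched with Consul's agent HTTP API (`PUT /v1/agent/service/register`). This is a minimal illustration under assumptions: the ID scheme, the health check on the metrics path, and the 10-second interval are choices made for this example, not something the article prescribes.

```python
import json
import urllib.request


def build_registration(name: str, address: str, port: int) -> dict:
    """Build the service definition sent to the local Consul agent."""
    return {
        "ID": f"{name}-{address}-{port}",   # hypothetical ID scheme
        "Name": name,
        "Address": address,
        "Port": port,
        # Assumed health check: Consul polls the metrics endpoint every 10 seconds.
        "Check": {
            "HTTP": f"http://{address}:{port}/prometheus/metrics",
            "Interval": "10s",
        },
    }


def register(consul_url: str, service: dict) -> int:
    """PUT the definition to the agent's register endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url=f"{consul_url}/v1/agent/service/register",
        data=json.dumps(service).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A startup script would call `register("http://127.0.0.1:8500", build_registration("oms-api", host_ip, 8080))`, assuming a Consul agent listens on its default port on the same host.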

2.4 DNS-based automatic discovery

When none of the previous approaches fits, DNS service discovery lets you specify a list of DNS names and then query their records to discover the target list. It is used less often, so it is not covered in detail here.

Once targets have been discovered, you can view them in Prometheus's built-in web page, as shown in the figure (local simulation environment):

3. Metric collection and aggregation

Prometheus pulls time series metrics from external processes (exporters). The details of the pull are configurable: scrape frequency, pre-aggregation rules, how the target exposes its metrics (HTTP URL), how to connect, connection authentication, and so on.

Metrics

A metric is a quantitative measurement of some attribute of software or hardware. Unlike ELK-style monitoring, which is based on log collection, Prometheus works with four types of metrics:

(1) Gauge: a number that can go up or down (essentially a snapshot of a measurement). A common example is memory usage.

(2) Counter: a number that only increases and never decreases, unless it is reset to 0. An example is the number of HTTP requests a system has handled.

(3) Histogram: samples observations and counts them in buckets, showing the frequency distribution of the data.

The figure above shows why the distribution matters for understanding metrics such as latency. If the SLO (service level objective) for this metric is 150 ms, an average latency of 137 ms looks acceptable. In fact, 1 in 10 requests takes longer than 193 ms, so 10 out of every 100 requests miss the objective. (In the figure, neither the 90th nor the 99th percentile line meets the objective.)
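The gap between an average and a percentile is easy to reproduce with made-up numbers. The sketch below is not the data behind the figure; it uses hypothetical latencies and a simple nearest-rank percentile to show an average under a 150 ms objective while the 90th percentile is well over it.

```python
import math


def percentile(values: list, p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 89 fast requests and 11 slow ones.
latencies_ms = [100] * 89 + [400] * 11
slo_ms = 150

mean_ms = sum(latencies_ms) / len(latencies_ms)   # looks fine against the SLO
p90_ms = percentile(latencies_ms, 90)             # reveals the slow tail
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p90={p90_ms} ms, p99={p99_ms} ms, SLO={slo_ms} ms")
```

Here the mean stays below the objective even though more than one request in ten exceeds it, which is why percentiles are the better basis for latency objectives.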

(4) Summary: very similar to a histogram. The main difference is that a summary computes its quantiles on the client side, while histogram quantiles are computed on the server side. A summary is therefore only suitable for per-instance metrics that do not need centralized aggregation (such as GC-related metrics).

Three rules of thumb:

If you need to aggregate and summarize data from multiple collection nodes, choose a histogram.

If you need to observe the data distribution across multiple collection nodes, choose a histogram.

If you do not need to aggregate across a cluster (for example, GC-related information), you can choose a summary, which provides more accurate quantiles.
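The first two rules follow from how histograms are stored: bucket counts from several nodes can simply be added, and a quantile can then be estimated from the merged buckets. The sketch below illustrates this with made-up counts for two hypothetical nodes. The interpolation is in the spirit of PromQL's `histogram_quantile`, but it is a simplified illustration rather than that function's exact implementation.

```python
# Cumulative bucket counts per upper bound (seconds), as a histogram exposes them.
# Both nodes and their counts are made-up illustration data.
BOUNDS = [0.1, 0.2, 0.5, 1.0, float("inf")]
node_a = [80, 95, 99, 100, 100]
node_b = [10, 40, 80, 98, 100]


def merge(*histograms):
    """Sum cumulative counts bucket by bucket; valid because the bounds are shared."""
    return [sum(counts) for counts in zip(*histograms)]


def estimate_quantile(q, bounds, cumulative):
    """Estimate a quantile (0 < q <= 1) by linear interpolation inside its bucket."""
    rank = q * cumulative[-1]
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in zip(bounds, cumulative):
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound   # cannot interpolate in the open-ended bucket
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return bounds[-1]


merged = merge(node_a, node_b)
p90 = estimate_quantile(0.9, BOUNDS, merged)
print(f"cluster-wide p90 is roughly {p90:.3f} s")
```

A summary, by contrast, would report each node's own quantile, and those per-node quantiles cannot be combined into a correct cluster-wide value.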

4. Aggregation and query

Prometheus has a built-in query DSL, PromQL, which supports aggregation and many forms of query, and it can be used directly in a browser through the built-in web interface. In our practice, using Grafana for visualization is more practical and better looking.

For more on PromQL syntax, see the official documentation; it is not covered in detail here.

Aggregating metrics

Prometheus provides a variety of functions for aggregating metrics. Common aggregations include:

Average

Median

Percentile (as shown by the 99th percentile line below: 99% of requests take less than 12 s)

Standard deviation (measures the spread of the dataset: 0 means every value equals the average, and larger values mean greater spread)

Rate of change
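To show what each of these aggregations computes, here is a plain Python sketch over made-up samples. It is not PromQL; the data and the `per_second_rate` helper are hypothetical, and the rate calculation ignores counter resets, which a real implementation would have to handle.

```python
import statistics

# Made-up response times in seconds.
durations = [0.2, 0.3, 0.3, 0.4, 0.5, 0.7, 1.1, 1.8, 4.0, 12.0]

average = statistics.mean(durations)
median = statistics.median(durations)
std_dev = statistics.pstdev(durations)   # population standard deviation
# 99th percentile, interpolated between the two nearest samples.
p99 = statistics.quantiles(durations, n=100, method="inclusive")[98]


def per_second_rate(samples):
    """Average per-second increase of a counter between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# (timestamp in seconds, counter value) pairs scraped every 15 seconds; made-up data.
counter_samples = [(0, 100), (15, 130), (30, 175), (45, 220), (60, 280)]
rate = per_second_rate(counter_samples)
```

Note how far the average sits above the median for this skewed data, which is the same effect the percentile discussion describes.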

5. Data model

Like other mainstream time series databases, Prometheus defines its data model in terms of a metric name, one or more labels (the same as tags in InfluxDB), and a metric value.

For example, a raw data point in a time series database can be represented in JSON:

    # Using JSON to represent one time series data point
    {
      "timestamp": 1346846400,           // timestamp
      "metric": "total_website_visits",  // metric name
      "tags": {                          // label set
        "instance": "aaa",
        "job": "job001"
      },
      "value": 18                        // metric value
    }

A metric name together with a set of labels uniquely identifies a time series (that is, a timeline). Once a label changes, a new time series is created, and any configuration based on the original time series no longer applies. Queries can find time series by label conditions, from simple matches to complex ones.
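The identity rule above can be sketched with a toy in-memory store. This is only an illustration of "metric name plus label set equals one series"; the `TinyStore` class is hypothetical and says nothing about how Prometheus stores data internally.

```python
from collections import defaultdict


def series_key(metric: str, labels: dict) -> tuple:
    """A series is identified by its metric name plus the full label set."""
    return (metric, frozenset(labels.items()))


class TinyStore:
    """Toy in-memory store used only to illustrate series identity."""

    def __init__(self):
        self.series = defaultdict(list)   # series key -> [(timestamp, value), ...]

    def append(self, metric, labels, timestamp, value):
        self.series[series_key(metric, labels)].append((timestamp, value))

    def select(self, metric, **matchers):
        """Return the series of `metric` whose labels equal every given matcher."""
        wanted = set(matchers.items())
        return {
            key: points
            for key, points in self.series.items()
            if key[0] == metric and wanted <= key[1]
        }


store = TinyStore()
store.append("total_website_visits", {"instance": "aaa", "job": "job001"}, 1346846400, 18)
store.append("total_website_visits", {"instance": "aaa", "job": "job001"}, 1346846415, 21)
# Changing one label value creates a second, separate series.
store.append("total_website_visits", {"instance": "bbb", "job": "job001"}, 1346846415, 5)
```

Selecting with `job="job001"` returns both series, while `instance="aaa"` returns only the first, which mirrors querying by label conditions.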

The image above is a simple view of how all the data points are distributed: the horizontal axis is time, the vertical axis is the timeline, and each point in the area is a data point. Each time Prometheus receives data, it receives one vertical line in the figure. This picture is apt because at any given moment each timeline produces only one data point, but many timelines produce data at the same moment, and connecting those points forms a vertical line. This property is important because it shapes the optimization strategy for writing and compressing data.

Retention time

Prometheus is designed for short-term monitoring and alerting, so by default it keeps only 15 days of time series data. If you need longer retention, consider storing the data separately on another platform. Our current solution is remote storage: the data pulled by Prometheus is written to InfluxDB, which gives us more flexible storage and persists the data in near real time.

6. Prometheus open source ecosystem

The Prometheus ecosystem includes Alertmanager, which provides the alert engine and alert management; Pushgateway, which supports reporting data in push mode; Grafana, which provides a more polished visualization interface; RemoteStoreAdapter, which supports remote storage; mtail, which converts logs to metrics; and so on.

In addition, there is a series of exporters (which can be thought of as monitoring agents) that can be installed and used directly to monitor applications, machines, mainstream databases, message queues, and so on automatically.

The Prometheus ecosystem also includes client libraries for a variety of mainstream programming languages, such as Java, C, and Python.
