How to understand Prometheus

2025-04-07 Update From: SLTechnology News&Howtos

This article explains how to understand Prometheus. The approach described here is simple and practical, so let's walk through it.

Daily monitoring

Suppose, as an example, that you need to monitor the request volume of each API in WebServerA. The dimensions to monitor are: service name (job), instance IP (instance), API name (handler), method (method), and error code (code); the monitored value is the request volume (value).

Expressed in SQL, the common query operations look like this:

Query the number of requests for method=put and code=200 (red box)

SELECT * FROM http_requests_total WHERE code = "200" AND method = "put" AND created_at BETWEEN 1495435700 AND 1495435710

Query the number of requests for handler=prometheus and method=post (green box)

SELECT * FROM http_requests_total WHERE handler = "prometheus" AND method = "post" AND created_at BETWEEN 1495435700 AND 1495435710

Query the number of requests for instance=10.59.8.110 where handler starts with query (green box)

SELECT * FROM http_requests_total WHERE handler LIKE "query%" AND instance = "10.59.8.110" AND created_at BETWEEN 1495435700 AND 1495435710
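The three queries above can be tried against a tiny in-memory table. This is only a sketch: the table layout (one row per sample, with a created_at Unix timestamp column), the sample rows, and the helper name run_query are assumptions made for illustration, and SQLite stands in for the relational database.

```python
import sqlite3

# Hypothetical schema: one row per collected sample.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE http_requests_total (
           job TEXT, instance TEXT, handler TEXT,
           method TEXT, code TEXT, value REAL, created_at INTEGER)"""
)

# Made-up sample rows, all inside the queried time window.
rows = [
    ("WebServerA", "10.59.8.110", "query_range", "get", "200", 12, 1495435700),
    ("WebServerA", "10.59.8.110", "prometheus", "post", "200", 3, 1495435705),
    ("WebServerA", "10.59.8.111", "alerts", "put", "200", 7, 1495435705),
    ("WebServerA", "10.59.8.111", "alerts", "put", "500", 1, 1495435710),
]
conn.executemany("INSERT INTO http_requests_total VALUES (?,?,?,?,?,?,?)", rows)


def run_query(where, params, start=1495435700, end=1495435710):
    """Combine a filter on the dimensions with a time-range filter."""
    sql = (
        "SELECT handler, method, code, value FROM http_requests_total "
        f"WHERE {where} AND created_at BETWEEN ? AND ?"
    )
    return conn.execute(sql, (*params, start, end)).fetchall()


put_ok = run_query("code = ? AND method = ?", ("200", "put"))
prom_post = run_query("handler = ? AND method = ?", ("prometheus", "post"))
# "Starts with query" needs a prefix match rather than equality.
query_prefix = run_query("handler LIKE ? AND instance = ?", ("query%", "10.59.8.110"))
```

Every query has the same shape: equality or prefix filters on a few dimensions, plus a time window.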

As these examples show, everyday monitoring queries and statistics mostly combine filters on the monitored dimensions with a time range. Suppose 100 services are monitored, each service runs 10 instances on average and exposes 20 APIs with 4 methods each, and data is collected every 30 seconds and retained for 60 days. The total number of data points is then: 100 (services) * 10 (instances) * 20 (APIs) * 4 (methods) * 86400 (seconds in 1 day) * 60 (days) / 30 (seconds) = 13.824 billion. Writing, storing, and querying data of this magnitude is impractical on a MySQL-style relational database, so Prometheus uses a TSDB (time series database) as its storage engine.
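The estimate can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are the ones given in the scenario.

```python
services = 100
instances_per_service = 10
apis_per_service = 20
methods_per_api = 4
scrape_interval_s = 30
retention_days = 60
seconds_per_day = 86_400

# Each combination of service, instance, API and method is one time series.
series = services * instances_per_service * apis_per_service * methods_per_api
# Each series gets one sample per scrape interval for the whole retention period.
samples_per_series = seconds_per_day * retention_days // scrape_interval_s
total_samples = series * samples_per_series

print(f"{series:,} series x {samples_per_series:,} samples = {total_samples:,}")
# prints "80,000 series x 172,800 samples = 13,824,000,000"
```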

Storage engine

As the storage engine of Prometheus, the TSDB matches the characteristics of monitoring data well:

The volume of stored data is very large.

Most operations are writes.

Writes are almost always sequential appends, and most of the time the data arrives already sorted by time.

Writes rarely target data from long ago and rarely update existing data. In most cases, data is written to the database within seconds or minutes of being collected.

Deletion is typically done in blocks: a starting point in history is chosen and the blocks that follow it are specified. Deleting data at one specific moment, or at scattered random times, is rare.

The data set is large and generally exceeds the memory size. Queries usually select only a small, irregular part of it, so caching has almost no effect.

Reads are typically sequential scans in ascending or descending order.

Highly concurrent reads are very common.

So how does the TSDB meet these requirements?
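The write and delete patterns listed above can be illustrated with a small sketch: samples are only ever appended, and retention removes whole blocks rather than individual points. The class name BlockStore and the two-hour block width are assumptions chosen for illustration, not a description of Prometheus internals.

```python
BLOCK_SECONDS = 2 * 60 * 60  # assumed block width, for illustration only


class BlockStore:
    """Append-only samples grouped into time blocks; retention drops whole blocks."""

    def __init__(self):
        self.blocks = {}  # block start time -> list of (timestamp, value)

    def append(self, timestamp, value):
        start = timestamp - timestamp % BLOCK_SECONDS
        self.blocks.setdefault(start, []).append((timestamp, value))

    def drop_before(self, cutoff):
        """Delete every block that ends at or before the cutoff; return how many."""
        expired = [s for s in self.blocks if s + BLOCK_SECONDS <= cutoff]
        for start in expired:
            del self.blocks[start]
        return len(expired)


store = BlockStore()
for t in range(0, 6 * 60 * 60, 30):  # six hours of samples, one every 30 seconds
    store.append(t, 1.0)
dropped = store.drop_before(4 * 60 * 60)  # retention removes the two oldest blocks
```

Dropping a block is a single dictionary deletion here, which mirrors why block-wise deletion is cheap compared with removing individual samples.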

{
  "labels": [{"latency": "500"}],
  "samples": [{"timestamp": 1473305798, "value": 0.9}]
}

The raw data is divided into two parts: labels and samples. The labels record the monitored dimensions (label name: label value); the metric name together with its optional label key-value pairs uniquely identifies a time series (denoted series_id). The samples each contain a timestamp (timestamp) and a metric value (value).
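A minimal sketch of this data model follows. The Series class, the series_id helper, and the metric name Server are hypothetical names used for illustration; the only point carried over from the text is that the metric name plus the label pairs identify a series, independent of label order.

```python
from dataclasses import dataclass, field


def series_id(metric, labels):
    """Build a canonical identifier from the metric name and sorted labels."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{pairs}}}"


@dataclass
class Series:
    metric: str
    labels: dict
    samples: list = field(default_factory=list)  # (timestamp, value) pairs

    @property
    def id(self):
        return series_id(self.metric, self.labels)


s = Series("Server", {"latency": "500"})
s.samples.append((1473305798, 0.9))
print(s.id)  # prints Server{latency="500"}
```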

Series
  ^
  │ . . . . . . . . . . . .   Server{latency="500"}
  │ . . . . . . . . . . . .   Server{latency="300"}
  │ . . . . . . . . . . .     Server{}
  │ . . . . . . . . . . . .
  v

The TSDB stores values under keys beginning with timeseries:doc::. To speed up the most common queries, which combine labels with a time range, the TSDB builds three additional indexes: Series, Label Index, and Time Index.

Take the label latency as an example:

Series

Stores two parts of data. The first is the list of all series, that is, the sequences of label key-value pairs, sorted in lexicographic order. The second is an index from each timeline to its data files, which splits the data by time window and records where each data block is located, so a query can quickly skip the large number of records that fall outside its time window.

Label Index

For each label, a key beginning with index:label: stores the list of all values of that label, and each value points, by reference into Series, to the position where that value starts.

Time Index

Entries use keys beginning with index:timeseries:: and point to the data file for the corresponding time period.
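How a label index and a time index cooperate on a "label plus time range" query can be sketched as follows. This is a simplified in-memory model, not the on-disk format: the dictionary layouts, the one-hour window width, and the function names write and query_by_label are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW = 3600  # assumed time-window width in seconds

label_index = defaultdict(set)   # (label name, label value) -> series ids
time_index = defaultdict(list)   # (series id, window start) -> samples


def write(series_id, labels, timestamp, value):
    """Register the series under each label pair and file the sample by window."""
    for pair in labels.items():
        label_index[pair].add(series_id)
    window = timestamp - timestamp % WINDOW
    time_index[(series_id, window)].append((timestamp, value))


def query_by_label(label, value, start, end):
    """Return samples of all series with label=value inside [start, end]."""
    result = {}
    first = start - start % WINDOW
    for sid in sorted(label_index.get((label, value), ())):
        hits = []
        # Visit only the windows that overlap the requested range.
        for window in range(first, end + 1, WINDOW):
            for t, v in time_index.get((sid, window), ()):
                if start <= t <= end:
                    hits.append((t, v))
        result[sid] = hits
    return result


write('Server{latency="500"}', {"latency": "500"}, 1473305798, 0.9)
write('Server{latency="500"}', {"latency": "500"}, 1473309500, 0.7)
write('Server{latency="300"}', {"latency": "300"}, 1473305800, 0.4)
found = query_by_label("latency", "500", 1473305000, 1473306000)
```

The label lookup narrows the search to matching series, and the window keys let the query skip every time window outside the requested range without scanning it.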

Data calculation

The storage engine is what makes Prometheus's data computation possible, and this sets Prometheus apart from other monitoring services. Prometheus can select different data series and then apply basic operators and functions to them, performing matrix-style operations across metric series (see figure below).

In this way, the capability of the Prometheus system is no weaker than a "data warehouse" plus "computing platform" in the monitoring world. Now that big data is being applied across the industry, it is reasonable to see this as the future direction of monitoring.
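The idea of operating on whole sets of series at once can be sketched in a few lines. This is not PromQL; it is a simplified model in which a vector is a mapping from a label set to a value, and the function names divide and sum_by, along with the sample numbers, are made up for illustration.

```python
def divide(left, right):
    """Element-wise division of two vectors, matching entries by identical labels."""
    return {
        labels: value / right[labels]
        for labels, value in left.items()
        if labels in right and right[labels] != 0
    }


def sum_by(vector, label):
    """Aggregate a vector by one label, summing the values that share it."""
    totals = {}
    for labels, value in vector.items():
        key = dict(labels).get(label)
        totals[key] = totals.get(key, 0) + value
    return totals


# Made-up vectors: label sets (as sorted tuples) -> value at one instant.
errors = {
    (("handler", "query"), ("method", "get")): 5.0,
    (("handler", "query"), ("method", "put")): 1.0,
}
requests = {
    (("handler", "query"), ("method", "get")): 100.0,
    (("handler", "query"), ("method", "put")): 10.0,
}

error_ratio = divide(errors, requests)
errors_by_handler = sum_by(errors, "handler")
```

Here a single expression yields an error ratio for every matching series, rather than one query per series.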

Compute once, query many times

Of course, such computing power also consumes a considerable amount of resources. Querying a precomputed result is usually much faster than evaluating the original expression every time it is needed. This matters most for dashboards and alerting rules: a dashboard queries the same expression again on every refresh, and an alerting rule does the same on every evaluation. Prometheus therefore provides recording rules, which precompute expressions that are needed often or are expensive to evaluate and save the results as a new set of time series, so that one calculation serves many queries.
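The "compute once, query many times" idea can be sketched as follows. This models the concept only and is not how Prometheus implements or configures recording rules: the RecordingRule class, the series name job:http_requests:sum, and the raw values are hypothetical.

```python
import time


class RecordingRule:
    """Evaluates an expression when asked and stores results as a new series."""

    def __init__(self, record, expr):
        self.record = record      # name of the derived series
        self.expr = expr          # zero-argument callable standing in for a query
        self.samples = []         # precomputed (timestamp, value) pairs
        self.evaluations = 0      # how often the expensive expression ran

    def evaluate(self, now=None):
        self.evaluations += 1
        timestamp = now if now is not None else time.time()
        self.samples.append((timestamp, self.expr()))

    def latest(self):
        """Cheap read path, as a dashboard or alert would use it."""
        return self.samples[-1][1]


raw = [3, 5, 8, 13]  # made-up raw values that the expensive expression aggregates
rule = RecordingRule("job:http_requests:sum", lambda: sum(raw))
rule.evaluate(now=1495435700)                 # one calculation
reads = [rule.latest() for _ in range(3)]     # several queries of the stored result
```

The expression runs once per evaluation, while any number of readers fetch the stored sample without triggering it again.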

By now you should have a deeper understanding of Prometheus. The best next step is to try it out in practice.
