How to Understand Prometheus

2025-04-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to understand Prometheus. It is clearly organized and well suited to beginners.

Prometheus: power your metrics and alerting with a leading open source monitoring solution.

1. Overview

1.1. What is Prometheus?

Prometheus is an open source toolkit for system monitoring and alerting. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project, maintained independently of any company. Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

1.1.1. The main features of Prometheus are:

A multidimensional data model, with time series data identified by metric name and key/value pairs (labels)

PromQL, a flexible query language that takes advantage of this dimensionality

No reliance on distributed storage; each single server node is autonomous

Time series collection happens via a pull model over HTTP

Pushing time series is supported via an intermediary gateway

Targets are discovered via service discovery or static configuration

Multiple modes of graphing and dashboarding support

In summary: a multidimensional data model, the PromQL query language, autonomous server nodes, time series collected by HTTP pull or pushed through a gateway, automatic target discovery, and multiple dashboard options.

1.1.2. Components:

Prometheus server, which scrapes and stores time series data; this is the main component

Client libraries for instrumenting application code

A push gateway for supporting short-lived jobs

Exporters for third-party services such as HAProxy

Alertmanager, which handles alerts

Various support tools

Most Prometheus components are written in Go, which makes them easy to build and deploy as static binaries.

1.1.3. Architecture:

This diagram shows some of the components of the architecture and its ecosystem:

Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.

1.2. When does it fit?

Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multidimensional data collection and querying is a particular strength.

Prometheus is designed for reliability: it is meant to be the system you turn to during an outage so that you can quickly diagnose problems. Each Prometheus server is standalone and does not depend on network storage or other remote services.

1.3. When does it not fit?

Prometheus values reliability. You can always view the statistics that are available about your system, even under failure conditions. If you need 100% accuracy, for example for per-request billing, Prometheus is not a good choice, because the collected data will likely not be detailed and complete enough. In that case, it is best to use another system to collect and analyze the billing data, and to use Prometheus for the rest of your monitoring.

1.4. Prometheus VS InfluxDB

InfluxDB is an open source time series database with a commercial option for scaling and clustering. The InfluxDB project was released almost a year after Prometheus development began, so it could not be considered as an alternative at the time. Still, there are significant differences between Prometheus and InfluxDB. There are also many similarities. Both have labels (called tags in InfluxDB) to efficiently support multidimensional metrics. Both use basically the same data compression algorithms. Both have extensive integrations, including with each other. Both have hooks that allow them to be extended further, for example to analyze data in statistical tools or to perform automated actions.

InfluxDB is better in the following situations:

If you are doing event logging

If you need clustering, which the commercial option provides and which is also better for long-term data storage

If you need an eventually consistent view of data between replicas

Prometheus is better in the following situations:

If you are primarily doing metrics

If you need a more powerful query language, alerting, and notification functionality

If you need higher availability and uptime for graphing and alerting

InfluxDB is maintained by a commercial company that follows the open core model and provides advanced features such as closed-source clustering, hosting, and support.

Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of which also offer commercial services and support.

2. Basic concepts

2.1. Data model

Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries.

2.1.1. Metric names and labels

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

The metric name specifies the general feature of a system that is measured (for example, http_requests_total is the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons. It must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*

Label names may contain ASCII letters, digits, and underscores. They must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.

Label values may contain any Unicode characters.
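The naming rules can be checked with ordinary regular expressions. The following is a minimal Python sketch; the function names are illustrative and not part of any Prometheus library, and the patterns are the classic ones from the Prometheus data model, [a-zA-Z_:][a-zA-Z0-9_:]* for metric names and [a-zA-Z_][a-zA-Z0-9_]* for label names.

```python
import re

# Classic naming patterns from the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole string matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name: str) -> bool:
    """Return True if the whole string matches the label name pattern."""
    return LABEL_NAME_RE.fullmatch(name) is not None


def is_reserved_label_name(name: str) -> bool:
    """Label names starting with a double underscore are reserved."""
    return name.startswith("__")
```

Note that colons are accepted in metric names but not in label names, and that a name may not start with a digit in either case.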

2.1.2. Samples

Samples form the actual time series data. Each sample consists of:

A float64 value

A millisecond-precision timestamp

2.1.3. Notation

Given a metric name and a set of labels, time series are usually identified using this notation:

<metric name>{<label name>=<label value>, ...}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" could be written like this:

api_http_requests_total{method="POST", handler="/messages"}
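To make the notation concrete, here is a small Python sketch that renders a metric name and a label set in this form. The helper format_series is hypothetical, written only for illustration, and it does not escape quotes or backslashes inside label values.

```python
def format_series(metric_name: str, labels: dict[str, str]) -> str:
    """Render a series identifier as <metric name>{<label>="<value>", ...}."""
    if not labels:
        return metric_name
    pairs = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric_name}{{{pairs}}}"


print(format_series("api_http_requests_total",
                    {"method": "POST", "handler": "/messages"}))
```

A series without labels is written as the bare metric name.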

2.2. Metric types

2.2.1. Counter

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors. Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.
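Because a counter only goes up except when the process restarts, a consumer can treat any drop between two consecutive samples as a reset. The sketch below shows that idea in Python. It is a simplified illustration, not Prometheus's actual rate() or increase() implementation, which also extrapolates to the boundaries of the time window; the function name is an assumption.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across consecutive counter samples, tolerating resets.

    A drop between two samples is treated as a restart: the counter is
    assumed to have started again from zero, so the new value counts in full.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # reset detected
    return total
```

For the samples 10, 15, 3, 7 the counter rose by 5, restarted and reached 3, then rose by 4, for a total increase of 12.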

2.2.2. Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can go up and down, such as the number of concurrent requests.

2.2.3. Histogram

A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

Cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}

The total sum of all observed values, exposed as <basename>_sum

The count of events that have been observed, exposed as <basename>_count
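The cumulative nature of the buckets is easiest to see in code. The following Python class is a toy model written for this article, not a Prometheus client library: each observation increments every bucket whose upper bound is greater than or equal to the value, and a +Inf bucket is always appended so that it equals the total count.

```python
import math


class Histogram:
    """Toy histogram with cumulative buckets, mirroring _bucket/_sum/_count."""

    def __init__(self, upper_bounds: list[float]) -> None:
        # Always end with +Inf so every observation falls in some bucket.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        # Cumulative: increment every bucket whose upper bound is >= value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def series(self, basename: str) -> dict[str, float]:
        """Return the exposed series names and their current values."""
        out: dict[str, float] = {}
        for bound, n in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else str(bound)
            out[f'{basename}_bucket{{le="{le}"}}'] = n
        out[f"{basename}_sum"] = self.sum
        out[f"{basename}_count"] = self.count
        return out
```

With bounds 0.1, 0.5, and 1.0 and observations 0.05, 0.3, 0.7, and 2.0, the bucket counters read 1, 2, 3, and 4 (for +Inf), which shows that each bucket includes everything counted by the smaller ones.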

2.2.4. Summary

Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
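The sliding-window idea can be sketched as follows. This Python class is a deliberately naive illustration: it keeps every recent observation and sorts them on demand using the nearest-rank method, whereas real client libraries use streaming quantile estimators. The class name, the max_age_seconds parameter, and the explicit now argument are all assumptions made for this example.

```python
import math
from collections import deque


class SlidingWindowSummary:
    """Naive summary: exact quantiles over observations newer than max_age."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age = max_age_seconds
        self.samples: deque[tuple[float, float]] = deque()  # (time, value)
        self.sum = 0.0   # sum of all observations ever made
        self.count = 0   # number of all observations ever made

    def observe(self, value: float, now: float) -> None:
        self.samples.append((now, value))
        self.sum += value
        self.count += 1

    def quantile(self, q: float, now: float) -> float:
        # Drop observations that have fallen out of the window.
        while self.samples and self.samples[0][0] < now - self.max_age:
            self.samples.popleft()
        values = sorted(v for _, v in self.samples)
        if not values:
            return math.nan
        # Nearest-rank method.
        rank = max(1, math.ceil(q * len(values)))
        return values[rank - 1]
```

The count and sum cover all observations, while the quantiles only reflect the observations still inside the window.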

2.3. Jobs and instances

In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose is called a job.

For example, an API server job with four replicated instances:

job: api-server

instance 1: 1.2.3.4:5670

instance 2: 1.2.3.4:5671

instance 3: 5.6.7.8:5670

instance 4: 5.6.7.8:5671

2.3.1. Automatically generated labels and time series

When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the scraped target:

job: the configured job name that the target belongs to

instance: the <host>:<port> part of the target's URL that was scraped

3. Quick start

Prometheus is an open source system monitoring and alerting toolkit with an active ecosystem.

3.1. Download and install

Prometheus is a monitoring platform that collects metrics from monitored targets by scraping HTTP endpoints on those targets.

You need to download, install, and run Prometheus. You also need to download and install an exporter, a tool that exposes time series data on hosts and services.

https://prometheus.io/download/

Before running Prometheus, let's configure it.

3.1.1. Configure Prometheus to monitor itself

Prometheus collects metrics from monitored targets by scraping HTTP endpoints on those targets. Since Prometheus exposes data about itself in the same manner, it can also scrape and monitor its own health.

While a Prometheus server that collects only data about itself is not very useful in practice, it is a good starting example. Save the following basic Prometheus configuration as a file named prometheus.yml:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

3.1.2. Start Prometheus

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
./prometheus --config.file=prometheus.yml

3.2. Configuration

Prometheus is configured via command-line flags and a configuration file. The configuration file defines everything related to scrape jobs and their instances, as well as which rule files to load.

Run ./prometheus -h to view all supported command-line flags

To specify which configuration file to load, use the --config.file flag

The configuration file is in YAML format

There are too many configuration options to list here. See the reference for the full set:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

Here is a valid sample configuration file

3.3. Query

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.

3.3.1. Expression data type

In Prometheus's expression language, an expression or sub-expression can evaluate to one of four types:

Instant vector: a set of time series containing a single sample for each time series, all sharing the same timestamp

Range vector: a set of time series containing a range of data points over time for each time series

Scalar: a simple numeric floating point value

String: a simple string value; currently unused
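One way to picture the difference between the two vector types is as data structures. The Python sketch below is only a mental model with hypothetical type names; it does not reflect how Prometheus represents these values internally.

```python
from dataclasses import dataclass

# A label set as sorted (name, value) pairs, so that it is hashable.
Labels = tuple[tuple[str, str], ...]


@dataclass(frozen=True)
class Sample:
    timestamp_ms: int
    value: float


# Instant vector: exactly one sample per series, all at the same timestamp.
InstantVector = dict[Labels, Sample]

# Range vector: a list of samples over time per series.
RangeVector = dict[Labels, list[Sample]]


def is_instant_vector(vector: InstantVector) -> bool:
    """Check the defining property: all samples share one timestamp."""
    return len({sample.timestamp_ms for sample in vector.values()}) <= 1
```

A scalar then corresponds to a plain float and a string to a plain str.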

3.3.2. Literals

String literals

Strings may be specified as literals in single quotes, double quotes, or backticks. For example:

"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`

Float literals

For example: -2.34

3.3.3. Time series selector

Instant vector selectors

Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant). In the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.

The following example selects all time series that have the metric name http_requests_total:

http_requests_total

You can filter these time series further by appending a set of label matchers in curly braces ({}).

The following example selects only those time series with the metric name http_requests_total that also have the job label set to prometheus and the group label set to canary:

http_requests_total{job="prometheus", group="canary"}

Label matching operators:

=: select labels that are exactly equal to the provided string

!=: select labels that are not equal to the provided string

=~: select labels that regex-match the provided string

!~: select labels that do not regex-match the provided string

The following example selects all http_requests_total time series for the staging, testing, and development environments with an HTTP method other than GET:

http_requests_total{environment=~"staging|testing|development", method!="GET"}

A vector selector must specify a name or at least one label matcher that does not match the empty string. These expressions are valid:

{job=~".+"} # Good!
{job=~".*", method="get"} # Good!
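The four matching operators can be modeled in a few lines of Python. This is a sketch under stated assumptions: the functions matches and select are hypothetical, Python's re module stands in for the RE2 syntax that Prometheus uses, regular expressions are treated as fully anchored, and a label that is absent is treated as having an empty value.

```python
import re


def matches(labels: dict[str, str], name: str, op: str, pattern: str) -> bool:
    """Apply one label matcher to a label set."""
    # A missing label behaves like a label with an empty value.
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")


def select(series: list[dict[str, str]],
           matchers: list[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Keep the label sets that satisfy every matcher."""
    return [labels for labels in series
            if all(matches(labels, *matcher) for matcher in matchers)]
```

Under this model, a matcher such as job=~".*" is satisfied by a series with no job label at all, while job=~".+" is not, which is why a selector needs at least one matcher that rejects the empty string.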

3.3.4. Range vector selectors

Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a range duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.

The time period is specified as a number, followed by one of the following units: s (seconds), m (minutes), h (hours), d (days), w (weeks), y (years)
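As an illustration of the duration syntax, the following Python sketch converts a single-unit duration such as 5m or 1w into seconds. The function parse_duration is hypothetical and handles only the number-plus-unit form described here; it assumes, as Prometheus does, that a year always has 365 days.

```python
import re

# Seconds per unit; a year is assumed to always have 365 days.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def parse_duration(text: str) -> int:
    """Convert a duration like '5m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text)
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]
```

For example, parse_duration("5m") yields 300 seconds, and an input such as "5 minutes" raises a ValueError.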

The following example selects all the values recorded within the last 5 minutes for all time series that have the metric name http_requests_total and a job label set to prometheus:

http_requests_total{job="prometheus"}[5m]

Offset modifier

The offset modifier shifts the evaluation time of an individual instant or range vector in a query. The following expression returns the value of http_requests_total 5 minutes in the past relative to the current query evaluation time:

http_requests_total offset 5m

Note that the offset modifier always needs to follow the selector immediately, for example:

sum(http_requests_total{method="GET"} offset 5m)

The same works for range vectors. The following returns the 5-minute rate that http_requests_total had a week ago:

rate(http_requests_total[5m] offset 1w)

3.3.5. Subquery

Syntax: <instant_query> '[' <range> ':' [<resolution>] ']' [ offset <duration> ]

3.3.6. Operators

Prometheus's query language supports basic logical and arithmetic operators.

Arithmetic binary operators

+ (addition), - (subtraction), * (multiplication), / (division), % (modulo), ^ (power/exponentiation)

Binary arithmetic operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs
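The vector/scalar and vector/vector cases can be sketched in Python as follows. This is an illustrative model, not PromQL's implementation: an instant vector is represented as a mapping from a frozenset of (label name, label value) pairs to a value, the metric name is left out, only one-to-one matching on identical label sets is shown, and edge cases such as division by zero follow Python rather than PromQL.

```python
import math
import operator

# A label set is a frozenset of (label name, label value) pairs.
Vector = dict[frozenset, float]

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": math.fmod,
    "^": math.pow,
}


def vector_scalar(vector: Vector, op: str, scalar: float) -> Vector:
    """Apply the operator to the value of every element of the vector."""
    return {labels: OPS[op](value, scalar) for labels, value in vector.items()}


def vector_vector(left: Vector, op: str, right: Vector) -> Vector:
    """Apply the operator to elements whose label sets are identical.

    Elements without a partner on the other side are dropped.
    """
    return {labels: OPS[op](value, right[labels])
            for labels, value in left.items() if labels in right}
```

A vector/scalar operation keeps every element, while a vector/vector operation keeps only the elements that have a matching label set on both sides.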

Comparison binary operators

== (equal), != (not equal), > (greater than), < (less than), >= (greater or equal), <= (less or equal)
