2025-04-06 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
This article introduces how to gracefully use cloud native Prometheus to monitor a cluster. The content is detailed; interested readers can refer to it, and we hope it is helpful to you.
Overview
Prometheus is an open source system monitoring and alerting framework. In 2016, Prometheus officially joined the Cloud Native Computing Foundation and became a project second only to Kubernetes in popularity.
Instance management
Create an instance
Log in to the CCS console and select **Cloud Native Monitoring** in the left navigation bar.
Click the **Create** button at the top to enter the **Create Monitoring Instance** page.
On the **Create Monitoring Instance** page, set the instance information as prompted:
Region: select the region in which to deploy the instance; the region cannot be modified after the instance is created. It is recommended that you select a region close to your business, which reduces access latency and improves the speed of data reporting.
Network: select an existing VPC and subnet in the current region; this cannot be modified after creation. If there are no VPC resources in this region, you can jump to the VPC console to create one. By default, an instance can only monitor clusters in its own VPC; to monitor clusters in other VPCs, you need to connect the VPCs with a service such as Cloud Connect Network.
Data storage time: select how long data is stored, for example 15 days, 3 months, 6 months, or 1 year. After the instance is created successfully, an object storage (COS) bucket will be created for you and billed according to the resources actually used. For more information, see the overview of object storage billing.
Grafana component: set the username and password for Grafana login. By default, Grafana only supports access within the VPC; after the instance is created, you can enable Grafana public network access according to your business needs.
Alertmanager: you can send alarms produced by the instance to a self-built Alertmanager by adding its address.
Basic information
After the instance is created, it enters the Running state. You can click the instance to view its basic information. In addition to the information specified at creation time, it also contains some information provided after creation completes:
Object storage bucket: Cloud Native Monitoring persists data in object storage; the instance creates a COS bucket under your account to store the data.
Prometheus data query address: this endpoint supports data queries, targets queries, rules queries, and so on. You can connect a self-built Grafana to this address.
Grafana: an internal address is provided by default. You can optionally enable public access; when enabled, it is mapped to a fixed public domain name.
Prometheus data query interface
The Prometheus data query address currently supports the following paths.
**/api/v1/query**: queries the most recently scraped data.
**/api/v1/query_range**: queries data over a period of time.
**/api/v1/targets**: queries information about the monitored targets.
Since an instance may be associated with multiple clusters, you need to add the parameter cluster="Cluster Type"-"Cluster ID" to specify the target cluster:
TKE Cluster: / api/v1/targets?cluster=tke-cls-xxx
Elastic Cluster: / api/v1/targets?cluster=eks-cls-xxx
Edge Cluster: / api/v1/targets?cluster=tkeedge-cls-xxx
**/api/v1/alerts**: queries alarm status.
**/api/v1/rules**: queries aggregation and alarm rules.
Default monitoring panel
The Grafana created with the instance provides some commonly used monitoring panels, covering a cluster overview, nodes, workloads, Pods, and so on. After you associate a cluster, its data can be viewed in the default monitoring panels.
Multi-cluster management
Associate a cluster
After the instance is created, you need to associate it with the clusters you want to monitor. After association, you can configure collection in the cluster by creating ServiceMonitor, PodMonitor, and similar resources.
Standard TKE clusters and elastic clusters must be in the same VPC network as the Prometheus instance; edge clusters are not subject to this restriction.
Collection configuration
Prometheus Operator
Cloud Native Monitoring is compatible with Prometheus Operator. You can modify the collection-related CRDs defined by Prometheus Operator, such as the Prometheus CRD, ServiceMonitor, and PodMonitor. For more information, see the Prometheus Operator documentation.
Modify the collection interval and external labels
When a cluster is successfully associated, a Prometheus CRD resource named tke-cls-xxx is created under the prom-xxx namespace of the associated cluster. You can modify the global collection configuration, external labels, and so on by editing this resource.
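As a minimal sketch, editing that resource might look like the following. The names prom-xxx and tke-cls-xxx are the placeholders used above, and the specific field values are illustrative assumptions, not the platform's defaults:

```yaml
# Hypothetical sketch of the platform-created Prometheus CRD;
# scrapeInterval / evaluationInterval / externalLabels are standard
# Prometheus Operator fields, values here are examples only.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: tke-cls-xxx
  namespace: prom-xxx
spec:
  scrapeInterval: 30s        # global collection interval
  evaluationInterval: 30s    # rule evaluation cycle
  externalLabels:            # extra labels attached to all collected series
    cluster: tke-cls-xxx
    env: production          # example additional label
```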
ServiceMonitor
Cloud Native Monitoring allows you to create ServiceMonitors, either through the console or directly in the cluster. In the console, you can select any Service in the cluster to automatically generate the YAML.
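A minimal ServiceMonitor sketch, assuming a hypothetical Service labeled app: my-app that exposes a named metrics port:

```yaml
# Hypothetical example: scrape all Services labeled app=my-app
# in the default namespace on their "metrics" port.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor          # hypothetical name
  namespace: prom-xxx
spec:
  namespaceSelector:
    matchNames: ["default"]     # namespaces to search for Services
  selector:
    matchLabels:
      app: my-app               # selects Services carrying this label
  endpoints:
    - port: metrics             # named port on the Service
      path: /metrics
      interval: 30s
```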
PodMonitor
Cloud Native Monitoring supports creating PodMonitors, either through the console or directly in the cluster.
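A PodMonitor is similar but selects Pods directly rather than going through a Service. A sketch, again with hypothetical names:

```yaml
# Hypothetical example: scrape Pods labeled app=my-app on their
# "metrics" container port, bypassing any Service.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app-pod-monitor      # hypothetical name
  namespace: prom-xxx
spec:
  namespaceSelector:
    matchNames: ["default"]
  selector:
    matchLabels:
      app: my-app               # selects Pods carrying this label
  podMetricsEndpoints:
    - port: metrics             # named container port
      path: /metrics
      interval: 30s
```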
Raw job
Cloud Native Monitoring also allows you to create native Prometheus jobs directly. You can create them through the console, or modify the prometheus-config secret under the prom-xxx namespace in the cluster to achieve the same effect.
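A native job entry might look like the following sketch. The job name and target addresses are hypothetical; this is standard Prometheus scrape-config syntax, not a platform-specific format:

```yaml
# Hypothetical native Prometheus scrape job, as it might appear in
# the prometheus-config secret mentioned above.
- job_name: raw-static-targets
  scrape_interval: 30s
  metrics_path: /metrics
  static_configs:
    - targets:                      # example endpoints only
        - "10.0.0.10:8080"
        - "10.0.0.11:8080"
```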
Final configuration
You can see the final Prometheus configuration in the upper right of the cluster's data collection configuration page in the console. You can also view it in the prometheus-tke-cls-xxx secret under the prom-xxx namespace of the associated cluster.
Mount files to the collector
When configuring collection items, you may need to provide files for the configuration, such as certificates. You can mount files to the collector in the following ways; updates to these files are synchronized to the collector in real time:
Label a ConfigMap under the prom-xxx namespace with: prometheus.tke.tencent.cloud.com/scrape-mount = "true".
All its keys will be mounted to the collector path /etc/prometheus/configmaps/[configmap-name]/.
Label a Secret under the prom-xxx namespace with: prometheus.tke.tencent.cloud.com/scrape-mount = "true".
All its keys will be mounted to the collector path /etc/prometheus/secret/[secret-name]/.
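For example, a ConfigMap carrying that label might be written as follows (the name and certificate content are hypothetical placeholders):

```yaml
# Hypothetical ConfigMap; the scrape-mount label causes its keys to be
# mounted at /etc/prometheus/configmaps/scrape-certs/ on the collector.
apiVersion: v1
kind: ConfigMap
metadata:
  name: scrape-certs                  # hypothetical name
  namespace: prom-xxx
  labels:
    prometheus.tke.tencent.cloud.com/scrape-mount: "true"
data:
  ca.crt: |                           # becomes .../scrape-certs/ca.crt
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```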
Default collection configuration
After a cluster is associated, cloud native monitoring creates a default collection configuration in it. The following two components are installed:
kube-state-metrics: the kube-state-metrics component is installed in the kube-system namespace.
node-exporter: the node-exporter component is installed in the kube-system namespace.
A set of default collection items and default aggregation rules is also added.
View targets
You can view the status of all currently monitored targets through the console.
An introduction to aggregation rules
An aggregation rule precomputes a PromQL expression into a new metric; aggregation rules are evaluated on a 30-second cycle.
Create aggregation rules
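In standard Prometheus terms an aggregation rule is a recording rule. As an illustrative sketch (the group name, metric name, and expression are assumptions, not platform defaults), one might precompute per-node CPU usage from the node-exporter metrics installed above:

```yaml
# Hypothetical aggregation (recording) rule: precompute per-instance
# CPU usage as a new metric, evaluated on a 30-second cycle.
groups:
  - name: example-aggregation         # hypothetical group name
    interval: 30s
    rules:
      - record: instance:node_cpu_usage:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```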
Brief introduction to alarms
Alarm rules are used to define alarms; alarm rules are evaluated on a 30-second cycle.
Create alarm rules
You can create alarm rules through the console. Multiple rules can be configured for each alarm:
Name: the name of the alarm, shown to the user.
Rule name: identifies the rule; it can be used to view the rule's alarm status in Grafana.
PromQL: the rule statement.
Labels: extra labels attached when the alarm is sent; they can be referenced in the alarm content.
Alarm content: a template for the alarm content. You can use {{ $value }} to reference the value that triggered the alarm, or {{ $labels.label-name }} to reference the value of a label.
Duration: the alarm is only triggered when the rule stays in the alarm state for at least this duration. When configuring an alarm rule, you also need to configure the corresponding alarm channel information.
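The fields above correspond closely to a standard Prometheus alerting rule. A sketch under that assumption (the alert name, threshold, and message are all hypothetical):

```yaml
# Hypothetical alarm rule showing how the console fields map onto a
# Prometheus alerting rule.
groups:
  - name: example-alerts
    rules:
      - alert: HighPodCPU                      # rule name
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total[5m])) > 0.9  # PromQL
        for: 5m                                # duration
        labels:
          severity: warning                    # extra labels
        annotations:                           # alarm content template
          description: "Pod {{ $labels.pod }} CPU usage is {{ $value }}"
```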
View alarm history
You can query the alarm history on the alarm history page. By default, records from roughly the last 24 hours are shown; you can query records in a given range with the time filter, and use the search bar on the right for fuzzy filtering.
Create aggregation or alarm rules within the cluster
You can create alarm and aggregation rules by creating a PrometheusRule directly in the cluster.
By default, the externalLabels from the Prometheus CRD are injected into the PromQL of the rules, so the rules only take effect for this cluster. You can disable externalLabels injection by adding the following annotation to the PrometheusRule: prometheus.tke.tencent.cloud.com/disable-labels-inject="".
If you create an alarm rule, you can specify the alarm channel by adding the annotation prometheus.tke.tencent.cloud.com/notification-inject="Channel id", where the channel ID is the ID of the alarm channel created in the console. Therefore, before using in-cluster alarm rules, you first need to create an alarm in the console.
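Putting the two annotations together, an in-cluster PrometheusRule might be sketched as follows. The resource name, channel ID, and rule body are hypothetical placeholders:

```yaml
# Hypothetical in-cluster PrometheusRule using the annotations
# described above (channel ID is a placeholder).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: in-cluster-rules              # hypothetical name
  namespace: prom-xxx
  annotations:
    prometheus.tke.tencent.cloud.com/disable-labels-inject: ""
    prometheus.tke.tencent.cloud.com/notification-inject: "notification-xxx"
spec:
  groups:
    - name: in-cluster-alerts
      rules:
        - alert: TargetDown           # fires when any scrape target is down
          expr: up == 0
          for: 5m
```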
Brief introduction to the template feature
Templates come in two types: aggregation rule and alarm policy templates, and data collection templates. They are used to manage Prometheus configurations across multiple clusters and support features such as one-click synchronized upgrades.
Create a template
You can create a template through the console, either copy an existing template or create an empty template, and then add a custom configuration.
The configuration method for aggregation rule and alarm policy templates is consistent with that for data collection templates.
Name: template name.
Type: default templates can only be copied, not edited or deleted.
Number of associated Prometheus instances: the number of Prometheus instances bound to the template.
Number of associated Prometheus Agents: the number of Prometheus Agents bound to the template.
Version: represents the version of the current template.
List of associated instances
The list of instances associated with an alarm and aggregation template. Here you can synchronize alarm policies and aggregation rules to the multiple Prometheus instances bound to the template.
Associated instance
You can associate Prometheus instances in multiple regions, or clusters in multiple regions, at the same time, achieving one-click synchronization and management across multiple clusters.
If you disassociate a template, all template-related configurations are cleared and cannot be restored.
This concludes the discussion of how to gracefully use cloud native Prometheus to monitor a cluster. We hope the above content is helpful and that you learned something from it. If you think the article is good, feel free to share it so more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.