What are the rules for using Prometheus

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >



This article looks at ready-made alerting rules for Prometheus and walks through a few common ones.

When configuring system monitoring, you may find that coverage is still not comprehensive enough, or that you do not know how to obtain the metrics you want.

Awesome Prometheus alerts maintains a set of more than 300 out-of-the-box Prometheus alarm rules, and also explains how to obtain the metrics each rule needs. The rules are generic and can be used with any Prometheus installation.

The rules cover basic resources such as hosts, hardware, and containers, as well as databases, message brokers, runtimes, reverse proxies, load balancers, service orchestration, the network level, and Prometheus itself and its clusters. Installation and configuration of Prometheus are documented elsewhere and are not repeated here. Let's take a look at a few common rules.

Host and hardware resources

Alarms for host and hardware resources rely on the metrics exported by node-exporter. For example:

Insufficient memory

An alarm is triggered when the available memory is below the threshold of 10%.
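The rule that follows pairs this threshold with `for: 2m`, which means the condition has to hold continuously for two minutes before the alert fires. The Python sketch below imitates that behaviour on a list of evaluations. The `Sample` type, the function name, and the one-minute evaluation spacing are assumptions made for illustration, not part of Prometheus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    ts: float             # evaluation time in seconds (hypothetical field)
    mem_available: float  # value of node_memory_MemAvailable_bytes
    mem_total: float      # value of node_memory_MemTotal_bytes


def alert_states(samples: List[Sample],
                 threshold_pct: float = 10.0,
                 hold_seconds: float = 120.0) -> List[Tuple[float, str]]:
    """Return (ts, state) per evaluation: 'inactive', 'pending' or 'firing'."""
    states: List[Tuple[float, str]] = []
    active_since: Optional[float] = None
    for s in samples:
        available_pct = s.mem_available / s.mem_total * 100
        if available_pct < threshold_pct:
            # Remember when the condition first became true.
            if active_since is None:
                active_since = s.ts
            held_for = s.ts - active_since
            state = "firing" if held_for >= hold_seconds else "pending"
        else:
            # Any evaluation above the threshold resets the timer.
            active_since = None
            state = "inactive"
        states.append((s.ts, state))
    return states


if __name__ == "__main__":
    gib = 1024 ** 3
    history = [Sample(0, 4 * gib, 16 * gib), Sample(60, 1 * gib, 16 * gib),
               Sample(120, 1 * gib, 16 * gib), Sample(180, 1 * gib, 16 * gib)]
    for ts, state in alert_states(history):
        print(ts, state)
```

The short hold period keeps a brief dip in available memory from paging anyone, at the cost of a two-minute delay for a real shortage.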

- alert: HostOutOfMemory
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host out of memory (instance {{ $labels.instance }})
    description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Unusual host network throughput

An alarm is triggered when inbound traffic, averaged over the last two minutes, exceeds 100 MB/s. See the Prometheus documentation for the rate syntax.

- alert: HostUnusualNetworkThroughputIn
  expr: sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Host unusual network throughput in (instance {{ $labels.instance }})
    description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

MySQL

Alarms for MySQL rely on the metrics exported by prometheus/mysqld_exporter.

Too many connections

An alarm is triggered when the number of connections to the MySQL instance in the last minute exceeds 80% of the configured maximum.

- alert: MysqlTooManyConnections(>80%)
  # threads_connected is a gauge, so take its peak over the last minute
  expr: avg by (instance) (max_over_time(mysql_global_status_threads_connected[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: MySQL too many connections (> 80%) (instance {{ $labels.instance }})
    description: "More than 80% of MySQL connections are in use on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Slow queries

An alarm is triggered when at least one new slow query is recorded in the last minute.
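The slow query count is a counter that only grows, so the rule looks at how much it increased inside the window rather than at its absolute value. The Python sketch below shows a simplified version of that idea, including the usual handling of a counter that restarts from zero. It is an approximation: Prometheus's own `increase` also extrapolates to the window boundaries, which is left out here, and the function names are made up for the example.

```python
from typing import Sequence


def counter_increase(values: Sequence[float]) -> float:
    """Total growth of a counter across the samples of one window.

    values: raw counter readings in time order. A reading lower than the
    previous one is treated as a counter reset (for example a server restart).
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # After a reset the counter started again from zero.
            total += cur
    return total


def has_new_slow_queries(values: Sequence[float]) -> bool:
    """Mirror of the rule's condition: increase over the window > 0."""
    return counter_increase(values) > 0


if __name__ == "__main__":
    print(has_new_slow_queries([41, 41, 41]))  # no growth in the window
    print(has_new_slow_queries([41, 41, 43]))  # two new slow queries
```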

- alert: MysqlSlowQueries
  expr: increase(mysql_global_status_slow_queries[1m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: MySQL slow queries (instance {{ $labels.instance }})
    description: "MySQL server mysql has some new slow query.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

JVM runtime

There is only a single runtime alarm for the JVM: it is triggered when more than 80% of heap space is in use.

The rule relies on metrics exported by java-client.
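The rule that follows adds up the heap series of each instance before dividing, so the percentage is computed per instance rather than per series. A minimal Python sketch of that grouping is shown below. The dictionary layout, the second key component, and the decision to skip instances with a non-positive maximum are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (instance, series id); the series id is a hypothetical second label
SeriesKey = Tuple[str, str]


def sum_by_instance(series: Dict[SeriesKey, float]) -> Dict[str, float]:
    """Add up all series that share the same instance label."""
    totals: Dict[str, float] = defaultdict(float)
    for (instance, _series_id), value in series.items():
        totals[instance] += value
    return dict(totals)


def heap_usage_pct(used: Dict[SeriesKey, float],
                   maximum: Dict[SeriesKey, float]) -> Dict[str, float]:
    """Heap usage per instance in percent."""
    used_total = sum_by_instance(used)
    max_total = sum_by_instance(maximum)
    return {
        inst: used_total[inst] / max_total[inst] * 100
        for inst in used_total
        # Sketch-only choice: ignore instances without a usable maximum.
        if max_total.get(inst, 0) > 0
    }


def instances_over_threshold(used: Dict[SeriesKey, float],
                             maximum: Dict[SeriesKey, float],
                             threshold_pct: float = 80.0) -> List[str]:
    """Instances whose heap usage exceeds the threshold."""
    usage = heap_usage_pct(used, maximum)
    return sorted(inst for inst, pct in usage.items() if pct > threshold_pct)
```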

- alert: JvmMemoryFillingUp
  expr: (sum by (instance) (jvm_memory_used_bytes{area="heap"}) / sum by (instance) (jvm_memory_max_bytes{area="heap"})) * 100 > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: JVM memory filling up (instance {{ $labels.instance }})
    description: "JVM memory is filling up (> 80%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Kubernetes

There are 33 alarm rules related to Kubernetes, a comparatively rich set.

Here is one of the more common ones, the container OOM alarm.
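The expression in the rule that follows combines two checks with `and`: the container's restart counter grew by at least one compared with ten minutes ago, and its last termination reason was reported as OOMKilled throughout those ten minutes. The Python sketch below spells out that logic for a single container. The function and parameter names are illustrative only, and the label matching done by `ignoring (reason)` is not modelled.

```python
from typing import Sequence


def oom_killed_recently(restarts_now: float,
                        restarts_10m_ago: float,
                        oom_reason_samples: Sequence[float]) -> bool:
    """Return True when both halves of the rule's expression hold.

    restarts_now / restarts_10m_ago: readings of
        kube_pod_container_status_restarts_total now and 10 minutes earlier.
    oom_reason_samples: readings of
        kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
        collected during the last 10 minutes (1 means "yes").
    """
    restarted = restarts_now - restarts_10m_ago >= 1
    # min_over_time(...) == 1 only if every sample in the window was 1;
    # with no samples there is nothing to match against.
    oom_throughout = len(oom_reason_samples) > 0 and min(oom_reason_samples) == 1
    return restarted and oom_throughout


if __name__ == "__main__":
    print(oom_killed_recently(3, 2, [1, 1, 1]))
    print(oom_killed_recently(3, 3, [1, 1, 1]))
```

Requiring both checks avoids alerting on restarts that had another cause, and on an old OOM kill that is still recorded as the last termination reason but has not led to a new restart.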

- alert: KubernetesContainerOomKiller
  expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Kubernetes container oom killer (instance {{ $labels.instance }})
    description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

SSL certificate expiry

The exported metrics can be used to monitor certificate expiry: an alarm is triggered if a certificate will expire within the next 7 days.
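As a rough illustration of the seven-day check, the Python sketch below compares a certificate's expiry time with the current time. It assumes the expiry is available as a timezone-aware timestamp; the function name and the choice to also flag certificates that have already expired are assumptions of this sketch, not taken from the rule.

```python
from datetime import datetime, timedelta, timezone


def expires_soon(not_after: datetime,
                 now: datetime,
                 window: timedelta = timedelta(days=7)) -> bool:
    """True when the certificate expires within `window` of `now`.

    A certificate that has already expired also counts, because the
    remaining lifetime is then negative and therefore below the window.
    """
    return not_after - now < window


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(expires_soon(now + timedelta(days=3), now))   # inside the window
    print(expires_soon(now + timedelta(days=30), now))  # still far away
```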

- alert: SslCertificateExpiry (
