How to add WeCom alert notifications to Alertmanager

2025-07-08 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

Prometheus machine: 172.27.143.155

Alertmanager machine: 172.27.143.150

Alerting with Prometheus is split into two parts. Alerting rules in the Prometheus server send alerts to Alertmanager.

Alertmanager then manages those alerts, including silencing, inhibition and aggregation, and sends out notifications through channels such as email, PagerDuty and HipChat.

The main steps to set up alarms and notifications are:

Set up and configure Alertmanager

Configure Prometheus and Alertmanager communication

Create alarm rules in Prometheus

Alertmanager handles alerts sent by client applications, such as Prometheus servers.

It is responsible for deduplicating, grouping and routing them to the correct receiver integration, such as email, PagerDuty, or OpsGenie. It also handles silencing and inhibition of alerts.
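Inhibition is easier to picture with a small configuration fragment. The following is only an illustrative sketch for alertmanager.yml, not part of the original setup: the severity values critical and warning are hypothetical, and source_match, target_match and equal are the keys available in the v0.20.0 release used in this article.

```yaml
# Illustrative fragment of alertmanager.yml (not from the original article).
# While an alert with severity=critical is firing, notifications for alerts
# with severity=warning that share the same alertname and instance are muted.
inhibit_rules:
  - source_match:
      severity: 'critical'   # hypothetical label value
    target_match:
      severity: 'warning'    # hypothetical label value
    equal: ['alertname', 'instance']
```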

The Prometheus and Grafana services are already set up on the .155 machine.

Next, set up the Alertmanager service.

1. wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz

2. tar zxf alertmanager-0.20.0.linux-amd64.tar.gz

3. mv alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager

4. vim alertmanager.yml

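5. vim /etc/alertmanager/template/wechat.tmpl

The template body refers to $alert, which has to be bound by a range over .Alerts, and a template has to be wrapped in a define block to be referenced by name. Those wrapper lines are missing from the extracted text. The define, range and end lines below are a reconstruction, and the name wechat.default.message is an assumption (it is the name used by Alertmanager's default WeChat message).

{{ define "wechat.default.message" }}
{{ range $i, $alert := .Alerts }}
= Monitoring alarm =
Alarm status: {{.Status}}
Alarm level: {{$alert.Labels.severity}}
Alarm type: {{$alert.Labels.alertname}}
Alarm App: {{$alert.Annotations.summary}}
Alarm host: {{$alert.Labels.instance}}
Alarm details: {{$alert.Annotations.description}}
Trigger threshold: {{$alert.Annotations.value}}
Alarm time: {{$alert.StartsAt.Format "2006-01-02 15:04:05"}}
= end =
{{ end }}
{{ end }}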

6. Start the service once the configuration is complete:

nohup ./alertmanager &
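Step 4 edits alertmanager.yml, but the article does not show the file's contents. The following is a minimal sketch of what a WeCom (WeChat Work) receiver could look like in Alertmanager v0.20.0, not the author's actual file. The receiver name, the timing values and every ID or secret are placeholders to be replaced with values from your own WeCom admin console.

```yaml
# alertmanager.yml: a minimal sketch under stated assumptions.
global:
  resolve_timeout: 5m

# Load the custom message template created in step 5.
templates:
  - '/etc/alertmanager/template/wechat.tmpl'

route:
  receiver: 'wechat'          # hypothetical receiver name
  group_by: ['alertname']     # aggregate alerts that share an alert name
  group_wait: 10s             # assumed timing values, tune as needed
  group_interval: 1m
  repeat_interval: 1h

receivers:
  - name: 'wechat'
    wechat_configs:
      - send_resolved: true               # also notify when an alert recovers
        corp_id: 'YOUR_CORP_ID'           # placeholder: WeCom enterprise ID
        agent_id: 'YOUR_AGENT_ID'         # placeholder: WeCom application ID
        api_secret: 'YOUR_APP_SECRET'     # placeholder: WeCom application secret
        to_party: '1'                     # placeholder: department ID to notify
        message: '{{ template "wechat.default.message" . }}'
```

The message line names the template that Alertmanager uses for WeChat by default, so it can be dropped if the default is acceptable; it is spelled out here to show where a custom template name would go.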

Next, configure the Prometheus service

Modify the Prometheus configuration file.

You also need to create a rules directory.

It holds two files: one with the host monitoring rules and one with the container monitoring rules.
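The article does not show the changes made to the Prometheus configuration file. As a sketch only, the relevant parts of prometheus.yml might look like the excerpt below. It assumes Alertmanager listens on its default port 9093, that the rules directory sits next to prometheus.yml, and that node_exporter runs on the Prometheus host at port 9100. The appname value is hypothetical and is shown because the host rules read it through $labels.appname.

```yaml
# prometheus.yml (excerpt): a sketch under stated assumptions.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - '172.27.143.150:9093'   # Alertmanager machine, default port assumed

rule_files:
  - 'rules/*.yml'                     # assumed location of host_sys.yml and container_sys.yml

scrape_configs:
  - job_name: 'Host'                  # the disk IO rules match on job=~"Host"
    static_configs:
      - targets: ['172.27.143.155:9100']   # assumed node_exporter address
        labels:
          appname: 'example-app'      # hypothetical label value
```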

1) cat host_sys.yml

# Note: several thresholds below (memory > 2, CPU > 0.05, filesystem > 0.3)
# are lower than the percentages named in the descriptions (80%, 65%, 80%),
# presumably so that test alerts fire quickly. Align them before real use.
groups:
- name: Host
  rules:
  - alert: Memory Usage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 2
    for: 1m
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "Host memory usage exceeds 80%."
      value: "{{$value}}"
  - alert: CPU Usage
    expr: sum(avg without (cpu) (irate(node_cpu_seconds_total{mode!="idle"}[5m]))) by (instance,appname) > 0.05
    for: 1m
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "Host CPU usage exceeds 65%."
      value: "{{$value}}"
  - alert: HostLoad
    expr: node_load5 > 4
    for: 1m
    labels:
      name: Load
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "The 5-minute host load average exceeds 4."
      value: "{{$value}}"
  - alert: Filesystem Usage
    expr: 1 - (node_filesystem_free_bytes / node_filesystem_size_bytes) > 0.3
    for: 1m
    labels:
      name: Disk
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "Host [{{$labels.mountpoint}}] partition usage exceeds 80%."
      value: "{{$value}}%"
  - alert: Diskio writes
    expr: irate(node_disk_writes_completed_total{job=~"Host"}[1m]) > 50
    for: 1m
    labels:
      name: Diskio
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "The 1-minute average write IO load on host disk [{{$labels.device}}] is high."
      value: "{{$value}} iops"
  - alert: Diskio reads
    expr: irate(node_disk_reads_completed_total{job=~"Host"}[1m]) > 5
    for: 1m
    labels:
      name: Diskio
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "The 1-minute average read IO load on host disk [{{$labels.device}}] is high."
      value: "{{$value}} iops"
  - alert: Network_receive
    expr: irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 5
    for: 1m
    labels:
      name: Network_receive
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "The average receive traffic on host NIC [{{$labels.device}}] over 5 minutes exceeds 5Mbps."
      value: "{{$value}} Mbps"
  - alert: Network_transmit
    expr: irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[5m]) / 1048576 > 5
    for: 1m
    labels:
      name: Network_transmit
      severity: Warning
    annotations:
      summary: "{{$labels.appname}}"
      description: "The average transmit traffic on host NIC [{{$labels.device}}] over 5 minutes exceeds 5Mbps."
      value: "{{$value}} Mbps"

2) cat container_sys.yml

groups:
- name: Container
  rules:
  - alert: CPU Usage
    expr: (sum by (name,instance) (rate(container_cpu_usage_seconds_total{image!=""}[5m])) * 100) > 80
    for: 1m
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: "{{$labels.name}}"
      description: "Container CPU usage exceeds 80%."
      value: "{{$value}}%"
  - alert: Memory Usage
    expr: (container_memory_usage_bytes{name=~".+"} - container_memory_cache{name=~".+"}) / container_spec_memory_limit_bytes{name=~".+"} * 100 > 80
    for: 1m
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: "{{$labels.name}}"
      description: "Container memory usage exceeds 80%."
      value: "{{$value}}%"
  - alert: Network_receive
    expr: irate(container_network_receive_bytes_total{name=~".+",interface=~"eth.+"}[5m]) / 1048576 > 5
    for: 1m
    labels:
      name: Network_receive
      severity: Warning
    annotations:
      summary: "{{$labels.name}}"
      # The expr selects on the interface label, so the description uses it too.
      description: "The average receive traffic on container NIC [{{$labels.interface}}] over 5 minutes exceeds 5Mbps."
      value: "{{$value}} Mbps"
  - alert: Network_transmit
    expr: irate(container_network_transmit_bytes_total{name=~".+",interface=~"eth.+"}[5m]) / 1048576 > 5
    for: 1m
    labels:
      name: Network_transmit
      severity: Warning
    annotations:
      summary: "{{$labels.name}}"
      description: "The average transmit traffic on container NIC [{{$labels.interface}}] over 5 minutes exceeds 5Mbps."
      value: "{{$value}} Mbps"

Restart the Prometheus service once the configuration is complete.

Wait about a minute (each rule uses for: 1m), then verify that the alarm notification arrives.

When the condition clears, a recovery notification follows.

Container monitoring is now complete as well.
