The editor shares with you how to build Prometheus monitoring alerts and a custom email template. Most readers may not know much about this, so this article is shared for your reference; I hope you learn a lot after reading it. Let's get into it!
I. Introduction to Prometheus & AlertManager
Prometheus is an open-source combination of system monitoring, alerting, and a time series database. It was initially developed at SoundCloud and, as more and more companies adopted it, became an independent open-source project. Alertmanager is mainly used to receive alert messages sent by Prometheus. It supports a rich set of notification channels, such as email, WeChat, DingTalk, Slack, and other common communication tools, and makes it easy to deduplicate alerts, reduce noise, group them, and so on. It is a very easy-to-use alert notification system.
II. Basic concepts
Prometheus
Official website (https://prometheus.io/)
Prometheus is an open-source monitoring and alerting system, and also a time series database.
Architecture diagram
Basic principles
The basic principle of Prometheus is to periodically scrape the state of monitored components over HTTP; any component can be monitored as long as it exposes a corresponding HTTP interface, with no SDK or other integration process required. This makes it well suited to monitoring virtualized environments such as VMs, Docker, and Kubernetes. The HTTP endpoint that exposes a monitored component's information is called an exporter. Most components commonly used by Internet companies already have an exporter that can be used directly, such as Varnish, HAProxy, Nginx, MySQL, and Linux system information (including disk, memory, CPU, network, and so on).
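As a quick illustration, you can see what an exporter exposes by requesting its /metrics endpoint with curl. This is a minimal sketch assuming node_exporter is already running on one of this article's hosts on its default port 9100; the endpoint returns plain-text metrics in the Prometheus exposition format, which the server simply scrapes on a schedule:

[root@docker01 ~]# curl -s http://192.168.1.11:9100/metrics | head    # show the first lines of the exporter's metrics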
Service process
1. Prometheus Daemon is responsible for periodically fetching metrics data from targets; each scrape target needs to expose an HTTP endpoint from which data is fetched on a regular basis. Prometheus supports specifying scrape targets through configuration files, text files, Zookeeper, Consul, DNS SRV lookup, and so on. Prometheus uses a PULL model for monitoring: the server pulls data directly from targets, or indirectly through an intermediate gateway.
2. Prometheus stores all the scraped data locally, cleans and organizes it according to certain rules, and stores the results in new time series.
3. Prometheus visualizes the collected data through PromQL and other APIs. Prometheus supports many chart visualization methods, such as Grafana, the native Promdash, and its own template engine. Prometheus also provides an HTTP API for queries, so you can customize the output you need (see the sketch after this list).
4. PushGateway lets clients actively push metrics to it, while Prometheus only scrapes the gateway on its regular schedule.
Alertmanager is a component independent of Prometheus; it works with Prometheus query expressions and provides a very flexible alerting mechanism.
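To make points 3 and 4 above concrete, here is a small sketch of querying the Prometheus HTTP API and pushing a metric to a PushGateway with curl. The Prometheus address assumes the docker01 host described later in this article; the PushGateway host and the sample metric name are placeholders, and a PushGateway is not actually part of this article's setup:

# query the value of the built-in "up" metric through the HTTP API
curl -s 'http://192.168.1.11:9090/api/v1/query?query=up'
# push an ad-hoc metric to a PushGateway for Prometheus to scrape later
echo "batch_job_duration_seconds 42" | curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/batch_job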
Work flow
Prometheus actively pulls data (metrics) from data sources through exporters and saves it into the time series database (TSDB). The data can be accessed via the HTTP Server, alerts can be issued on it, and the time series in the database can be queried with PromQL and served to the web UI or to a visualization system such as Grafana.
Grafana
Official website (https://grafana.com/)
Open source data analysis and monitoring platform
Different dashboards support different types of data visualization.
Exporters
data acquisition
Prometheus pulls data from different exporters, and each exporter supports a different data source.
Node-exporter exposes basic machine data such as CPU, memory, network, etc.
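This article assumes node_exporter and cAdvisor are already running on all three hosts (see the environment below). If you still need them, a minimal sketch of starting them as containers might look like the following; the images are the common upstream ones, and tags, ports, and mounts should be adjusted to your environment:

[root@docker01 ~]# docker run -d --name node-exporter -p 9100:9100 prom/node-exporter
[root@docker01 ~]# docker run -d --name cadvisor -p 8080:8080 \
  --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro \
  google/cadvisor:latest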
III. Experimental environment

docker01 (192.168.1.11): NodeExporter, cAdvisor, Prometheus Server, Grafana
docker02 (192.168.1.13): NodeExporter, cAdvisor
docker03 (192.168.1.20): NodeExporter, cAdvisor
Turn off all firewalls and disable selinux
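A minimal sketch of the usual commands for this step on CentOS-style hosts (run on each of the three machines; assumes firewalld and a standard SELinux configuration):

[root@docker01 ~]# systemctl stop firewalld && systemctl disable firewalld
[root@docker01 ~]# setenforce 0
[root@docker01 ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # make the change permanent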
IV. Set up Prometheus monitoring and alerting
Next, we need to start AlertManager to receive the alert messages sent by Prometheus and carry out the various notifications. AlertManager is also started in Docker; the simplest startup command is as follows:
$ docker run --name alertmanager -d -p 9093:9093 prom/alertmanager:latest
Here, the default port of AlertManager is 9093. After startup, you can visit http://<host IP>:9093 in the browser to see the default UI page. However, there are no alert messages yet, because we have not configured any alert rules to trigger them.
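If you prefer the command line, a quick sanity check that the container came up might look like this (a sketch, assuming docker01's IP 192.168.1.11 from the environment above):

[root@docker01 ~]# docker ps | grep alertmanager    # confirm the container is running
[root@docker01 ~]# curl -s http://192.168.1.11:9093/api/v2/status    # query AlertManager's status API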
Configure AlertManager
AlertManager: used to receive the alert information sent by Prometheus and to carry out the configured alert method and alert content.
Download the image
[root@docker01 ~]# docker pull prom/alertmanager    # download the alertmanager image
Run a container based on alertmanager
[root@docker01 ~]# docker run -d --name alertmanager -p 9093:9093 prom/alertmanager:latest
Configure route forwarding
[root@docker01 ~]# echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
[root@docker01 ~]# sysctl -p
We need to modify the configuration file before deploying alertmanager, so we run a container first and copy its configuration file out:
[root@docker01 ~]# docker cp alertmanager:/etc/alertmanager/alertmanager.yml ./    # copy alertmanager's configuration file to the local host so it can be modified
The alertmanager.yml configuration file has the following sections:
Global: global configuration, including the timeout after an alert is resolved, SMTP-related settings, the API addresses of the various notification channels, and other options.
Route: used to set the distribution policy for alerts.
Receivers: configure alarm message recipient information.
Inhibit_rules: inhibition rules; when one alert is firing, other alerts matching the rule are suppressed.
Modify the configuration file
[root@docker01 ~]# vim alertmanager.yml    # modify the alertmanager configuration file
global:
  resolve_timeout: 5m
  smtp_from: '2877364346@qq.com'           # your email address
  smtp_smarthost: 'smtp.qq.com:465'        # QQ Mail SMTP server address and port
  smtp_auth_username: '2877364346@qq.com'
  smtp_auth_password: 'osjppnjkbuhcdfff'   # the authorization code obtained from QQ Mail
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'                        # receiver changed to email
receivers:
- name: 'email'
  email_configs:
  - to: '2877364346@qq.com'
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
After trying the above configuration over and over, I found that the parameters differ from environment to environment, and various errors came up during debugging. A few key settings are explained below:
1. smtp_smarthost: this is the SMTP service address of QQ Mail. The official address is smtp.qq.com, port 465 or 587. The POP3/SMTP service must also be enabled on the mailbox.
2. smtp_auth_password: this is the authorization code for third-party logins to QQ Mail, not the QQ account login password; otherwise an error will be reported. QQ Mail shows how to obtain it when you enable the POP3/SMTP service in the mailbox settings.
3. smtp_require_tls: whether to use TLS; turn it on or off depending on your environment. If you get the error "email.loginAuth failed: 530 Must issue a STARTTLS command first", you need to set it to true. Note that if TLS is enabled and you get the error "starttls failed: x509: certificate signed by unknown authority", you need to configure insecure_skip_verify: true under email_configs to skip TLS verification.
Rerun the alertmanager container
[root@docker01 ~]# docker rm -f alertmanager    # delete the old alertmanager container
[root@docker01 ~]# docker run -d --name alertmanager -v /root/alertmanager.yml:/etc/alertmanager/alertmanager.yml -p 9093:9093 prom/alertmanager:latest    # run a new alertmanager container; remember to mount the configuration file

Prometheus configuration and alertmanager alert rules
Create a directory to store the rules
[root@docker01 ~]# mkdir -p prometheus/rules    # create the rules directory
[root@docker01 ~]# cd prometheus/rules/
Write the rules
[root@docker01 rules]# vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="prometheus"} == 0    # {job="prometheus"} must be the same as the job name defined on line 23 of the prometheus configuration file
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has stopped running for more than 15s!"
To explain: the purpose of the rule is to monitor whether a node is alive. expr is the PromQL expression that checks whether the targets of the specified job are up. for means the alert stays in the Pending state for 15 seconds before changing to the Firing state; once it is Firing, the alert is sent to AlertManager. labels and annotations add more identifying information to the alert; all added labels and annotations, together with the job labels configured in prometheus.yml, are automatically included in the email content. For more details on rule configuration, please refer to the official Prometheus documentation.
Modify the prometheus configuration file
[root@docker01 ~]# vim prometheus.yml
# Alertmanager configuration      (around line 7)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.1.11:9093           # uncomment and modify
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:                          # around line 14
  - "/usr/local/prometheus/rules/*.rules"    # add this (the path is the path inside the prometheus container)
Note: rule_files here is a path inside the container, so the local node-up.rules file needs to be mounted into the container at that path. Modify the Prometheus startup command as follows and restart the service.
Rerun the prometheus container
[root@docker01 ~]# docker rm -f prometheus    # delete the old prometheus container
[root@docker01 ~]# docker run -d -p 9090:9090 --name prometheus --net=host -v /root/prometheus.yml:/etc/prometheus/prometheus.yml -v /root/prometheus/rules/node-up.rules:/usr/local/prometheus/rules/node-up.rules prom/prometheus    # run a new prometheus container; remember to mount the rule file

Verify in the browser: http://192.168.1.11:9090/rules
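Besides the browser check, you can confirm from the command line that the rule file was loaded by querying Prometheus's rules API (a sketch; inspect the returned JSON directly, or pipe it through jq if jq is installed):

[root@docker01 ~]# curl -s http://192.168.1.11:9090/api/v1/rules    # list the loaded rule groups and rules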
It should be explained here that a Prometheus alert has three states: Inactive, Pending, and Firing.
Inactive: inactive, indicating that it is being monitored, but no alarm has been triggered.
Pending: indicates that the alert will be triggered. Because alerts can be grouped, inhibited/suppressed, or silenced/muted, it waits for validation, and once all validations pass it changes to the Firing state.
Firing: the alert is sent to AlertManager, which sends it to all recipients as configured. Once the alert condition clears, the state changes back to Inactive, and the cycle repeats.
Suspend docker02
You'll get an email.
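Besides checking the mailbox, you can also confirm that the alert reached AlertManager by querying its alerts API (a sketch, using the docker01 host IP from this article):

[root@docker01 ~]# curl -s http://192.168.1.11:9093/api/v2/alerts    # list the alerts currently held by AlertManager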
Here are a few things that need to be explained:
The change of the alert state is only detected 15s after each stop/restore of the service, because of the global -> scrape_interval: 15s setting in prometheus.yml. If you think 15s is too long to wait, you can lower it, either globally or per job; for example, to set a local 5s scrape interval for node-exporter:
...
- job_name: 'node-exporter'
  scrape_interval: 5s
  file_sd_configs:
  - files: ['/usr/local/prometheus/groups/nodegroups/*.json']
When the alert state changes, it waits 15s before changing, because for: 15s is configured in node-up.rules as the state-change waiting time.
After the alert is triggered, the alert email is resent every 5m (for as long as the service has not returned to normal), which is determined by route -> repeat_interval: 5m in alertmanager.yml.
V. AlertManager custom email template

Create the template directory
[root@docker01 ~]# cd prometheus    # enter the previously created prometheus directory
[root@docker01 prometheus]# mkdir alertmanager-tmpl    # create the AlertManager template directory
Looking at the default email sent above: although it contains all the core information, the email format could be more elegant and intuitive. AlertManager also supports custom email templates. First, create a new template file.
Write the template
[root@docker01 prometheus]# vim email.tmpl
{{ define "email.from" }}2877364346@qq.com{{ end }}
{{ define "email.to" }}2877364346@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=start=
Alarm program: prometheus_alert
Alarm level: {{ .Labels.severity }} level
Alarm type: {{ .Labels.alertname }}
Failed host: {{ .Labels.instance }}
Alarm topic: {{ .Annotations.summary }}
Trigger time: {{ .StartsAt.Format "2019-08-04 16:58:15" }}
=end=
{{ end }}
{{ end }}
To explain briefly: the template file above defines three template variables: email.from, email.to, and email.to.html, which can be referenced directly in the alertmanager.yml file. email.to.html is the body of the email to be sent; it supports HTML and Text formats, and here the message is laid out in HTML simply to make it look better. {{ range .Alerts }} is loop syntax used to iterate over the matched alerts and pull out their information. The alert fields are the same as in the default email shown above, except that only the core values are extracted for display. Next, you need to add the templates configuration to the alertmanager.yml file as follows:
Modify the alertmanager configuration file
[root@docker01 ~]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '2877364346@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '2877364346@qq.com'
  smtp_auth_password: 'evjmqipqezlbdfij'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:                                        # add the templates section
  - '/etc/alertmanager-tmpl/*.tmpl'               # add the template path
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'             # modify
    html: '{{ template "email.to.html" . }}'      # add
    send_resolved: true                           # delete this line
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']

Rerun the alertmanager container
[root@docker01 ~]# docker rm -f alertmanager    # delete the old alertmanager container
[root@docker01 ~]# docker run -itd --name alertmanager -p 9093:9093 -v /root/alertmanager.yml:/etc/alertmanager/alertmanager.yml -v /root/prometheus/alertmanager-tmpl:/etc/alertmanager-tmpl prom/alertmanager:latest    # run a new alertmanager container; remember to mount the configuration file and the template directory
Suspend docker02
Receive an email
Of course, you can also customize the email subject, which is not demonstrated here; please refer to the detailed configuration in the official documentation. In addition to monitoring node survival, you can monitor many other metrics, such as CPU load, memory usage, disk space, and network load. By writing PromQL expressions against threshold values you can define a whole series of alert rules to cover the alerts needed in daily work.
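A minimal sketch of what such extra rules might look like; the metric names come from recent node_exporter versions, while the rule names, thresholds, and labels are placeholders to adjust for your environment. These would go into a .rules file under the mounted rules directory, just like node-up.rules:

groups:
- name: node-resources
  rules:
  - alert: HighCPULoad
    expr: node_load1 > 4            # 1-minute load average above 4 (placeholder threshold)
    for: 2m
    labels:
      severity: 2
      team: node
    annotations:
      summary: "{{ $labels.instance }} CPU load is high"
  - alert: LowDiskSpace
    expr: (node_filesystem_avail_bytes{fstype!~"tmpfs"} / node_filesystem_size_bytes{fstype!~"tmpfs"}) * 100 < 10
    for: 5m
    labels:
      severity: 2
      team: node
    annotations:
      summary: "{{ $labels.instance }} has less than 10% disk space left"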
That is all the content of the article "how to build Prometheus monitoring alerts and a custom email template". Thank you for reading! I hope the content shared here has been helpful; if you want to learn more, welcome to follow the industry information channel!