How do you monitor databases with Prometheus? Many engineers who are new to the tool are unsure where to start. This article summarizes the problems with traditional monitoring systems and then walks through a Prometheus-based monitoring setup, using MySQL as the example.
Problems faced by traditional monitoring systems
What problems does a traditional monitoring system face? Take Zabbix as an example. It requires a lot of configuration up front, and as the number of servers and the business grow, this kind of traditional monitoring runs into several problems.
Database performance bottleneck. Zabbix stores the collected metrics in a relational database, so when servers and business grow quickly, the database is the first component to become a bottleneck.
Multiple deployments and high management cost. Once the database becomes a bottleneck, the obvious workaround is to run several independent Zabbix installations, but that brings high management and maintenance overhead.
Poor usability. Zabbix configuration and management are complex and hard to master.
Mail storms. Notification rules are complicated to configure, and a careless rule can easily trigger a flood of alert emails.
With the rise of container technology, traditional monitoring systems face even more questions.
How do you monitor containers?
How do you monitor microservices?
How do you analyze and aggregate cluster-wide performance?
How do you manage the large number of collection scripts on the agent side?
Clearly, traditional monitoring systems can no longer meet the monitoring needs of today's IT environments.
The predecessor of Prometheus: Borgmon
In 2015, Google published the paper "Large-scale cluster management at Google with Borg", which also describes the scale and challenges of Google's clusters:
Tens of thousands of servers per cluster
Thousands of different applications
Hundreds of thousands of jobs, dynamically created and destroyed
Hundreds of clusters per data center
At this scale, Google's monitoring faces enormous challenges as well, and Borgmon, the monitoring system inside Borg, was built to meet them.
Borgmon introduction
Let's look at how Google builds a monitoring system for clusters of this size.
Application instrumentation
First, every application running in the Borg cluster exposes a specific URL, http://<instance>:80/varz, from which all of the monitoring metrics the application publishes can be retrieved.
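The endpoint returns plain-text name/value pairs that Borgmon can scrape; the metric names below are only illustrative:

    http_requests 37
    errors_total 12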
Service discovery
But there is a huge number of such applications, and they are added and removed dynamically, so how does Borgmon find them? When an application starts in Borg, it automatically registers itself with BNS, Borg's internal naming service. Borgmon reads the application list from BNS to work out which services need to be monitored, and then pulls all of the monitoring variables exposed by each application into Borgmon.
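Prometheus handles the same problem with pluggable service discovery. As a rough sketch (the job name and file path are assumptions), a file-based discovery block lets an external registry keep the target list up to date:

    scrape_configs:
      - job_name: 'app'
        file_sd_configs:
          - files:
              - 'targets/app-*.json'   # target lists maintained by an external registry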
Metric collection and aggregation
Once metrics have been collected into Borgmon, they can be displayed or used for alerting. Because a single cluster is so large, one Borgmon may not be able to handle collection and display for the whole cluster, so in complex environments a data center usually deploys several Borgmons, split into a collection layer and an aggregation layer: collection-layer Borgmons scrape the applications directly, while aggregation-layer Borgmons pull their data from the collection layer.
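Prometheus supports the same layered pattern through federation. A minimal sketch of an aggregation-layer scrape configuration (the job name, match expression, and hostnames are assumptions):

    scrape_configs:
      - job_name: 'federate'
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job="mysql"}'          # only pull the series we need from the lower layer
        static_configs:
          - targets:
              - 'collector-1:9090'     # collection-layer Prometheus servers
              - 'collector-2:9090'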
Metric data storage
After Borgmon collects the metric data, it keeps everything in an in-memory database, periodically checkpoints it to disk, and periodically ships it to an external TSDB. Typically, datacenter and global Borgmons each keep at least 12 hours of data for rendering charts. With each data point taking roughly 24 bytes of memory, storing 1 million time series at one data point per minute for 12 hours needs only about 17 GB of memory (1,000,000 series x 720 points x 24 bytes ≈ 17 GB).
Querying metrics
In Borgmon, metrics are queried by label. Filtering on labels lets us query a specific metric of a single task, and also aggregate up to higher-dimensional views.
For example, filtering on the instance label returns http_requests for the single task host0:80.
We can also query http_requests for every webserver job across the whole US-West region; what comes back in that case is the list of all instances that match the filter.
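Prometheus uses essentially the same label-matching model. As a sketch, mirroring the two queries above (the zone label is an assumption about how the targets are labelled):

    http_requests{instance="host0:80"}                 # one specific task
    http_requests{job="webserver", zone="us-west"}     # every matching instance in the region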
Rule calculation
On top of collection and storage, rules let us compute new data from the raw series.
For example, we may want to alert when the web server error rate exceeds a threshold, that is, when non-200 response codes make up more than a certain fraction of all requests.
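In Prometheus terms, such a condition can be written as a ratio of rates; the metric name and the 1% threshold below are illustrative:

    sum(rate(http_requests_total{code!="200"}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.01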
Prometheus introduction
Borgmon is an internal Google system, so what can we use outside Google? This is where Prometheus comes in. The book Site Reliability Engineering, written by Google's SRE engineers, states directly that Prometheus can be regarded as an open-source version of Borgmon. Prometheus is also very popular in the open-source community: the Cloud Native Computing Foundation (CNCF), founded under the Linux Foundation with Google's backing, accepted Prometheus as its second hosted project (the first being Kubernetes, the open-source counterpart of Borg).
Architecture
The overall architecture of Prometheus is similar to Borgmon's. It consists of the following components, some of which are optional:
The main Prometheus server, which scrapes and stores time-series data
Client libraries for instrumenting application code
A push gateway for short-lived jobs
Special-purpose exporters (for HAProxy, StatsD, Ganglia, and so on)
An Alertmanager for handling alerts
Command-line tools for querying
In addition, Grafana works very well as a dashboard front end for Prometheus.
Database monitoring
For collecting database metrics with Prometheus, take MySQL as an example. MySQL itself does not expose a metrics endpoint, so we run a separate mysql_exporter, which pulls performance data from the MySQL server and exposes it through a metrics endpoint for Prometheus to scrape. We can also run node_exporter to collect performance metrics for the host.
Deploying the server side
Server-side setup is very simple. Prometheus is written entirely in Go, and Go programs are easy to install: download, extract, and run. There are only a few programs on the server side, mainly prometheus, the main service, and alertmanager, the alerting component.
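As a rough sketch of that "download, extract, run" flow (file names and flag syntax vary by version):

    tar xzf prometheus-*.tar.gz && cd prometheus-*/
    ./prometheus --config.file=prometheus.yml &
    # alertmanager ships as a separate tarball and is started the same way
    ./alertmanager --config.file=alertmanager.yml &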
Server configuration is also simple. The common settings are the scrape interval and the scrape targets; to monitor a MySQL database we only need to add the mysql_exporter address as a target.
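A minimal prometheus.yml sketch for this setup, assuming mysql_exporter and node_exporter run on a host named db1 with their default ports:

    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'mysql'
        static_configs:
          - targets: ['db1:9104']   # mysql_exporter default port
      - job_name: 'node'
        static_configs:
          - targets: ['db1:9100']   # node_exporter default port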
Deploying the exporter side
For MySQL collection, we only need to configure the connection information and start mysql_exporter.
Once that is done, mysql_exporter collects the MySQL performance metrics.
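A sketch of that step, assuming a MySQL account named exporter has been created for monitoring; the exact flag and environment variable names depend on the exporter version:

    # connection information via environment variable
    export DATA_SOURCE_NAME='exporter:secret@(127.0.0.1:3306)/'
    ./mysqld_exporter &
    curl http://127.0.0.1:9104/metrics    # verify the metrics endpoint responds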
We can then query the collected MySQL metrics on the Prometheus server.
Combining these metrics with Prometheus's query language, we can answer higher-level questions, for example increase(mysql_global_status_bytes_received{instance="$host"}[1h]),
which gives the number of bytes MySQL received over each hour. Dropping this query into Grafana produces a very nice performance chart.
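A couple of related queries in the same spirit (the metric names come from mysql_exporter's SHOW GLOBAL STATUS collection; $host is a Grafana template variable):

    increase(mysql_global_status_bytes_received{instance="$host"}[1h])   # bytes received per hour
    rate(mysql_global_status_questions{instance="$host"}[5m])            # statements executed per second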
Open-source implementations of this Prometheus + Grafana MySQL monitoring scheme already exist, so it is easy to build a monitoring system on top of Prometheus.
For alerting, Prometheus's rich query language also lets us implement fairly complex alert logic.
For example, to monitor MySQL replication: if the replication IO thread or the replication SQL thread has not been running for 2 minutes, send an alert. The following alerting rule does that.
    # Alert: The replication IO or SQL threads are stopped.
    ALERT MySQLReplicationNotRunning
      IF mysql_slave_status_slave_io_running == 0
         OR mysql_slave_status_slave_sql_running == 0
      FOR 2m
      LABELS {
        severity = "critical"
      }
      ANNOTATIONS {
        summary = "Slave replication is not running",
        description = "Slave replication (IO or SQL) has been down for more than 2 minutes.",
      }
As another example, to alert when MySQL replica lag exceeds 30 seconds and a linear prediction over the next 2 minutes says it will still be above zero, sustained for 1 minute, we can use the following rule.
    # Alert: The replication lag is non-zero and is predicted not to recover within
    # 2 minutes. This allows for a small amount of replication lag.
    ALERT MySQLReplicationLag
      IF (mysql_slave_lag_seconds > 30)
         AND on (instance) (predict_linear(mysql_slave_lag_seconds[5m], 60 * 2) > 0)
      FOR 1m
      LABELS {
        severity = "critical"
      }
      ANNOTATIONS {
        summary = "MySQL slave replication is lagging",
        description = "The mysql slave replication has fallen behind and is not recovering",
      }
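Note that these rules use the Prometheus 1.x rule syntax; since Prometheus 2.0, alerting rules live in YAML rule files. The first rule would look roughly like this:

    groups:
      - name: mysql
        rules:
          - alert: MySQLReplicationNotRunning
            expr: mysql_slave_status_slave_io_running == 0 or mysql_slave_status_slave_sql_running == 0
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Slave replication is not running"
              description: "Slave replication (IO or SQL) has been down for more than 2 minutes."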
MySQL is not the only database with a ready-made implementation; the community provides open-source exporters and dashboards for many other databases as well, so database monitoring with Prometheus is largely available out of the box.
That covers how to monitor databases with Prometheus.