How to choose the monitoring scheme for Docker container 07/19 Update SLTechnology News&Howtos


2025-07-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report


With online services now fully Dockerized, monitoring Docker containers has become important. The existing monitoring system run by our SA (system administration) team monitors physical machines; when one physical machine runs multiple containers, its charts cannot show how much of each resource each container uses. To monitor containers better and, more importantly, to collect the large amount of runtime data that our planned dynamic container scheduling algorithm will need, we investigated the options and built a container monitoring system based on CAdvisor + InfluxDB + Grafana.

1. Selection of container monitoring scheme

When investigating container monitoring, we found many options, such as the docker stats command built into Docker, Scout, Data Dog, Sysdig Cloud, Sensu Monitoring Framework, and CAdvisor. The docker stats command shows the CPU, memory, and network traffic of every container on the current host at a glance. Its drawbacks are that it only covers containers on the current host, the data it reports is real-time only and is not stored anywhere, and it has no alerting.
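To illustrate the limitation, the Python sketch below runs `docker stats --no-stream` with a `--format` Go template and parses the single snapshot into records. Anything beyond that snapshot (history, storage, alerting) would have to be built on top by the caller. The column choice and the function names are assumptions for illustration.

```python
import subprocess

# Go-template passed to `docker stats --format`; the column choice is illustrative.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def parse_stats(output: str) -> list[dict]:
    """Parse tab-separated `docker stats --no-stream` output into dicts."""
    records = []
    for line in output.strip().splitlines():
        name, cpu, mem = line.split("\t")
        records.append({
            "name": name,
            # docker prints percentages such as "1.25%"; strip the percent sign.
            "cpu_percent": float(cpu.rstrip("%")),
            "mem_percent": float(mem.rstrip("%")),
        })
    return records


def snapshot() -> list[dict]:
    """Take one snapshot of the local host only (requires the docker CLI)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)


if __name__ == "__main__":
    sample = "web\t1.25%\t3.40%\nredis\t0.10%\t0.85%\n"
    print(parse_stats(sample))
```

Each call still sees only the local host and keeps nothing between calls, which is the gap that CAdvisor plus InfluxDB fills.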

Scout (link: https://scoutapp.com/), Sysdig Cloud, and Data Dog offer more complete services, but they are all hosted, paid services, so we did not consider them. Sensu Monitoring Framework (link: https://sensu.io/) is highly integrated and free, but too complex to deploy. In the end we chose CAdvisor as the container monitoring tool. CAdvisor is developed by Google; its advantages are that it is open source, exposes a complete set of metrics, is easy to deploy, and has an official Docker image. Its disadvantages are a low degree of integration and that, by default, it keeps data locally for only 2 minutes. However, we found that we could add InfluxDB to store the data and connect Grafana to display charts, which makes it fairly easy to build a complete container monitoring system. Data collection and chart display work well, and the impact on system performance is negligible.

2. Container resource monitoring-CAdvisor

2.1 Deployment and operation

CAdvisor is a container resource monitoring tool that tracks container memory, CPU, network I/O, disk I/O, and more, and provides a web page for viewing the real-time state of containers. By default, CAdvisor keeps only 2 minutes of data and covers only a single physical machine. However, it offers integrations with storage backends such as InfluxDB, Redis, Kafka, and Elasticsearch; adding the corresponding configuration sends the monitoring data to these databases for storage.

Since CAdvisor has been containerized, it is easy to deploy and run. Execute the following command:

After running, you can open http://ip:8080 in the browser to view the container monitoring data of the host.
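As an alternative to the shell command, the Python sketch below starts cAdvisor with the Docker SDK for Python (the `docker` package on PyPI). It mirrors the read-only host mounts from the management configuration in 2.2 (using the standard /var/lib/docker path rather than our /home/docker prefix) and publishes port 8080. The image name is an assumption: setups from this period used `google/cadvisor`, while current releases are published as `gcr.io/cadvisor/cadvisor` with version tags, so pass whichever image your environment uses.

```python
# Assumed image; override it to match your registry and release.
CADVISOR_IMAGE = "google/cadvisor:latest"

# Host path -> mount inside the container, mirroring the binds in section 2.2.
CADVISOR_VOLUMES = {
    "/": {"bind": "/rootfs", "mode": "ro"},
    "/var/run": {"bind": "/var/run", "mode": "rw"},
    "/sys": {"bind": "/sys", "mode": "ro"},
    "/var/lib/docker/": {"bind": "/var/lib/docker", "mode": "ro"},
}


def cadvisor_run_kwargs(image: str = CADVISOR_IMAGE, host_port: int = 8080) -> dict:
    """Keyword arguments for DockerClient.containers.run() (hypothetical helper)."""
    return {
        "image": image,
        "name": "cadvisor",
        "detach": True,
        "volumes": CADVISOR_VOLUMES,
        # cAdvisor serves its web UI and API on container port 8080.
        "ports": {"8080/tcp": host_port},
    }


if __name__ == "__main__":
    import docker  # Docker SDK for Python

    client = docker.from_env()
    container = client.containers.run(**cadvisor_run_kwargs())
    print("started", container.name)
```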

2.2 Integrated InfluxDB

As mentioned earlier, by default CAdvisor keeps only the last 2 minutes of data on the local machine. To persist the data and to collect and display it in one place, we store it in InfluxDB. InfluxDB is a time series database, designed specifically for time-stamped data, which makes it well suited to CAdvisor's metrics. Moreover, CAdvisor has built-in InfluxDB support that is enabled through startup options. We manage CAdvisor through our container management system, and the modified startup configuration is shown below. It mainly specifies InfluxDB as the storage driver, the address of InfluxDB's HTTP API (here we use influxdb.service.consul, a name from our self-hosted DNS, to avoid exposing an external port), and the corresponding database name, username, and password.

{
  "binds": [
    "/:/rootfs:ro",
    "/var/run:/var/run:rw",
    "/sys:/sys:ro",
    "/home/docker/var/lib/docker/:/var/lib/docker:ro"
  ],
  "image": "forum-cadvisor",
  "labels": {"type": "cadvisor"},
  "command": "-docker_only=true -storage_driver=influxdb -storage_driver_db=cadvisor -storage_driver_host=influxdb.service.consul:8086 -storage_driver_user=testuser -storage_driver_password=testpwd",
  "tag": "latest",
  "hostname": "cadvisor-{{lan_ip}}"
}

Note that we use our own forum-cadvisor image instead of the official cadvisor image, both to fix some problems in cadvisor and for easier administration.
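Because the flags in `command` must be separated by spaces, assembling them by hand is error-prone. The Python sketch below renders the same cAdvisor startup flags from a few settings; the helper name is hypothetical, while the flag names are the ones used in the configuration above.

```python
import shlex


def cadvisor_influxdb_flags(host: str, db: str, user: str, password: str,
                            docker_only: bool = True) -> list[str]:
    """Build the cAdvisor flags that send metrics to InfluxDB (names as in the config above)."""
    return [
        f"-docker_only={'true' if docker_only else 'false'}",
        "-storage_driver=influxdb",
        f"-storage_driver_db={db}",
        f"-storage_driver_host={host}",
        f"-storage_driver_user={user}",
        f"-storage_driver_password={password}",
    ]


if __name__ == "__main__":
    flags = cadvisor_influxdb_flags("influxdb.service.consul:8086", "cadvisor",
                                    "testuser", "testpwd")
    # shlex.join quotes any value that contains spaces or shell metacharacters.
    print(shlex.join(flags))
```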

2.3 Problems with CAdvisor

1) Error at runtime

When running the latest CAdvisor container, we found the following error in the container log:

The error occurs because the findutils tools are not installed in the image.

2) Container memory data is not available

Debian does not enable the memory cgroup by default, so CAdvisor cannot collect container memory data out of the box. To fix this, edit the GRUB boot parameters in /etc/default/grub and add the following line:

GRUB_CMDLINE_LINUX="cgroup_enable=memory"

Then regenerate the GRUB 2 configuration (for example with update-grub on Debian) and reboot.
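After rebooting, it is worth confirming that the memory controller is actually on. The kernel lists every cgroup controller in /proc/cgroups, with an `enabled` flag in the last column; the Python sketch below checks the `memory` row. It is a quick sanity check of our own, not something CAdvisor runs.

```python
def memory_cgroup_enabled(proc_cgroups: str) -> bool:
    """Return True if the 'memory' controller is enabled in /proc/cgroups text."""
    for line in proc_cgroups.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the "#subsys_name hierarchy num_cgroups enabled" header
        fields = line.split()
        if fields[0] == "memory":
            return fields[-1] == "1"  # last column is the 'enabled' flag
    return False  # controller not listed at all


if __name__ == "__main__":
    with open("/proc/cgroups") as f:
        print("memory cgroup enabled:", memory_cgroup_enabled(f.read()))
```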

3) Incorrect network traffic data

Some time after CAdvisor went live, we found that the containers' network data did not match reality. After some research, we found the cause: by default CAdvisor counts only the traffic of the first network interface, while our containers are attached to several overlay networks, so the traffic of all interfaces in a container needs to be counted. We therefore modified the network traffic statistics code in CAdvisor and recompiled a version for production use; the modified code is here.
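The idea behind the fix can be sketched independently of the patch itself: instead of reporting the counters of the first interface only, sum them over every interface in the container. The Python below is a hypothetical illustration of that aggregation, not the actual CAdvisor change; skipping the loopback interface is our assumption.

```python
def first_interface_rx(rx_by_iface: dict[str, int]) -> int:
    """Behavior described above: only the first (non-loopback) interface counts."""
    for iface, rx in rx_by_iface.items():  # dicts preserve insertion order
        if iface != "lo":
            return rx
    return 0


def total_rx(rx_by_iface: dict[str, int]) -> int:
    """Fixed behavior: sum received bytes over all non-loopback interfaces."""
    return sum(rx for iface, rx in rx_by_iface.items() if iface != "lo")


if __name__ == "__main__":
    # A container attached to two overlay networks (illustrative values).
    rx = {"lo": 1200, "eth0": 6266314, "eth2": 480000}
    print(first_interface_rx(rx), total_rx(rx))  # 6266314 6746314
```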

Finally, our custom image file forum-cadvisor.Dockerfile looks like this (src/cadvisor is a modified and recompiled cadvisor executable):

2.4 How CAdvisor works

At runtime, CAdvisor mounts several host directories, such as the host root directory and the Docker root directory, and reads the containers' runtime information from them. Docker's underlying technologies include Linux namespaces, control groups (CGroup), and AUFS; of these, CGroup is used to limit system resources and control priority.

CGroup data is stored under the host's /sys/fs/cgroup/ directory. CGroup consists of multiple subsystems, such as blkio (I/O limits on block devices), cpu, memory, and network-related subsystems. Docker creates a docker directory in each CGroup subsystem, and because CAdvisor mounts the host root directory and the /sys directory at runtime, it can read each container's resource usage records.

For example, the following shows the current CPU usage statistics of container b1f257. For a detailed description of CGroup, see DOCKER basic Technology: LINUX CGROUP (link: https://coolshell.cn/articles/17049.html).

# cat /sys/fs/cgroup/cpu/docker/b1f25723c5c3a17df5026cb60e1d1e1600feb293911362328bd17f671802dd31/cpuacct.stat
user 95191
system 5028
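The two counters are the cumulative CPU time charged to the container in user mode and kernel (system) mode, measured in USER_HZ clock ticks. The Python sketch below parses the file and converts ticks to seconds. It assumes the cgroup v1 layout shown above; under cgroup v2 the equivalent data lives in cpu.stat and is reported in microseconds.

```python
import os
from typing import Optional


def parse_cpuacct_stat(text: str) -> dict[str, int]:
    """Parse cgroup v1 cpuacct.stat ("user N" / "system N") into ticks per mode."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def ticks_to_seconds(ticks: int, hz: Optional[int] = None) -> float:
    """Convert USER_HZ clock ticks to seconds (USER_HZ is usually 100)."""
    if hz is None:
        hz = os.sysconf("SC_CLK_TCK")
    return ticks / hz


if __name__ == "__main__":
    stats = parse_cpuacct_stat("user 95191\nsystem 5028\n")
    # With USER_HZ = 100 this is about 951.9 s of user and 50.3 s of system CPU time.
    print({mode: ticks_to_seconds(t, hz=100) for mode, t in stats.items()})
```

Sampling the file twice and dividing the difference by the wall-clock interval gives the container's CPU utilization over that interval.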

CAdvisor reads container network traffic from /proc/PID/net/dev. For example, if the host PID of container b1f257's process is 6748, the file below shows the traffic received and sent by every network interface in the container, along with error counts. CAdvisor periodically reads the data from these locations and sends it to the configured storage driver, while its local storage keeps the last 2 minutes of data by default and serves it in the web UI.

# cat /proc/6748/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 6266314     512    0    0    0     0          0         0    22787     292    0    0    0     0       0          0
  eth2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
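The file has two header lines, then one line per interface with eight receive and eight transmit counters. The Python sketch below parses it into per-interface dictionaries using the column names from the header; summing the `bytes` counters over all interfaces gives the container-wide total discussed in 2.3. The PID is the example value from the text; in practice it could be looked up with `docker inspect --format '{{.State.Pid}}' <container>`.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> dict[str, dict[str, dict[str, int]]]:
    """Parse /proc/<pid>/net/dev into {iface: {"rx": {...}, "tx": {...}}}."""
    result = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        result[iface.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return result


if __name__ == "__main__":
    pid = 6748  # host PID of the container's process, as in the example above
    with open(f"/proc/{pid}/net/dev") as f:
        stats = parse_net_dev(f.read())
    for iface, s in stats.items():
        print(iface, "rx", s["rx"]["bytes"], "tx", s["tx"]["bytes"])
    print("total rx bytes:", sum(s["rx"]["bytes"] for s in stats.values()))
```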

3. Container monitoring data storage-InfluxDB

InfluxDB (link: https://docs.influxdata.com/influxdb/v1.3/) is an open-source distributed time series database written in Go, and it is particularly well suited to storing time series data. The container monitoring data collected by CAdvisor is stored in InfluxDB, and since CAdvisor supports InfluxDB natively, integration is very convenient.

Since our online services are Dockerized, we also run InfluxDB in a container managed by the container management system. The core of its runtime configuration mounts the database directory and configures service registration with Consul. Because CAdvisor and InfluxDB sit on the same overlay subnet, no port needs to be exposed externally, and CAdvisor can connect to InfluxDB directly at influxdb.service.consul:8086.

To store CAdvisor data, you need to create the database in advance and configure the username, password, and permissions. InfluxDB ships with the influx CLI, which is very similar to the mysql client, and its query language, InfluxQL, is largely similar to SQL. Enter the InfluxDB container and run the following commands to create the database and user and grant permissions:

# influx
Connected to http://localhost:8086 version 1.3.5
InfluxDB shell version: 1.3.5
> create database cadvisor   ## create the cadvisor database
> show databases
name: databases
name
----
_internal
cadvisor
> CREATE USER testuser WITH PASSWORD 'testpwd'   ## create a user and set its password
> GRANT ALL PRIVILEGES ON cadvisor TO testuser   ## grant the user privileges on the database
> CREATE RETENTION POLICY "cadvisor_retention" ON "cadvisor" DURATION 30d REPLICATION 1 DEFAULT   ## default retention policy: keep data for 30 days, 1 replica

Once this is configured, CAdvisor automatically creates its measurements (tables) through InfluxDB's HTTP API and sends its data to InfluxDB for storage.
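Since CAdvisor writes through InfluxDB's HTTP API, the same API can be used to confirm that data is arriving, or to run the setup statements above without the influx shell. A minimal Python sketch follows, assuming InfluxDB 1.x, HTTP Basic authentication, and the host name and credentials used in this article; `influx_query_request()` and `run()` are hypothetical helper names.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional

INFLUX_URL = "http://influxdb.service.consul:8086"  # address used in this article


def influx_query_request(statement: str, db: Optional[str] = None,
                         user: str = "testuser", password: str = "testpwd",
                         base_url: str = INFLUX_URL) -> urllib.request.Request:
    """Build a POST to InfluxDB 1.x /query: db in the URL, q in the form body."""
    url = f"{base_url}/query"
    if db:
        url += "?" + urllib.parse.urlencode({"db": db})
    body = urllib.parse.urlencode({"q": statement}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {token}",  # HTTP Basic authentication
        },
    )


def run(statement: str, db: Optional[str] = None) -> dict:
    """Send one statement and decode InfluxDB's JSON response."""
    with urllib.request.urlopen(influx_query_request(statement, db)) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # List the measurements CAdvisor has created in the cadvisor database.
    print(run("SHOW MEASUREMENTS", db="cadvisor"))
```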

3.2 important concepts of InfluxDB

InfluxDB has several important concepts: database, timestamp, field key, field value, field set, tag key, tag value, tag set, measurement, retention policy, series, and point. Here is a brief description of each.

Database: a database, such as the cadvisor database created earlier. InfluxDB is not a CRUD database but rather a "CR-ud" one: it prioritizes the performance of creating and reading data over updating and deleting it.

Timestamp: because InfluxDB is a time series database, all data has a column called time that stores when each record was produced. For example, the time column in the rx_bytes measurement stores the timestamps.

Fields: this covers field key, field value, and field set. A field key is the field name; in the rx_bytes measurement the field key is value. A field value is the value of a field, such as 1785878163 or 1359398. A field set is the collection of field key/field value pairs on a point; for example, a field set in rx_bytes looks like this:

value = 1785878163

Tags: this covers tag key, tag value, and tag set. A tag key is the tag name; container_name, game, machine, namespace, and type are the tag keys in the rx_bytes measurement. A tag value is the value of a tag, and a tag set is the collection of tag key/tag value pairs. Tags are optional in InfluxDB, but tags are indexed while fields are not. If a column is frequently used in queries, it is recommended to store it as a tag rather than a field; a tag is the equivalent of an indexed column in a traditional database.

Retention policy: the data retention policy. The cadvisor database uses the cadvisor_retention policy, which keeps data for 30 days with one replica. A database can have multiple retention policies.

Measurement: similar to a table in a traditional database; it holds the fields, tags, and time column.

Series: a collection of data that shares the same retention policy, measurement, and tag set.

Point: a single data record, uniquely identified by its series and timestamp.
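These concepts map directly onto InfluxDB's line protocol, the text format used to write points: `measurement,tag_key=tag_value,... field_key=field_value,... timestamp`. The Python sketch below encodes one point in the style of the rx_bytes measurement. The tag values are made up for illustration, and the sketch does not claim which numeric type CAdvisor itself writes; it simply encodes Python ints as integer fields (with the `i` suffix) and floats as float fields.

```python
def _escape(s: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v) -> str:
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    return '"' + str(v).replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Encode one point: measurement plus tag set, then field set, then a ns timestamp."""
    m = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Sorting tags by key is what InfluxDB recommends for write performance.
    tag_part = "".join(f",{_escape(k)}={_escape(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_field_value(v)}" for k, v in fields.items())
    return f"{m}{tag_part} {field_part} {ts_ns}"


if __name__ == "__main__":
    print(to_line("rx_bytes", {"machine": "host-1", "container_name": "web"},
                  {"value": 1785878163}, 1500000000000000000))
    # rx_bytes,container_name=web,machine=host-1 value=1785878163i 1500000000000000000
```

The measurement plus tag set identifies the series; the field set and timestamp make up the point within it.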

3.3 Features of InfluxDB

As a time series database, InfluxDB has many features that traditional databases lack, such as specialized functions and continuous queries. See the official documentation for more details on InfluxDB.

Functions: InfluxDB provides functions such as MEAN() for the average, MEDIAN() for the median, SPREAD() for the difference between the maximum and minimum values, STDDEV() for the standard deviation, INTEGRAL() for the area under the curve of field values, SAMPLE() for random sampling, and DERIVATIVE() for the rate of change between values, plus fill(), which fills in missing intervals in GROUP BY time() queries.

Continuous queries: InfluxDB's continuous queries periodically downsample data and write the results from the original database into a designated new database or measurement, which is especially useful for summarizing historical data.
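Two of these operations matter most for CAdvisor data, and both can be sketched in plain Python. Counters such as rx_bytes are cumulative, so DERIVATIVE() (or InfluxQL's NON_NEGATIVE_DERIVATIVE(), which drops negative results) turns them into per-second rates; a continuous query, in turn, is essentially a scheduled GROUP BY time() aggregation such as MEAN() whose results are written to another measurement. The functions below are illustrative re-implementations of those calculations on in-memory (timestamp, value) lists, not InfluxDB's code; scheduling and writing results back are left out.

```python
from collections import defaultdict


def derivative(points, unit=1.0, non_negative=False):
    """Rate of change between consecutive (timestamp_seconds, value) points.

    Each result is (v2 - v1) / (t2 - t1) * unit, stamped with t2. With
    non_negative=True, negative rates (e.g. after a counter reset) are dropped.
    """
    out = []
    for (t1, v1), (t2, v2) in zip(points, points[1:]):
        if t2 == t1:
            continue  # skip duplicate timestamps to avoid dividing by zero
        rate = (v2 - v1) / (t2 - t1) * unit
        if non_negative and rate < 0:
            continue
        out.append((t2, rate))
    return out


def downsample_mean(points, bucket_seconds):
    """Average (timestamp_seconds, value) points into fixed buckets aligned to the epoch."""
    buckets = defaultdict(list)
    for ts, value in points:
        start = ts - ts % bucket_seconds  # bucket start, like GROUP BY time()
        buckets[start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    # Cumulative rx_bytes samples every 10 s; the counter resets before t=30 (illustrative).
    rx = [(0, 1000), (10, 6000), (20, 16000), (30, 200)]
    print(derivative(rx, non_negative=True))  # [(10, 500.0), (20, 1000.0)]

    # Half-hourly values rolled up to hourly means, as an hourly continuous query would.
    raw = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
    print(downsample_mean(raw, 3600))  # [(0, 15.0), (3600, 40.0)]
```

Empty buckets are simply absent from the downsampled output; in InfluxQL, fill() decides what to emit for them.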

4. Visualization of container monitoring data-Grafana

With container monitoring data collected by CAdvisor and stored in InfluxDB, what remains is visualization; after all, a chart makes problems with containers quick and easy to spot. For charts, I chose Grafana. Grafana is an open-source platform for monitoring, analytics, and visualization that supports many data sources (including InfluxDB, MySQL, Elasticsearch, OpenTSDB, and Graphite), offers rich plugins and templating, and supports dashboard access control and alerting. Grafana also runs as a container. Its startup configuration mainly mounts Grafana's data and log directories, sets the administrator password, and opens port 8888 as Grafana's access port.

After startup, configure the data source on the http://IP:8888/ page.
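Besides the web page, a data source can also be created through Grafana's HTTP API (POST /api/datasources). The Python sketch below builds that request for the InfluxDB instance used here. It assumes a hypothetical GRAFANA_URL on port 8888, basic authentication with the admin account, and that the payload fields (`database`, `user`, `secureJsonData.password`) match the InfluxDB data source of your Grafana version; check them against that version's documentation before relying on it.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://127.0.0.1:8888"  # hypothetical host; port 8888 as in the article


def datasource_payload() -> dict:
    """InfluxDB data source definition (field names assumed; verify for your version)."""
    return {
        "name": "cadvisor",
        "type": "influxdb",
        # In proxy mode Grafana's server makes the queries, so it must resolve this name.
        "url": "http://influxdb.service.consul:8086",
        "access": "proxy",
        "database": "cadvisor",
        "user": "testuser",
        "secureJsonData": {"password": "testpwd"},
    }


def create_datasource_request(admin_user: str, admin_password: str) -> urllib.request.Request:
    """Build the POST /api/datasources request with HTTP Basic auth."""
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=json.dumps(datasource_payload()).encode(),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = create_datasource_request("admin", "your-admin-password")
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())
```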

Once the data source is configured, you can add panels to visualize the data. Grafana's charting is very powerful and its query editor is smart: it auto-completes data sources, measurements, and fields, and groups all of InfluxDB's functions into categories you can pick directly. Note that for byte data (such as network traffic rx_bytes and memory usage memory_usage), the unit should be chosen from the data (IEC) category.
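The distinction matters because IEC units are powers of 1024 (KiB, MiB, GiB), whereas metric (SI) units are powers of 1000, so the same byte count is displayed differently. The Python sketch below formats a byte count with IEC prefixes to show the convention; it is an illustration, not Grafana's own formatting code.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_iec(num_bytes: float, precision: int = 1) -> str:
    """Format a byte count using binary (IEC) prefixes, where 1 KiB = 1024 B."""
    value = float(num_bytes)
    for unit in IEC_UNITS:
        # Stop at the first unit that keeps the value below 1024, or at the largest unit.
        if abs(value) < 1024 or unit == IEC_UNITS[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= 1024
    raise AssertionError("unreachable: the loop always returns on the last unit")


if __name__ == "__main__":
    print(format_iec(1785878163))  # 1.7 GiB (the same count is about 1.8 GB in SI units)
```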

Building a container resource monitoring system with CAdvisor + InfluxDB + Grafana is both feasible and simple. All three components run as containers, which fits our principle of running all online services in containers. The monitoring system is now fully online and running normally, and the visualizations work well. Besides visual monitoring, this data will also feed our system anomaly detection and intelligent container scheduling algorithms.


