2025-02-24 Update From: SLTechnology News&Howtos > Network Security
The service heartbeat mechanism is mainly used to confirm the survival status of a service. In UAVStack, heartbeat data also carries each node's container and process monitoring data, which supports real-time viewing of the running status of application containers and processes on the front end and enables early warnings for containers and processes based on these data.
I. Background
In a micro-service architecture, the service heartbeat is a simple but very important mechanism for confirming the survival status of a micro-service. In UAVStack, a heartbeat is an HTTP request: MonitorAgent (hereinafter MA) periodically sends an HTTP request in a specific message format to HealthManager (hereinafter HM) to complete one heartbeat. The heartbeat message contains the timestamp at which it was sent, which HM uses to update the node's data status.
Unlike an ordinary heartbeat, the heartbeat in UAVStack also uploads application container and process monitoring data from the MA side. Each time a heartbeat is sent, a scheduled task on the MA side collects basic information about the application container where the MA resides, along with data about the processes on that container, and sends it with the heartbeat packet.
This article first introduces the basic heartbeat mechanism of UAVStack, and then explains the collection of application container and process data in detail.
II. Basic structure
There are many ways to implement a heartbeat; it can be initiated by either the client or the server, as long as the basic function of confirming survival is fulfilled. In typical implementations, however, the client actively reports to the server, because as the number of clients grows, having the server poll every client would put pressure on the server and hurt performance.
UAVStack takes the same approach: the client (MA) sends heartbeat information to the server (HM) to announce that it is alive.
A heartbeat is performed by UAV's MA and HM:
MA generates heartbeat data periodically, attaches the application container information, process information, and service information of the MA node, and reports it to HM through an HTTP request.
HM stores the received heartbeat data in its Redis cache and scans it periodically to determine each node's survival status. The accompanying monitoring information, such as application container data, is also staged in Redis; subsequent HM scheduled tasks eventually persist it to OpenTSDB. The overall architecture is as follows:
III. Basic heartbeat mechanism
The main flow of the heartbeat service is shown in the figure above. Its logic includes the following steps:
1) MA's scheduled heartbeat task generates an empty heartbeat record and hands it to the container and process data collection task on the MA side.
2) The container and process data collection task on the MA side generates the heartbeat timestamp and collects the node's application container and process monitoring data, the node's basic information, the node's available service information, and so on. After this step, the heartbeat data contains the following:
Heartbeat timestamp: the time at which the node sends the heartbeat, used later by HM to determine the node's survival status.
Basic information of the application container: node ID, name, hostname, IP, etc.
Simple monitoring data of the application container: CPU load, memory usage, disk usage, etc.
Process information on the application container: the pid and resource usage of each process.
Capability information of the node: the Features enabled on the node.
Service information of the node: the services the node can provide and their access interfaces, used for service discovery.
(Optional) Node information of child nodes: if MA and HM are deployed in different network segments, MA cannot push data to HM directly over HTTP. In that case, MA sends the data to the HM in its own network segment, and that HM's heartbeat client forwards it to the central HM. The reported data then carries the node information of the child nodes, and each node entry contains the data above.
Finally, the heartbeat data is sent to HM.
3) After receiving the heartbeat data, HM stores it in its own Redis cache and updates the service status in Redis with the service information in the reported data, for use by service discovery requests.
4) When HM starts the heartbeat receiving service, it also starts a heartbeat check task. This task periodically scans the heartbeat data in Redis, judges each node's survival status from the difference between the current system time and the heartbeat timestamp, updates the node's status, and deletes expired nodes.
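The payload assembled in step 2 can be sketched as follows. This is an illustrative sketch only: the article does not show UAVStack's actual message format, so all field names here are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeartbeatSketch {

    // Build a heartbeat payload like the one described above: a send
    // timestamp plus node, container, process, and service information.
    // Field names are hypothetical, not UAVStack's real schema.
    static Map<String, Object> buildHeartbeat(String nodeId, String host, String ip) {
        Map<String, Object> hb = new LinkedHashMap<>();
        hb.put("timestamp", System.currentTimeMillis()); // used by HM to judge liveness
        hb.put("nodeId", nodeId);
        hb.put("hostname", host);
        hb.put("ip", ip);
        hb.put("containerInfo", new LinkedHashMap<String, Object>()); // CPU load, memory, disk ...
        hb.put("processInfo", new LinkedHashMap<String, Object>());   // per-pid resource usage
        hb.put("services", new LinkedHashMap<String, Object>());      // for service discovery
        return hb;
    }

    public static void main(String[] args) {
        Map<String, Object> hb = buildHeartbeat("node-1", "app-host", "10.0.0.1");
        // In MA the payload would then be serialized and POSTed to HM over
        // HTTP (e.g. with java.net.http.HttpClient); the send is omitted here.
        System.out.println(hb.keySet());
    }
}
```

The send itself is a plain HTTP POST, which is what allows HM to treat heartbeats uniformly whether they come from an MA or from a forwarding HM.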
IV. Application container and process data collection
The heartbeat data of UAV not only fulfills the heartbeat function but also reports the node's application container and process monitoring data.
Reporting application container and process data over HTTP keeps container monitoring data isolated from application monitoring data. Using different transport channels ensures that container and process data collection is not affected when the MQ service is unavailable.
This section focuses on the details of the collection of these data.
4.1 Application container data acquisition
The data of the application container is divided into two parts:
One part is the basic information of the container: the node ID, hostname, system information, and JVM information.
The other part is simple real-time monitoring data, including CPU load, memory usage, and disk usage. Each time heartbeat data is reported, these data are collected in real time from the following sources:
System.getProperty after application startup: basic operating system information, JVM information, and so on.
Tool classes provided by Java: mainly network card information.
Information obtained through JMX: CPU usage, memory usage, etc.
Information recorded by the system itself: the services that can be provided, the enabled Features, the node ID, and so on.
Information obtained by executing system commands: disk usage.
Information obtained by reading files under the /proc directory directly: CPU usage, memory usage, etc.
4.2 Process data acquisition
Unlike application container data, process data is not collected during the heartbeat process but by a dedicated Feature. Within this Feature, process data acquisition is further divided into process port traffic collection and the collection of other data. Both are performed by scheduled tasks that cooperate with each other; finally, the process detection task updates the heartbeat client's process data.
Using multiple collection tasks makes it possible to collect different data at different frequencies. For example, network port traffic can be collected over a longer period to reduce the performance cost of data acquisition. Different tasks can also run on different threads to improve efficiency.
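The multi-frequency design above can be sketched with the JDK's `ScheduledExecutorService`: an expensive traffic task on a long period and a cheap scan task on a short period, sharing state through a thread-safe map. The class, field names, and periods are illustrative assumptions, not UAVStack's actual configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CollectorScheduling {
    // port -> bytes seen; written by the traffic task, read by the scan task
    static final Map<Integer, Long> portTraffic = new ConcurrentHashMap<>();
    static volatile long trafficTimestamp;
    static final AtomicInteger scans = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        // Expensive traffic capture: longer period to limit overhead.
        pool.scheduleAtFixedRate(() -> {
            portTraffic.put(8080, 1024L);              // stand-in for real capture
            trafficTimestamp = System.currentTimeMillis();
        }, 0, 200, TimeUnit.MILLISECONDS);
        // Cheap process scan: shorter period, merges in the traffic snapshot.
        pool.scheduleAtFixedRate(() -> {
            Map<Integer, Long> snapshot = Map.copyOf(portTraffic);
            scans.incrementAndGet();                   // stand-in for per-process collection
        }, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(500);
        pool.shutdownNow();
        System.out.println("scans=" + scans.get() + ", last traffic update at " + trafficTimestamp);
    }
}
```

Running the two tasks on separate pool threads means a slow packet capture never delays the per-process scan, which matches the efficiency argument above.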
The process data collection process is roughly shown in the following figure:
The process port traffic detection task periodically reads the port list from its local variables to obtain the port numbers to be collected.
Then, in a Windows environment, the network interface object is obtained through JPcap and a TCP filter is set on the interface to count port traffic over a period of time. In a Linux environment, traffic is obtained by calling a Python script that opens a socket and analyzes the packets that have flowed through.
After obtaining the traffic data on all ports, the task hands the collected data to the process data collection task, updates its local variables, and records the timestamp of this collection.
The process detection task consists of a series of sub-tasks. At the start of the task, a Map-structured data container is prepared to store the collected process information; each process is keyed by its pid.
The task first scans all processes to obtain their pids and ports. The scanned processes pass through a filter that excludes processes whose data does not need to be collected; data is then formally collected for each remaining process.
For each process, the number of connections, CPU and memory usage, disk read/write data, and network port traffic are collected by running system commands. The network port traffic data comes from the local variable maintained by the port traffic detection task, and the process detection task in turn updates the latest list of scanned ports into the port traffic detection task's local variable.
If the application is deployed in a container, corresponding container information is collected as well. Finally, the process detection task updates the collected process data into the heartbeat client's local variables, to be gathered and reported with each generated heartbeat.
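The scan-filter-collect flow above can be sketched with the JDK's `ProcessHandle` API. Note this is an assumption for illustration: the article does not say which mechanism UAVStack actually uses to enumerate processes.

```java
import java.util.HashMap;
import java.util.Map;

public class ProcessScanSketch {

    // Scan all live processes, filter out those we do not want to monitor,
    // and collect basic info keyed by pid, as described above.
    static Map<Long, String> scanProcesses() {
        Map<Long, String> byPid = new HashMap<>();
        ProcessHandle.allProcesses()
            .filter(ProcessHandle::isAlive)
            // filter step: here we simply skip processes whose command is
            // unreadable; a real filter would apply monitoring rules
            .filter(p -> p.info().command().isPresent())
            .forEach(p -> byPid.put(p.pid(), p.info().command().orElse("?")));
        return byPid;
    }

    public static void main(String[] args) {
        Map<Long, String> procs = scanProcesses();
        System.out.println("scanned " + procs.size() + " processes");
    }
}
```

Keying the map by pid makes the later merge steps cheap: the port traffic snapshot and per-process command output can each be attached by a single map lookup.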
The process data is collected from the following data sources:
System commands (top, etc.): CPU, memory, number of connections, etc.
Per-process subdirectories under the /proc directory: CPU, memory, disk read/write, and other information.
Executed scripts: port traffic collection in Linux environments.
Third-party toolkits: port traffic collection (JPcap) in Windows environments.
V. HM processing
After the heartbeat and container data are sent to the HM side via HTTP, they are processed by the corresponding service on HM.
When HM starts, it launches its own heartbeat client, which sends the local machine's heartbeat data and collects monitoring data for the container where HM resides. It also starts a heartbeat service that receives and processes all uploaded heartbeat and container data.
After receiving a heartbeat data request, the heartbeat service determines from the HM configuration whether the current HM is a Master node. If it is, the heartbeat service extracts the reported data from the HTTP message, uses the services available on the reporting node to update the service discovery information, and then stores the data in the backend Redis cache. If it is not a Master node, the data is handed to the local heartbeat client to be uploaded the next time a heartbeat is sent.
This design accounts for large-scale monitoring across machine rooms, where the monitored nodes are often not in the same network segment. The problem is solved by having machines in the same segment report to a "gateway" at the boundary; here, the HM acts as that gateway.
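The Master / non-Master branch can be sketched as below. The class and field names are illustrative, not UAVStack's actual code, and a plain map stands in for the Redis cache.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class HeartbeatServiceSketch {
    final boolean isMaster;
    final Map<String, String> redisStandIn = new HashMap<>(); // stands in for the Redis cache
    final Deque<String> forwardQueue = new ArrayDeque<>();    // drained by the local heartbeat client

    HeartbeatServiceSketch(boolean isMaster) {
        this.isMaster = isMaster;
    }

    void onHeartbeat(String nodeId, String report) {
        if (isMaster) {
            // Master: update service discovery info and cache the report.
            redisStandIn.put(nodeId, report);
        } else {
            // Non-Master ("gateway" HM): queue the report so it rides along
            // with the next outgoing heartbeat to the upstream HM.
            forwardQueue.add(report);
        }
    }
}
```

Because the forwarded report is just another heartbeat payload, the upstream Master HM processes it with the same code path as a direct MA report.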
When HM starts, it also launches a scheduled task responsible for tracking each node's survival. The task periodically reads all heartbeat data from Redis and checks, one by one, the difference between the client timestamp and the current system timestamp.
When the difference exceeds the upload interval, the node's survival status is updated accordingly: beyond twice the interval, the node may be dead and is marked as dying; beyond three times the interval, the node is considered dead and the heartbeat service deletes its cache record.
The container and process data reported with the heartbeat are stored in Redis along with the heartbeat data, then read by other HM scheduled tasks and sent to the early warning center for processing. Finally, the monitoring indicators are formatted into a specific structure and stored in OpenTSDB.
The collected container and process data are also available in the front-end AppHub viewing interface, as shown in the figure:
Click a node on the page to view detailed node information, including the node's operating system information, JVM information, provided services, and installed Features. This is the information reported with the heartbeat data, as mentioned earlier. As shown in the figure:
VI. Summary
The heartbeat is a basic but important mechanism in a micro-service architecture. By sending heartbeat data periodically, MA nodes report their own survival status so that HM knows the running state of the current system.
Meanwhile, UAVStack's heartbeat data also reports the node's container and process monitoring data. With these data, HM can raise early warnings for monitored containers and processes, and the running status of application containers and processes can be viewed in real time on the front end.
Official website: https://uavorg.github.io/main/
Open source address: https://github.com/uavorg
Author: Zhang Mingming
Source: Yixin Institute of Technology