
Is there any possibility of downtime when using a CVM?

2025-04-07 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article looks at whether downtime can occur when using a CVM, and at how outages can be identified accurately, from a professional point of view.

Yes, it is quite possible. Downtime is a state in which the operating system cannot recover from a serious system error, or in which a problem occurs at the hardware level, so that the system stops responding for a long time and the machine has to be restarted. It is a common occurrence in the operation of cloud servers and can happen to any server.

Users can identify CVM outages accurately and reduce false positives through the following steps.

1. Exception exclusion

Exclude non-physical machines: abnormal information generated by VMs is not of concern to the system for the time being. Exclude machines that are not in a business state, such as machines that are in production, being migrated, reinstalled, destroyed, or restarted, or that have no control status, and monitor only machines in the normal state. Also exclude machines that are not in a working state.
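The exclusion step can be sketched as a simple filter. This is a minimal illustration, not the actual monitoring system: the `Machine` record, its field names, and the state strings are all hypothetical and stand in for whatever an asset inventory provides.

```python
from dataclasses import dataclass

# Hypothetical state label; a real inventory may use different names.
NORMAL_STATE = "normal"


@dataclass
class Machine:
    name: str
    is_physical: bool  # False for VMs, which this step ignores
    state: str         # e.g. "normal", "migrating", "reinstalling"


def monitorable(machines):
    """Keep only physical machines that are in the normal state."""
    return [m for m in machines if m.is_physical and m.state == NORMAL_STATE]


fleet = [
    Machine("host-a", True, "normal"),
    Machine("host-b", True, "migrating"),  # non-business state: excluded
    Machine("vm-c", False, "normal"),      # not a physical machine: excluded
]
monitored_names = [m.name for m in monitorable(fleet)]
```

Filtering on an allow-list (only the normal state) rather than a deny-list means a newly introduced state is excluded by default, which errs on the side of fewer false alarms.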

2. Network interference elimination

In downtime analysis, most false positives come from network interference: when the network is faulty, it is impossible to tell accurately whether the physical machine is down or whether the network is the problem. Eliminate false positives caused by abnormal network equipment, including computer room disconnection drills, small-area network failures, and uplink network failures, for example by detecting packet loss and applying some logic to make an initial judgment about network problems.

Besides filtering out network problems, false positives that come from the server itself when there is no packet loss also have to be filtered out: packet loss data is analyzed to rule out SA false positives. An SA anomaly reports an abnormal heartbeat, which is then mistaken for downtime. The analysis uses ICMP and TCP packet loss. ICMP is collected at a fixed interval of a few seconds, and so is TCP, and the data includes the packet loss of packets of several different sizes (16, 32, 64, 128, 256, etc.). The judgment is based on the packet loss of these two data sources within the analysis time window.
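As a rough sketch of this check, the function below combines ICMP and TCP loss over one analysis window. The `ProbeWindow` record, the 0.5 loss threshold, and the verdict labels are assumptions made for illustration; the article does not state the actual thresholds or sampling intervals.

```python
from dataclasses import dataclass


@dataclass
class ProbeWindow:
    """Probe counts collected for one host during one analysis window."""
    icmp_sent: int
    icmp_lost: int
    tcp_sent: int
    tcp_lost: int


def classify_heartbeat_anomaly(w, loss_threshold=0.5):
    """Classify an abnormal heartbeat using both packet loss sources.

    loss_threshold is an assumed value, not one given in the article.
    """
    if w.icmp_sent == 0 or w.tcp_sent == 0:
        return "inconclusive"  # no probe data for at least one source
    icmp_loss = w.icmp_lost / w.icmp_sent
    tcp_loss = w.tcp_lost / w.tcp_sent
    if icmp_loss < loss_threshold and tcp_loss < loss_threshold:
        # The host still answers probes, so the heartbeat reporter is suspect.
        return "agent_false_positive"
    if icmp_loss >= loss_threshold and tcp_loss >= loss_threshold:
        return "suspected_down"
    return "inconclusive"  # the two sources disagree


verdict = classify_heartbeat_anomaly(ProbeWindow(10, 0, 10, 0))
```

Requiring both sources to agree before declaring a suspected outage keeps a single noisy probe type from triggering an alarm on its own.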

3. Elimination of interference under special circumstances

Individual CVM computer rooms sometimes see a large-scale storm of abnormal heartbeats and abnormal ping packets while the ping packets of the uplink network devices remain normal. This kind of false alarm is generally analyzed case by case; for example, the interference can be eliminated based on the reporting frequency of each computer room.
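One possible way to use per-room reporting frequency is sketched below. The function name, the input shapes, and the threshold of 50 hosts per window are hypothetical; in practice the threshold would be tuned per room.

```python
from collections import Counter


def storm_rooms(alerts, uplink_ok, storm_threshold=50):
    """Return rooms whose alerts look like a storm rather than real outages.

    alerts: iterable of (room, host) pairs reported in one time window.
    uplink_ok: dict mapping room -> True if uplink device pings are normal.
    storm_threshold: assumed number of distinct hosts that marks a storm.
    """
    distinct = set(alerts)  # count each host once per room
    hosts_per_room = Counter(room for room, _host in distinct)
    return {
        room
        for room, count in hosts_per_room.items()
        if count >= storm_threshold and uplink_ok.get(room, False)
    }


window = [("room-1", f"host-{i}") for i in range(60)] + [("room-2", "host-0")]
suppressed = storm_rooms(window, {"room-1": True, "room-2": True})
```

Alerts from rooms in the returned set would be held back for case-by-case analysis instead of being reported as individual outages.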

4. Further identify false positives

By this point most of the interference has been filtered out, but some false positives are still hidden. For example, an abnormal heartbeat together with an abnormal ping matches the downtime judgment logic and is therefore reported as downtime. If the network card is saturated or the retry rate is high, the network exception is caused by the business itself; the business does not consider it an anomaly, so it needs to be excluded. In other cases the server has not gone down, but its IO latency and resource utilization indicators are abnormal. Uptime checks and out-of-band log analysis need to be added to tell these cases apart.
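The uptime check and the business-load exclusion might be combined along these lines. The `Evidence` fields, the 0.2 retry-rate threshold, and the verdict labels are illustrative assumptions, not values taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    uptime_seconds: float       # uptime reported once the host is reachable
    anomaly_age_seconds: float  # time since the first abnormal heartbeat
    nic_saturated: bool         # network card bandwidth exhausted
    retry_rate: float           # fraction of retried packets, 0.0 to 1.0


def refine(e, retry_threshold=0.2):
    """Second-pass check on an alert that already matched the downtime logic."""
    if e.uptime_seconds < e.anomaly_age_seconds:
        # The machine booted after the anomaly began, so it did restart.
        return "rebooted"
    if e.nic_saturated or e.retry_rate >= retry_threshold:
        # Network trouble caused by the workload itself: exclude it.
        return "business_network_load"
    # No reboot and no obvious load cause: inspect out-of-band logs.
    return "needs_out_of_band_logs"


result = refine(Evidence(uptime_seconds=120, anomaly_age_seconds=600,
                         nic_saturated=False, retry_rate=0.0))
```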

5. Handle the long tail

Cases that cannot be confirmed are added to a long-tail list. One example is a minute-level abnormal heartbeat and ping while the serial log is still being output normally, which usually means the machine has hung and even its network is blocked. Such machines are observed for a period of time; if they have not recovered or restarted within a fixed time window, the outage is reported provisionally. This kind of crash will be categorized separately at a later stage.
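The observation rule for long-tail cases can be sketched as follows. The 600-second window and the verdict strings are assumed values chosen for the example; the article only says the window is fixed.

```python
def long_tail_verdict(first_seen, now, recovered, restarted, window_seconds=600):
    """Decide what to do with an unconfirmed case on the long-tail list.

    first_seen and now are timestamps in seconds; window_seconds is an
    assumed observation window, not a value given in the article.
    """
    if recovered or restarted:
        return "resolved"
    if now - first_seen >= window_seconds:
        return "report_outage"  # provisional report after the window expires
    return "keep_observing"


status = long_tail_verdict(first_seen=0, now=900, recovered=False, restarted=False)
```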

The above covers the possibility of downtime when using a CVM. If you have similar questions, the analysis above may help you understand the issue.


*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.


© 2024 shulou.com SLNews company. All rights reserved.
