
How to handle anomalies in the ceph heartbeat mechanism




This article analyzes a problem with the ceph heartbeat mechanism in detail and presents a way to handle it, in the hope of giving readers who hit the same issue a simple and workable approach.

Phenomenon: in a large-scale ceph cluster, when a node loses its network or an osd fails, mon does not mark the faulty osd down for a long time. Only after 900 s, when mon notices that the osd's status has not been updated, does it mark the osd down and propagate the updated osdmap. During those 900 s, client IO keeps being sent to the faulty osd, causing IO timeouts and ultimately affecting the upper-layer business.
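While the fault is present, the stuck state can be observed with the standard status commands (a minimal illustration; the output depends on your cluster):

    # The faulty osd still shows as "up" here until the ~900 s fallback fires:
    ceph -s
    ceph osd tree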

Cause analysis:

In the mon log we can also see that the other osds which have heartbeat connections with the faulty osd do report its failure to mon, yet mon does not mark it down within a short time. After consulting material on the Internet and in books, I found the cause of the problem.
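On the monitor node these peer reports typically show up as lines like the one matched below; this is only an illustration, and the log path and exact wording differ between releases and deployments:

    # Failure reports from heartbeat peers, as recorded by the monitor:
    grep "reported failed" /var/log/ceph/ceph-mon.*.log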

First, let's look at several related configuration items in the osd configuration:

(1) osd_heartbeat_min_peers:10

(2) mon_osd_min_down_reporters:2

(3) mon_osd_min_down_reporters_ratio:0.5

All of the above parameters can be viewed with ceph daemon osd.x config show on a ceph cluster node (where x is the id of the osd).
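For example, assuming an osd with id 0 runs on the local node (the id here is only for illustration):

    # Dump osd.0's runtime configuration and keep the heartbeat-related options:
    ceph daemon osd.0 config show | grep -E 'heartbeat|min_down_reporters'

    # Individual options can also be read one at a time:
    ceph daemon osd.0 config get osd_heartbeat_min_peers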

What is the cause of the problem?

When the cluster at the problem site was deployed, each osd randomly selected 10 peer osds to establish heartbeats with, but the ceph mechanism does not guarantee that these 10 osds are spread across different nodes. So when an osd fails, there is a real probability that the reporters of the failure seen by mon do not satisfy ratio = 0.5, i.e. the number of reporting hosts is less than half of the hosts in the cluster, and the faulty osd cannot be quickly marked down through the peer heartbeat mechanism. The failure is only recognized after 900 s, when mon notices that the osd's pgmap has not been updated (a separate mechanism that can be seen as the final safety net of the osd liveness check), and is then spread through the osdmap. For the upper-layer business, 900 s is usually unacceptable.
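The 900 s figure corresponds to a monitor-side timeout. As a hedged illustration (option name and default taken from mainstream ceph releases; check your own version), its configured value can be read through any daemon's admin socket, e.g. the osd.0 used earlier:

    # mon_osd_report_timeout: if an osd sends no status reports for this many
    # seconds, mon marks it down even without enough peer failure reports.
    ceph daemon osd.0 config get mon_osd_report_timeout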

However, this phenomenon rarely occurs in small clusters, such as a 3-node ceph cluster:

If the number of heartbeat peers an osd has established with osds on other nodes is less than osd_heartbeat_min_peers, the osd keeps selecting its nearest osds to establish heartbeat connections with (even osds on its own node). In a small cluster those 10 peers already cover a large fraction of the cluster, so they almost certainly include osds on every host and the reporter conditions are easily met.
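To check how osds are distributed across hosts in your own cluster, and hence how likely a failed osd's heartbeat peers are to span enough hosts, the CRUSH tree can be listed:

    # Lists the CRUSH hierarchy: hosts and the osds placed under each of them.
    ceph osd tree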

As for the osd heartbeat mechanism, some people on the Internet have summed up several requirements:

(1) Timeliness: an osd that has established heartbeats can detect a peer osd's failure within seconds and report it to the monitor, and the monitor can mark that osd down and take it offline within a few minutes.

(2) Appropriate pressure: more peers is not always better. In real deployments, the links on which an osd listens for and sends heartbeat messages are shared with the public network and the cluster network, and too many heartbeat connections noticeably affect system performance. Mon has a separate way of maintaining heartbeats with osds, but ceph relies on the heartbeats between osds for liveness detection, which distributes this load across the osds and greatly reduces the pressure on the central node, mon. A quick way to see the timing knobs behind points (1) and (3) is shown after this list.

(3) Tolerance of network jitter: after collecting failure reports about an osd, mon periodically checks several conditions instead of rashly marking the osd down. These include whether the target osd has been failed for longer than a grace threshold derived from the fixed osd_heartbeat_grace and the historical network conditions, whether the number of reporting hosts reaches min_reporters and min_reporters_ratio, and whether the failure report has not been cancelled by the reporting osd within a certain period.

(4) Diffusion mechanism: there are two implementations: mon actively pushes the new osdmap, or, more lazily, osds and clients fetch it themselves. To let clients and other osds learn about the failure in time, the former implementation is generally better.
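As promised in point (2), here is a quick way to see the timing knobs behind points (1) and (3) on your own cluster. Defaults in recent releases are roughly 6 s for the interval and 20 s for the grace period, but verify on your cluster rather than relying on these numbers:

    # How often an osd pings its heartbeat peers:
    ceph daemon osd.0 config get osd_heartbeat_interval
    # How long a peer may stay silent before it is reported as failed:
    ceph daemon osd.0 config get osd_heartbeat_grace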

Summary and takeaways:

There are two directions in which this could be improved.

(1) In the original mechanism it is clearly unreasonable to use 0.5 x the number of storage nodes in the whole cluster as the min_reporter threshold. Instead, the hosts of the osds with which this osd has actually established heartbeats should be counted, and 0.5 x that number of hosts used as the basis for the judgment.

(2) In some scenarios we define logical regions for data placement by making use of the crush hierarchy; for example, several logical regions are defined within one ceph cluster, and a shard or replica of a piece of data lives in only one of them. In that case the scope within which the relevant osds establish heartbeat connections should also be narrowed to match, so that peer selection is precise.
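Both directions above would require changes to ceph itself. As a purely operational stopgap (this is a hedged sketch, not part of the original analysis; option names come from mainstream ceph releases, but defaults and syntax vary by version, so verify on your cluster first), the monitor-side threshold can be tuned at runtime, or a known-dead osd can be marked down by hand so client IO stops being sent to it:

    # Tighten the 900 s fallback so unreported osds are marked down sooner
    # (trade-off: more sensitivity to transient outages):
    ceph tell mon.* injectargs '--mon_osd_report_timeout 300'

    # Or, once an external check has confirmed the osd is really dead, mark it
    # down manually (osd id 12 is hypothetical):
    ceph osd down 12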

At present the osd heartbeat mechanism implemented by ceph still has quite a few problems. Whether a new mechanism will eventually replace the current one remains to be seen.

This is the suggested way to deal with anomalies in the ceph heartbeat mechanism. I hope the above content is of some help to you.
