
Example Analysis of the Ceph Heartbeat Mechanism


This article explains the Ceph heartbeat mechanism in detail through example analysis. It is shared here as a practical reference, and I hope you take something away from it.

1. Introduction to heartbeat

The heartbeat is used to detect whether another node has failed, so that a failed node can be found in time and the corresponding failure-handling process can be started.

The problem:

There is a tradeoff between fault detection time and the load imposed by heartbeat messages.

If the heartbeat frequency is too high, the volume of heartbeat messages degrades system performance.

If the heartbeat frequency is too low, it takes longer to discover a failed node, which hurts system availability.

A fault detection strategy should therefore achieve the following:

Timeliness: when a node becomes abnormal, for example it crashes or loses network connectivity, the cluster can perceive this within an acceptable time.

Appropriate pressure: including the pressure placed on the nodes and on the network.

Tolerance of network jitter: occasional network delays should not immediately be treated as node failures.

Diffusion mechanism: changes in meta-information caused by changes in node liveness need to spread to the whole cluster through some mechanism.

2. Ceph heartbeat detection

(Figure: Ceph heartbeat interaction with the Monitor - Ceph_heartbeat_mon.png)

An OSD reports to the Monitor:

When a reportable event occurs on the OSD (such as a failure or a PG change).

Within 5 seconds of starting up.

The OSD also reports to the Monitor periodically.

The OSD checks the failure messages about partner OSDs queued in failure_queue.

It sends a failure report to the Monitor, moves the failure message into failure_pending, and removes it from failure_queue.

When a heartbeat is received from an OSD that is in failure_queue or failure_pending, that OSD is removed from both queues and the Monitor is told to cancel the previous failure report.

When the connection to the Monitor is re-established, the reports in failure_pending are moved back into failure_queue and sent to the Monitor again.

The Monitor counts reports and takes failed OSDs offline:

The Monitor collects partner failure reports from the OSDs.

When the failure duration indicated by the reports for an OSD exceeds a threshold, and enough OSDs have reported its failure, the Monitor takes that OSD offline. The settings that govern this reporting and counting are sketched below.
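The cadence of these reports and the Monitor's counting thresholds are controlled by a handful of configuration options. The following is only a rough sketch, assuming the older space-separated option names and the default values cited later in this article (names and defaults vary between Ceph releases):

[osd]
# an OSD must report to the Monitor at least this often, in seconds
osd mon report interval max = 120
# minimum delay before a report, e.g. right after booting
osd mon report interval min = 5

[mon]
# grace period before an unresponsive OSD is declared down
mon osd report timeout = 900
# how many distinct OSDs must report a peer down before it is marked down
mon osd min down reporters = 1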

5. Summary of Ceph heartbeat detection

Ceph detects OSD node failure in two ways: partner OSDs report a failed node, and the Monitor counts the heartbeats (status reports) sent by the OSDs themselves.

Timeliness: a partner OSD can detect a node failure and report it to the Monitor within seconds, and the Monitor takes the failed OSD offline within a few minutes.

Appropriate pressure: because of the partner-OSD reporting mechanism, the heartbeat statistics kept between the Monitor and the OSDs act more like an insurance measure, so the interval at which an OSD sends heartbeats to the Monitor can be as long as 600 seconds, and the Monitor's detection threshold can be as long as 900 seconds. In effect, Ceph distributes the fault-detection load of the central node across all the OSDs, which improves the reliability of the central Monitor and, in turn, the scalability of the whole cluster.

Tolerance of network jitter: after receiving an OSD's report about its partner OSD, the Monitor does not take the target OSD offline immediately, but waits until several conditions are met (example settings are sketched after this list):

The failure time of the target OSD exceeds a threshold determined dynamically from the fixed osd_heartbeat_grace value and the historical network conditions (laggy estimation).

Reports have arrived from enough different hosts to reach mon_osd_min_down_reporters.

The failure report is not cancelled by the reporting OSD before the first two conditions are met.
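To make the conditions above concrete, they map to settings like the following (a minimal sketch using the defaults listed in section 7; names and defaults differ between Ceph releases):

[osd]
# base grace period for peer heartbeats, in seconds
osd heartbeat grace = 20

[mon]
# scale the grace period using laggy (historical network) estimates
mon osd adjust heartbeat grace = true
# number of distinct reporters required before marking an OSD down
mon osd min down reporters = 1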

Diffusion: after updating the OSDMap, the Monitor, as the central node, does not try to broadcast the new map to every OSD and client; instead it lazily waits for OSDs and clients to fetch it. This reduces the pressure on the Monitor and simplifies the interaction logic.

6. Heartbeat Settings

6.1 Configure Monitor / OSD interaction

After you have finished the initial Ceph configuration, you can deploy and run Ceph. When you execute a command such as ceph health or ceph -s, the Ceph Monitor reports on the current status of the Ceph storage cluster. The Monitor learns about the state of the cluster through reports from each Ceph OSD daemon and through reports each OSD daemon makes about the state of its neighboring OSD daemons. If the Monitor does not receive reports, or if it receives reports of changes in the storage cluster, it updates its view of the cluster map accordingly.

Ceph provides reasonable default settings for Monitor / OSD daemon interaction; however, you can override the defaults. The following sections describe how Ceph Monitors and Ceph OSD daemons interact for the purpose of monitoring the Ceph storage cluster.

6.2 OSDs check heartbeats

Each Ceph OSD daemon checks the heartbeats of other Ceph OSD daemons every 6 seconds. You can change the heartbeat interval by adding an osd heartbeat interval setting under the [osd] section of the Ceph configuration file, or by setting the value at runtime. If a neighboring Ceph OSD daemon does not show a heartbeat within the 20-second grace period, the Ceph OSD daemon may consider the neighboring OSD daemon down and report it to a Ceph Monitor, which will update the cluster map. You can change the grace period by adding an osd heartbeat grace setting under the [osd] section of the Ceph configuration file, or by setting the value at runtime.
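For example, a minimal [osd] snippet that states these two defaults explicitly might look like the following (a sketch only; on older releases roughly the same values can also be injected at runtime, as noted in the comment):

[osd]
# ping peer OSDs every 6 seconds
osd heartbeat interval = 6
# consider a peer down after 20 seconds without a heartbeat
osd heartbeat grace = 20
# runtime example (older releases): ceph tell osd.* injectargs '--osd_heartbeat_grace 20'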

6.3 OSDs report a failed OSD

By default, a Ceph OSD daemon must report to the Ceph Monitor three times that another Ceph OSD daemon is down before the Monitor acknowledges that the reported OSD daemon is down. You can change the minimum number of down reports by adding an osd min down reports setting in the [mon] section of the Ceph configuration file (called mon osd min down reports prior to v0.62), or by setting the value at runtime. By default, only one Ceph OSD daemon is required to report another Ceph OSD daemon down. You can change the number of OSD daemons required to report a Ceph OSD daemon down to a Ceph Monitor by adding a mon osd min down reporters setting in the Ceph configuration file, or by setting the value at runtime.
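A minimal [mon] sketch of these two defaults, using the option names from the settings table in section 7 (older releases; later releases renamed some options):

[mon]
# an OSD must report the same peer down this many times
mon osd min down reports = 3
# this many distinct OSDs must report the peer down
mon osd min down reporters = 1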

6.4 OSDs report peering failure

If a Ceph OSD daemon cannot peer with any of the Ceph OSD daemons defined in its configuration file (or in the cluster map), it will ping a Ceph Monitor for the latest copy of the cluster map every 30 seconds. You can change this Monitor heartbeat interval by adding an osd mon heartbeat interval setting under the [osd] section of the Ceph configuration file, or by setting the value at runtime.
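A one-line [osd] sketch of this default:

[osd]
# with no OSD peers, ask a Monitor for a fresh cluster map every 30 seconds
osd mon heartbeat interval = 30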

6.5 OSDs report their status

If a Ceph OSD daemon does not report to a Ceph Monitor at least once every 120 seconds, the Monitor will consider the OSD daemon down. You can change this maximum reporting interval by adding an osd mon report interval max setting under the [osd] section of the Ceph configuration file, or by setting the value at runtime. A Ceph OSD daemon attempts to report its status every 30 seconds; you can change the minimum reporting interval by adding an osd mon report interval min setting under the [osd] section of the Ceph configuration file, or by setting the value at runtime.
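The two intervals mentioned here correspond to the following settings (a sketch using the defaults from the table in section 7):

[osd]
# maximum time, in seconds, an OSD may stay silent before the Monitor considers it down
osd mon report interval max = 120
# minimum interval, in seconds, between status reports
osd mon report interval min = 5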

7. Configuration settings

When modifying heartbeat settings, you should include them in the [global] section of your configuration file.
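For example, a [global] block collecting heartbeat-related settings might look like the following (a sketch only; the values shown are the defaults from the tables below, so in practice you would include only the ones you intend to change):

[global]
osd heartbeat interval = 6
osd heartbeat grace = 20
osd mon heartbeat interval = 30
osd mon report interval max = 120
mon osd report timeout = 900
mon osd min down reporters = 1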

7.1 Monitor settings

mon osd min up ratio - The minimum ratio of up Ceph OSD daemons before Ceph will mark OSD daemons down. Type: Double. Default: 0.3

mon osd min in ratio - The minimum ratio of in Ceph OSD daemons before Ceph will mark OSD daemons out. Type: Double. Default: 0.3

mon osd laggy halflife - The number of seconds over which laggy estimates decay. Type: Integer. Default: 60 * 60

mon osd laggy weight - The weight of new samples in the laggy estimate decay. Type: Double. Default: 0.3

mon osd adjust heartbeat grace - If set to true, Ceph extends the heartbeat grace period based on laggy estimates. Type: Boolean. Default: true

mon osd adjust down out interval - If set to true, Ceph extends the down/out interval based on laggy estimates. Type: Boolean. Default: true

mon osd auto mark in - Ceph will mark any booting Ceph OSD daemon as in the storage cluster. Type: Boolean. Default: false

mon osd auto mark auto out in - Ceph will mark booting OSD daemons that were automatically marked out as in the cluster. Type: Boolean. Default: true

mon osd auto mark new in - Ceph will mark booting new Ceph OSD daemons as in the cluster. Type: Boolean. Default: true

mon osd down out subtree limit - The largest CRUSH unit type that Ceph will not automatically mark out. Type: String. Default: rack

mon osd report timeout - The grace period in seconds before unresponsive Ceph OSD daemons are declared down. Type: 32-bit Integer. Default: 900

mon osd min down reporters - The minimum number of Ceph OSD daemons required to report a down Ceph OSD daemon. Type: 32-bit Integer. Default: 1

mon osd min down reports - The minimum number of times a Ceph OSD daemon must report that another OSD daemon is down. Type: 32-bit Integer. Default: 3

7.2 OSD settings

osd heartbeat address - The network address used by a Ceph OSD daemon for heartbeats. Type: Address. Default: the host address

osd heartbeat interval - How often a Ceph OSD daemon pings its peers, in seconds. Type: 32-bit Integer. Default: 6

osd heartbeat grace - The elapsed time without a heartbeat after which the storage cluster considers a Ceph OSD daemon down. Type: 32-bit Integer. Default: 20

osd mon heartbeat interval - How often a Ceph OSD daemon pings a Ceph Monitor when it has no OSD daemon peers. Type: 32-bit Integer. Default: 30

osd mon report interval max - The maximum time in seconds a Ceph OSD daemon may wait before reporting to a Ceph Monitor. Type: 32-bit Integer. Default: 120

osd mon report interval min - The minimum number of seconds a Ceph OSD daemon waits before reporting to a Ceph Monitor. Type: 32-bit Integer. Default: 5 (valid range: should be less than osd mon report interval max)

osd mon ack timeout - The number of seconds to wait for a Ceph Monitor to acknowledge a request for statistics. Type: 32-bit Integer

This concludes the example analysis of the Ceph heartbeat mechanism. I hope the content above is helpful to you and helps you learn more. If you found the article useful, please share it so that more people can see it.
