
How to ensure the consistency of ceph IO under error condition

2025-04-06 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 report --

This article explains in detail how Ceph ensures IO consistency under error conditions. The editor finds it quite practical and shares it here for reference; I hope you gain something from reading it.

Recently, we studied how Ceph guarantees IO consistency when errors occur. The code discussed below is from the hammer 0.94.5 release. Build a cluster with three OSDs: osd.0, osd.1, and osd.2.

The client sends a write operation A. osd.0 is the primary and osd.1/osd.2 are replicas. Suppose osd.2 goes down, whether from a network or disk failure or a software bug. At this point osd.0 has finished its local write and received the replica-write ack from osd.1, and is still waiting for the replica-write ack from osd.2. After waiting at most osd_heartbeat_grace (20 seconds by default), the heartbeat mechanism reports to the monitor that the OSD is dead. The cluster then enters peering, and while peering is in progress, IO to the affected PGs is blocked. Now look more closely at how osd.0, as the primary, handles write operation A. Before peering, osd.0 calls ReplicatedPG::on_change(), which in turn calls apply_and_flush_repops().

apply_and_flush_repops() requeues operation A into op_wq.
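As a rough sketch of this requeue step, here is a toy Python model of a primary with in-flight replicated writes. All names here (PrimaryOSD, submit, ack) are hypothetical illustrations; the real logic is C++ in ReplicatedPG::on_change() / apply_and_flush_repops().

```python
from collections import deque

class PrimaryOSD:
    """Toy model of a primary OSD's in-flight replicated writes.
    Hypothetical names; the real code is ReplicatedPG (C++)."""

    def __init__(self):
        self.op_wq = deque()   # pending client ops
        self.in_flight = []    # repops still waiting on replica acks

    def submit(self, op, replicas):
        # Local write is done; now wait for an ack from every replica.
        self.in_flight.append({"op": op, "waiting_on": set(replicas)})

    def ack(self, op, replica):
        for repop in self.in_flight:
            if repop["op"] == op:
                repop["waiting_on"].discard(replica)

    def on_change(self):
        # Peering is about to start: abort every in-flight repop and
        # requeue its op so it is replayed once the PG is active again.
        for repop in self.in_flight:
            self.op_wq.append(repop["op"])
        self.in_flight.clear()

p = PrimaryOSD()
p.submit("A", replicas=["osd.1", "osd.2"])
p.ack("A", "osd.1")        # osd.2 never answers
p.on_change()              # peering begins: A goes back to op_wq
print(list(p.op_wq))       # prints ['A']
```

The point of the requeue is that A is neither lost nor acknowledged twice: it simply re-enters the normal op queue and is re-examined after peering.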

Once peering completes, the PG that operation A belongs to returns to the active state and IO resumes; do_op continues processing the IO queue, including the requeued operation A.

do_op queries the pglog, finds that operation A is in fact a dup, and returns the result directly to the client.
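The dup check can be sketched as a lookup of the client request id in the PG's log of committed requests. This is a toy Python model with a hypothetical structure and invented reqid/version values; in Ceph the check happens inside do_op against the real pglog.

```python
class PGLog:
    """Toy dup detection via the pglog (hypothetical structure)."""

    def __init__(self):
        self.committed = {}   # client reqid -> version it committed at

    def record(self, reqid, version):
        self.committed[reqid] = version

    def is_dup(self, reqid):
        return reqid in self.committed

log = PGLog()
log.record(("client.4121", 7), (12, 344))   # A committed before osd.2 died
# After peering, the requeued A reaches do_op again:
print(log.is_dup(("client.4121", 7)))       # prints True
```

Because A already committed before peering, the primary replies to the client from the log instead of applying the write a second time; that is what keeps the requeue/replay path idempotent.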

If both replica OSDs fail, e.g. osd.1 and osd.2, the primary will still requeue operation A. But if the pool's min_size is 2, only the primary OSD is online at this point, which is fewer than min_size.

So even after peering completes, IO remains blocked until recovery brings the number of in-sync OSDs back up to min_size, at which point IO resumes.

Similarly, if the primary osd.0 fails, there are two cases. In the first, osd.0 dies before the client sends operation A; the client waits a while, and once peering completes and the client receives the updated osdmap, it resends the request. From there the remaining IO processing is the same as the normal path.

In the second case, the client has already sent the request to the primary osd.0, osd.0 has forwarded the replica writes to osd.1 and osd.2, and then osd.0 fails. Again the cluster waits for the heartbeat to detect that osd.0 is down, then peers. osd.1 will likewise requeue the operation while waiting for peering to complete; assuming osd.1 becomes the new primary, the logic from there is the same as for primary osd.0 above.
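The client-side half of this failover can be sketched as recomputing the target primary from the new osdmap and resending the same request. The maps and the helper below are toy illustrations (in this sketch the first OSD listed in a PG's acting set is its primary); the pgid and OSD names are invented.

```python
def resend_target(osdmap, pgid):
    # The client resends the same request (same reqid) to whatever OSD
    # the current osdmap lists as the PG's primary (first in the set here).
    return osdmap[pgid][0]

old_map = {"1.2a": ["osd.0", "osd.1", "osd.2"]}
new_map = {"1.2a": ["osd.1", "osd.2"]}   # osd.0 marked down after the grace period
print(resend_target(new_map, "1.2a"))    # prints osd.1
```

Because the resent request carries the same reqid, the new primary's pglog dup check still applies: if osd.1 already committed the replica write before osd.0 died, the client gets an answer without the write being applied twice.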

This is the end of this article on "how to ensure the consistency of ceph IO under error conditions". I hope the content above has been of some help and lets you learn more. If you thought the article was good, please share it for more people to see.
