Example Analysis of Ceph monitor Fault recovery

2026-02-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shares an example of analyzing and recovering from a Ceph monitor failure. Many readers may not be familiar with the procedure, so it is offered here for reference.

1 Problem

Generally speaking, a production Ceph cluster runs 2n+1 monitors (n >= 0), with at least 3 online. As long as the number of healthy nodes is >= n+1, Ceph's Paxos algorithm keeps the system running normally. With three nodes, therefore, only one can be down at a time. The probability of two nodes dying at the same time is fairly small, but what if it does happen?

If more than half of the Ceph monitor nodes are down, Paxos cannot form a quorum, and the cluster blocks all operations until more than half of the monitor nodes are restored.

If there are not enough monitors to form a quorum, the ceph command will block trying to reach the cluster. In this situation, you need to get enough ceph-mon daemons running to form a quorum before doing anything else with the cluster.
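
The majority rule above can be checked with a little arithmetic. The following is a minimal sketch, not part of Ceph itself; the function names quorum_size and tolerated_failures are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the majority rule for N monitors.

# Monitors needed to form a quorum: floor(N/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Monitors that may be down while a quorum can still form.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "monitors=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For three monitors this gives a quorum of 2 and one tolerated failure. For two monitors the quorum is also 2, so losing either one blocks the cluster.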

So:

(1) If at least one of the two dead nodes can be restored, that is, its monitor metadata is still intact, you only need to restart its ceph-mon process. For this reason it is best to run monitors on machines with RAID, so that recovery is easier even if a machine fails.

(2) What if the metadata on both dead nodes is corrupted? That takes truly bad luck: the RAID disks of both machines failing at the same time. (Perhaps the administrator smashed the machines because the salary was too low.) How do you recover then?

2 Recovery

Since the metadata on the failed nodes is corrupted, there is no way around rebuilding those monitors. Fortunately, one node still has intact metadata, and the others can be recovered from it.

To add a monitor:

$ ceph mon getmap -o /tmp/monmap # provides fsid and existing monitor addrs

$ ceph auth export mon. -o /tmp/monkey # mon. auth key

$ ceph-mon -i newname --mkfs --monmap /tmp/monmap --keyring /tmp/monkey

So, as long as you can obtain the monmap (and the mon. key), you can rebuild a monitor.

To simulate the failure, take a cluster with two monitor nodes and bring one of them down. Every operation that reaches Ceph over the network now blocks, but the surviving monitor can still be reached through its local admin socket.
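
As a rough sketch of what talking to the local socket looks like: the helper names below are hypothetical, and the socket path assumes Ceph's default admin socket location (/var/run/ceph/$cluster-$name.asok), which a given deployment may override.

```shell
#!/bin/sh
# Hypothetical helper: default admin socket path for a monitor.
# Assumes the default "admin socket" setting; check your ceph.conf.
mon_asok_path() {
    cluster=$1
    mon_id=$2
    echo "/var/run/ceph/${cluster}-mon.${mon_id}.asok"
}

# Query the monitor's own view of its state through the local socket.
# This works without quorum because it never goes over the network.
mon_local_status() {
    sock=$(mon_asok_path "$1" "$2")
    if [ -S "$sock" ] && command -v ceph >/dev/null 2>&1; then
        ceph --admin-daemon "$sock" mon_status
    else
        echo "admin socket $sock is not available on this host" >&2
        return 1
    fi
}

# Example, to be run on the surviving monitor node:
#   mon_local_status cluster1 vm2
```

The mon_status output is JSON and typically lists the fsid and the monitors this node knows about, which is useful when gathering the values needed to rebuild a monmap.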

Unfortunately, the monmap cannot be exported through the admin socket. Thanks to monmaptool, however, we can generate it by hand (make sure the fsid matches the cluster's):

# monmaptool --create --add vm2 172.16.213.134:6789 --add vm3 172.16.213.135:6789 --fsid eb295a51-ec22-4971-86ef-58f6d2bea3bf --clobber monmap

monmaptool: monmap file monmap

monmaptool: set fsid to eb295a51-ec22-4971-86ef-58f6d2bea3bf

monmaptool: writing epoch 0 to monmap (2 monitors)
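
With more monitors, typing the --add pairs by hand gets error-prone. Below is a minimal sketch that reads the fsid from a config file and assembles the same monmaptool command line as a dry run. Both function names and the NAME=ADDR argument format are made up for illustration, and it assumes the config file contains a line of the form "fsid = <uuid>".

```shell
#!/bin/sh
# Hypothetical helper: print the fsid from a ceph.conf-style file.
# Assumes a line of the form "fsid = <uuid>".
fsid_from_conf() {
    sed -n 's/^[[:space:]]*fsid[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: print (not run) a monmaptool command line.
# Usage: monmap_create_cmd FSID OUTFILE NAME=ADDR [NAME=ADDR ...]
monmap_create_cmd() {
    fsid=$1
    out=$2
    shift 2
    cmd="monmaptool --create"
    for entry in "$@"; do
        name=${entry%%=*}   # text before the first '='
        addr=${entry#*=}    # text after the first '='
        cmd="$cmd --add $name $addr"
    done
    echo "$cmd --fsid $fsid --clobber $out"
}

# Dry run with the values used in this article:
monmap_create_cmd eb295a51-ec22-4971-86ef-58f6d2bea3bf monmap \
    vm2=172.16.213.134:6789 vm3=172.16.213.135:6789
```

After generating the file for real, monmaptool --print monmap shows the fsid and monitor list it contains, which is worth checking before the monmap is fed to ceph-mon --mkfs.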

Copy the mon. key from the healthy monitor node (in this example it ends up as /tmp/keyring on the node being rebuilt):

# cat /var/lib/ceph/mon/cluster1-vm2/keyring

[mon.]

key = AQDZQ8VTAAAAABAAX9HqE0NITrUt7j1w0YadvA==

caps mon = "allow *"

Then initialize:

# ceph-mon --cluster cluster1 -i vm3 --mkfs --monmap /root/monmap --keyring /tmp/keyring

ceph-mon: set fsid to eb295a51-ec22-4971-86ef-58f6d2bea3bf

ceph-mon: created monfs at /var/lib/ceph/mon/cluster1-vm3 for mon.vm3

Finally, start the monitor on the failed node:

# ceph-mon --cluster cluster1 -i vm3 --public-addr 172.16.213.135:6789

Everything is OK!
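
To confirm that the cluster really is reachable again, one option is to poll a cluster command until it stops blocking. The sketch below is a generic retry loop; retry_until_ok is a hypothetical helper, and the ceph invocation in the comment assumes the cluster name cluster1 used above.

```shell
#!/bin/sh
# Hypothetical helper: run a check command until it succeeds or the
# attempts run out.
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0    # the check passed
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1            # the check never passed
}

# Example, to be run on a node with the ceph CLI and admin keyring:
#   retry_until_ok 10 3 ceph --cluster cluster1 --connect-timeout 5 mon stat
```

Here --connect-timeout keeps each attempt from hanging if the monitors have not yet formed a quorum. Once the loop succeeds, ceph --cluster cluster1 quorum_status shows which monitors are in the quorum.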
