
PVE + Ceph Hyper-Converged (7): Removing and Adding a Failed Hard Drive


During normal operation, a hard disk in the cluster will eventually fail. How do we replace it with a new one? Let's demonstrate the process below.

View ceph status

root@pve-1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.43822 root default
-3       0.14607     host pve-1
 0   hdd 0.04869         osd.0      up  1.00000 1.00000
 3   hdd 0.04869         osd.3      up  1.00000 1.00000
 7   hdd 0.04869         osd.7      up  1.00000 1.00000
-5       0.14607     host pve-2
 2   hdd 0.04869         osd.2      up  1.00000 1.00000
 4   hdd 0.04869         osd.4      up  1.00000 1.00000
 6   hdd 0.04869         osd.6      up  1.00000 1.00000
-7       0.14607     host pve-3
 1   hdd 0.04869         osd.1      up  1.00000 1.00000
 5   hdd 0.04869         osd.5      up  1.00000 1.00000
 8   hdd 0.04869         osd.8      up  1.00000 1.00000

Everything is normal. Next, I'll pull one hard drive to simulate a failure.
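Before pulling a disk, it can also help to confirm the overall cluster health, not just the OSD tree; a minimal sketch (the exact output depends on your cluster):

root@pve-1:~# ceph -s              # overall status; look for HEALTH_OK
root@pve-1:~# ceph health detail   # lists any specific warnings or errors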

ceph osd tree

ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.43822 root default
-3       0.14607     host pve-1
 0   hdd 0.04869         osd.0      up  1.00000 1.00000
 3   hdd 0.04869         osd.3      up  1.00000 1.00000
 7   hdd 0.04869         osd.7      up  1.00000 1.00000
-5       0.14607     host pve-2
 2   hdd 0.04869         osd.2      up  1.00000 1.00000
 4   hdd 0.04869         osd.4      up  1.00000 1.00000
 6   hdd 0.04869         osd.6      up  1.00000 1.00000
-7       0.14607     host pve-3
 1   hdd 0.04869         osd.1      up  1.00000 1.00000
 5   hdd 0.04869         osd.5      up  1.00000 1.00000
 8   hdd 0.04869         osd.8    down  1.00000 1.00000

osd.8 is now shown as down.

Next, we simulate removing the failed drive and adding a new one. First take osd.8 out of the cluster, delete its authentication key, and remove the OSD entry:

ceph osd out osd.8
ceph auth del osd.8
ceph osd rm 8
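If the failed OSD's daemon is still running or restarting in a loop, it can also be stopped and disabled so it does not come back; a minimal sketch, assuming the standard ceph-osd@<id> systemd unit that PVE/Ceph installs:

systemctl stop ceph-osd@8      # stop the daemon for osd.8, if it is still active
systemctl disable ceph-osd@8   # keep it from starting again at boot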

root@pve-1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.43822 root default
-3       0.14607     host pve-1
 0   hdd 0.04869         osd.0      up  1.00000 1.00000
 3   hdd 0.04869         osd.3      up  1.00000 1.00000
 7   hdd 0.04869         osd.7      up  1.00000 1.00000
-5       0.14607     host pve-2
 2   hdd 0.04869         osd.2      up  1.00000 1.00000
 4   hdd 0.04869         osd.4      up  1.00000 1.00000
 6   hdd 0.04869         osd.6      up  1.00000 1.00000
-7       0.14607     host pve-3
 1   hdd 0.04869         osd.1      up  1.00000 1.00000
 5   hdd 0.04869         osd.5      up  1.00000 1.00000
 8   hdd 0.04869         osd.8     DNE        0

The status of osd.8 is now DNE (does not exist).

Next, remove the failed OSD from the CRUSH map:

ceph osd crush rm osd.8

root@pve-1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.38953 root default
-3       0.14607     host pve-1
 0   hdd 0.04869         osd.0      up  1.00000 1.00000
 3   hdd 0.04869         osd.3      up  1.00000 1.00000
 7   hdd 0.04869         osd.7      up  1.00000 1.00000
-5       0.14607     host pve-2
 2   hdd 0.04869         osd.2      up  1.00000 1.00000
 4   hdd 0.04869         osd.4      up  1.00000 1.00000
 6   hdd 0.04869         osd.6      up  1.00000 1.00000
-7       0.09738     host pve-3
 1   hdd 0.04869         osd.1      up  1.00000 1.00000
 5   hdd 0.04869         osd.5      up  1.00000 1.00000

osd.8 can no longer be found in the tree, which indicates that the deletion was successful.
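As a side note, on Ceph Luminous and later the out / auth del / osd rm / crush rm sequence can be collapsed into a single command; a sketch, assuming a sufficiently recent Ceph release:

ceph osd purge 8 --yes-i-really-mean-it   # removes osd.8 from the CRUSH map and deletes its auth key and OSD entry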

View the information of the new hard drive we added:

lsblk

NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                  8:0    0    20G  0 disk
├─sda1               8:1    0  1007K  0 part
├─sda2               8:2    0   512M  0 part
└─sda3               8:3    0  19.5G  0 part
  ├─pve-swap       253:0    0   2.4G  0 lvm  [SWAP]
  ├─pve-root       253:1    0   4.8G  0 lvm  /
  ├─pve-data_tmeta 253:2    0     1G  0 lvm
  │ └─pve-data     253:4    0     8G  0 lvm
  └─pve-data_tdata 253:3    0     8G  0 lvm
    └─pve-data     253:4    0     8G  0 lvm
sdb                  8:16   0    50G  0 disk
├─sdb1               8:17   0   100M  0 part /var/lib/ceph/osd/ceph-1
└─sdb2               8:18   0  49.9G  0 part
sdc                  8:32   0    50G  0 disk
├─sdc1               8:33   0   100M  0 part /var/lib/ceph/osd/ceph-5
└─sdc2               8:34   0  49.9G  0 part
sdd                  8:48   0    50G  0 disk
sr0                 11:0    1 655.3M  0 rom
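Here sdd is a brand-new, empty disk, so it can be used directly. If the replacement disk was used before (old partitions, LVM, or a previous OSD), it generally needs to be wiped first; a sketch, assuming the disk really is /dev/sdd and contains nothing you want to keep:

ceph-volume lvm zap /dev/sdd --destroy   # wipe partition tables and LVM metadata from the disk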

Create a new OSD on the blank disk:

pveceph createosd /dev/sdd

root@pve-3:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.43822 root default
-3       0.14607     host pve-1
 0   hdd 0.04869         osd.0      up  1.00000 1.00000
 3   hdd 0.04869         osd.3      up  1.00000 1.00000
 7   hdd 0.04869         osd.7      up  1.00000 1.00000
-5       0.14607     host pve-2
 2   hdd 0.04869         osd.2      up  1.00000 1.00000
 4   hdd 0.04869         osd.4      up  1.00000 1.00000
 6   hdd 0.04869         osd.6      up  1.00000 1.00000
-7       0.14607     host pve-3
 1   hdd 0.04869         osd.1      up  1.00000 1.00000
 5   hdd 0.04869         osd.5      up  1.00000 1.00000
 8   hdd 0.04869         osd.8      up  1.00000 1.00000
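osd.8 is back up. Ceph will now backfill data onto the new OSD; progress can be watched until the cluster returns to HEALTH_OK, for example:

ceph -s   # shows recovery/backfill progress and overall health
ceph -w   # follows cluster events live (Ctrl+C to exit)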

If the whole physical node has failed, delete it from the Ceph cluster (its CRUSH bucket) as follows:

ceph osd crush rm pve-3
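If the failed node also hosted a Ceph monitor, that monitor should be removed from the monmap as well; a sketch, assuming the monitor is named after the node (the PVE default):

ceph mon remove pve-3   # drop the monitor that ran on the failed node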

Remove the failed node from the cluster

Log in to any healthy node in the cluster and run the following command to evict it:

pvecm delnode pve-3
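pvecm delnode only removes the node from the cluster (corosync) configuration; its directory under /etc/pve/nodes is left behind. Once you are sure nothing in it is still needed, it can be cleaned up from any remaining node; a sketch:

ls /etc/pve/nodes/pve-3       # inspect what the evicted node left behind
rm -rf /etc/pve/nodes/pve-3   # remove the leftovers only after verifying them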

Recovering the faulty machine

The cleanest approach is to wipe the machine completely, reinstall the system, and rejoin the cluster with a new IP address.
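After the reinstall, the machine can be joined back into the cluster; a minimal sketch, where 192.168.1.11 is a placeholder for the IP of any existing cluster member (replace it with your own), run on the freshly installed node:

pvecm add 192.168.1.11   # join the PVE cluster via an existing member
pveceph install          # install the Ceph packages on the new node before creating OSDs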
