What if pg object unfound appears in Ceph?


The purpose of this article is to share what to do when pg object unfound appears in Ceph. The walkthrough is quite practical, so it is shared here as a reference you can follow along with.

1. Background

One node in the cluster was damaged, and a disk failed on another node.

2. Problems

Checking the status of the Ceph cluster shows that placement group pg 4.210 has an unfound object:

# ceph health detail
HEALTH_WARN 481/5647596 objects misplaced (0.009%); 1/1882532 objects unfound (0.000%); Degraded data redundancy: 965/5647596 objects degraded (0.017%), 1 pg degraded, 1 pg undersized
OBJECT_MISPLACED 481/5647596 objects misplaced (0.009%)
OBJECT_UNFOUND 1/1882532 objects unfound (0.000%)
    pg 4.210 has 1 unfound objects
PG_DEGRADED Degraded data redundancy: 965/5647596 objects degraded (0.017%), 1 pg degraded, 1 pg undersized
    pg 4.210 is stuck undersized for 38159.843116, current state active+recovery_wait+undersized+degraded+remapped, last acting [2]

3. Process

3.1 First, make the cluster available for normal use

Looking at pg 4.210, you can see that only one copy remains:

# ceph pg dump_json pools | grep 4.210
dumped all
4.210  482  1  965  481  1  2013720576  3461  3461  active+recovery_wait+undersized+degraded+remapped  ...  [26,20,2]  26  [2]  2  ...

Two copies have been lost, as ceph pg map also shows:

# ceph pg map 4.210
osdmap e9181 pg 4.210 (4.210) -> up [26,20,2] acting [2]

Worse still, the primary copy is among the lost ones.

Because the pool's min_size is 2 by default, the vms pool in which pg 4.210 resides cannot serve I/O normally.

# ceph osd pool stats vms
pool vms id 4
  965/1478433 objects degraded (0.065%)
  481/1478433 objects misplaced (0.033%)
  1/492811 objects unfound (0.000%)
  client io 680 B/s rd, 399 kB/s wr, 0 op/s rd, 25 op/s wr

# ceph osd pool ls detail | grep vms
pool 4 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 10312 lfor 0/874 flags hashpspool stripe_width 0 application rbd

It directly affected some virtual machines: they hung and no longer responded to commands.
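Before changing anything on the pool, it can help to pin down which object is unfound and which VM disk it belongs to. This is only a minimal sketch, assuming the pool holds RBD images; the image name vm-disk-01 is purely hypothetical:

# ceph pg 4.210 list_unfound
(the unfound RBD data objects are named rbd_data.<image id>.<index>)
# rbd info vms/vm-disk-01 | grep block_name_prefix
block_name_prefix: rbd_data.<image id>

The image whose block_name_prefix matches the prefix of the unfound object name is the affected VM disk.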

To get the pool back into service, first adjust the min_size of the vms pool to 1:

# ceph osd pool set vms min_size 1
set pool 4 min_size to 1

3.2 Try to recover the unfound object in pg 4.210

View pg 4.210:

# ceph pg 4.210 query
...
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2019-07-09 23:04:31.718033",
            "might_have_unfound": [
                { "osd": "4", "status": "already probed" },
                { "osd": "6", "status": "already probed" },
                { "osd": "15", "status": "already probed" },
                { "osd": "17", "status": "already probed" },
                { "osd": "20", "status": "already probed" },
                { "osd": "22", "status": "osd is down" },
                { "osd": "23", "status": "already probed" },
                { "osd": "26", "status": "osd is down" }
            ],
...

Reading the recovery state of pg 4.210: osd 4, 6, 15, 17, 20 and 23 have already been probed, while osd 22 and 26 are reported as down; in my cluster those two had already been removed.

According to the official documentation, an osd listed under might_have_unfound can be in one of the following four states (a one-liner for pulling this section out of the query output is sketched after the list):

already probed
querying
OSD is down
not queried (yet)
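If jq happens to be installed (an assumption, not something the original setup mentions), the might_have_unfound section can be extracted from the full query output in one line:

# ceph pg 4.210 query | jq '.recovery_state[] | select(.name == "Started/Primary/Active") | .might_have_unfound'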

There are two ways to handle the unfound object: revert it to an older version, or delete it outright. Reverting rolls the object back to its previous version (or forgets it entirely if it is a new object), while deleting removes it altogether.

# ceph pg 4.210 mark_unfound_lost revert
Error EINVAL: pg has 1 unfound objects but we haven't probed all sources, not marking lost
# ceph pg 4.210 mark_unfound_lost delete
Error EINVAL: pg has 1 unfound objects but we haven't probed all sources, not marking lost

Both commands fail: the pg has an unfound object but not all possible sources have been probed, so it cannot be marked lost. In other words, the object can neither be reverted nor deleted yet.

The guess is that the down osd 22 and osd 26 are the sources that have not been probed. Since the damaged node had by now been repaired and reinstalled, the osds were added back to the cluster.

The detailed process of removing and re-adding an osd is not covered here; a minimal generic sketch follows for reference.
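This is only a sketch of the usual sequence for one of the dead osds (osd.22 as the example), assuming a Luminous or newer cluster deployed with ceph-volume; the data device /dev/sdb is an assumption, not the path used in this cluster:

# ceph osd out osd.22
# ceph osd crush remove osd.22
# ceph auth del osd.22
# ceph osd rm osd.22
(on the rebuilt node, after replacing the failed disk)
# ceph-volume lvm create --data /dev/sdb

Once the recreated osds come up and peering finishes, the pg can probe them.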

Once the osds have been added back, query pg 4.210 again:

"recovery_state": [{"name": "Started/Primary/Active", "enter_time": "2019-07-15 15 Started/Primary/Active 24purs 32.277667", "might_have_unfound": [{"osd": "4" "status": "already probed"}, {"osd": "6", "status": "already probed"}, {"osd": "15" "status": "already probed"}, {"osd": "17", "status": "already probed"}, {"osd": "20" "status": "already probed"}, {"osd": "22", "status": "already probed"}, {"osd": "23" "status": "already probed"}, {"osd": "24", "status": "already probed"}, {"osd": "26" "status": "already probed"}], "recovery_progress": {"backfill_targets": ["20", "26"]

All the possible sources have now been probed, so the revert command can be executed:

# ceph pg 4.210 mark_unfound_lost revert
pg has 1 objects unfound and apparently lost marking

Check the cluster status:

# ceph health detail
HEALTH_OK

Finally, restore the min_size of the vms pool to 2:

# ceph osd pool set vms min_size 2
set pool 4 min_size to 2

Thank you for reading! This concludes the article on what to do when pg object unfound appears in Ceph. I hope the above content has been helpful; if you found the article useful, feel free to share it so that more people can see it.
