Example Analysis of the Ceph pg unfound Handling Process


In this post I'd like to share a worked example of handling a Ceph "pg unfound" condition. Many people are not very familiar with it, so I'm writing it up for reference and hope you will find it useful.

While checking the Ceph cluster today, I found a pg with an unfound object, hence this article.

1. View cluster status

[root@k8snode001 ~]# ceph health detail
HEALTH_ERR 1/973013 objects unfound (0.000%); 17 scrub errors; Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair; Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
OBJECT_UNFOUND 1/973013 objects unfound (0.000%)
    pg 2.2b has 1 unfound objects
OSD_SCRUB_ERRORS 17 scrub errors
PG_DAMAGED Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
    pg 2.44 is active+clean+inconsistent, acting [14,8,21]
    pg 2.73 is active+clean+inconsistent, acting [25,14,8]
    pg 2.80 is active+clean+scrubbing+deep+inconsistent+repair, acting [4,8,14]
    pg 2.83 is active+clean+inconsistent, acting [...]
    pg 2.ae is active+clean+inconsistent, acting [...]
    pg 2.c4 is active+clean+inconsistent, acting [...]
    pg 2.da is active+clean+inconsistent, acting [...]
PG_DEGRADED Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound

From the output we can see that pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], with 1 unfound object.
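As a side note, the health output above also reports 17 scrub errors spread over several inconsistent pgs. Those can be inspected with the rados tool; a minimal sketch (not part of the original walkthrough), assuming rados is installed on this node and using the pool name k8s-1 from step 3 below and pg 2.44 from the list above:

[root@k8snode001 ~]# rados list-inconsistent-pg k8s-1                        # pgs in the pool with scrub inconsistencies
[root@k8snode001 ~]# rados list-inconsistent-obj 2.44 --format=json-pretty   # per-object detail for one inconsistent pg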

Now let's take a closer look at pg 2.2b and its detailed information.

[root@k8snode001 ~]# ceph pg dump_json pools | grep 2.2b
dumped all
2.2b  2487  1  10  1  9533198403  3048  3048  active+recovery_unfound+degraded  2020-07-23 ...:07.669903  10373'...  10373:...  [14,22,4]  14  [14,22,4]  14  10371'5437258  2020-07-23 ...:06.637012  10371'5437258  2020-07-23 ...:06.637012  0

You can see that there is only one copy of it now.
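If you want to see exactly which object is unfound, the pg can be asked to list it; a minimal sketch (the subcommand name depends on the Ceph release: list_missing on older releases such as Luminous, list_unfound on newer ones):

[root@k8snode001 ~]# ceph pg 2.2b list_unfound     # newer releases
[root@k8snode001 ~]# ceph pg 2.2b list_missing     # older releases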

2. View pg map

[root@k8snode001 ~]# ceph pg map 2.2b
osdmap e10373 pg 2.2b (2.2b) -> up [14,22,4] acting [14,22,4]

From the pg map, we can see that pg 2.2b is mapped to OSDs 14, 22 and 4.
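To check where those OSDs live and whether they are up, ceph osd tree and ceph osd find can be used; a minimal sketch (the grep filter is my own addition):

[root@k8snode001 ~]# ceph osd tree | grep -E 'osd\.(4|14|22) '   # up/down state and weight of the acting OSDs
[root@k8snode001 ~]# ceph osd find 14                            # host and crush location of the primary OSD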

3. View storage pool status

[root@k8snode001 ~]# ceph osd pool stats k8s-1
pool k8s-1 id 2
  1/1955664 objects degraded (0.000%)
  1/651888 objects unfound (0.000%)
  client io 271 KiB/s wr, 0 op/s rd, 52 op/s wr

[root@k8snode001 ~]# ceph osd pool ls detail | grep k8s-1
pool 2 'k8s-1' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 88 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
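To narrow the view down to just the problem pgs of this pool, ceph pg ls-by-pool can also be filtered by state; a minimal sketch, assuming your release accepts state arguments to ls-by-pool:

[root@k8snode001 ~]# ceph pg ls-by-pool k8s-1 recovery_unfound   # pgs in k8s-1 with unfound objects
[root@k8snode001 ~]# ceph pg ls-by-pool k8s-1 inconsistent       # pgs in k8s-1 flagged by scrub errors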

4. Attempt to recover the missing object in pg 2.2b

[root@k8snode001 ~]# ceph pg repair 2.2b
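Note that ceph pg repair only schedules a repair, which then runs asynchronously. One way to follow it is to watch the cluster log for lines about this pg; a minimal sketch (the grep filter is my own addition):

[root@k8snode001 ~]# ceph -w | grep 2.2b    # scrub/repair results for the pg appear in the cluster log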

If the repair does not succeed, you can check the detailed information of the stuck pg, focusing on recovery_state. The command is as follows:

[root@k8snode001 ~]# ceph pg 2.2b query
{
    ...
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2020-07-21 14:17:05.855923",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "10370",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.max_end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2020-07-21 14:17:04.814061"
        }
    ],
    "agent_state": {}
}
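The full query output is long, so it can help to pull out only recovery_state, in particular might_have_unfound, which lists the OSDs that were probed for the missing object; a minimal sketch, assuming jq is installed and indexing the first recovery_state entry as in the output above:

[root@k8snode001 ~]# ceph pg 2.2b query | jq '.recovery_state[0].might_have_unfound'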

If repair cannot fix it, there are two options: revert the unfound object to an older version, or delete it outright. Revert rolls the object back to a previous version (or forgets it entirely if it was newly created), while delete simply forgets the missing object.

5. Solution

Revert to the old version:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost revert
Delete directly:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost delete

6. Verification

Here I chose to delete the unfound object directly. The Ceph cluster then rebuilt the pg, and after a while its status changed back to active+clean.
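While waiting for recovery, the pg state can be polled until it settles; a minimal sketch (the 10-second interval is arbitrary), with the final query output below confirming the clean state:

[root@k8snode001 ~]# watch -n 10 'ceph health detail'        # the unfound/degraded warnings should disappear
[root@k8snode001 ~]# ceph pg 2.2b query | grep '"state"'     # should eventually report active+clean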

[root@k8snode001 ~]# ceph pg 2.2b query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 11069,
    "up": [12, 22, 4],
    ...

Check the cluster status again

[root@k8snode001 ~]# ceph health detail
HEALTH_OK

That is all the content of "Example Analysis of the Ceph pg unfound Handling Process". Thank you for reading, and I hope this walkthrough is helpful.
