What is the Placement Group status in Ceph

This article explains the Placement Group (PG) states in Ceph. It is meant to be clear and easy to follow, and should help resolve common doubts about what each state means and how to handle it.
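
Before going through each state, note that a cluster's current PG states can be listed with standard commands (a minimal sketch; output formats differ between Ceph releases):

ceph -s
ceph health detail
ceph pg stat
ceph pg dump_stuck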

The possible states of a Placement Group (PG) are:

Creating

Peering

Activating

Active

Backfilling

Backfill-toofull

Backfill-wait

Incomplete

Inconsistent

Peered

Recovering

Recovery-wait

Remapped

Scrubbing

Inactive

Unclean

Stale

Undersized

Down

Their meanings, causes, consequences, and solutions (for the abnormal states) are as follows:

Creating

Meaning: the PG is being created.

Cause: occurs when a pool is created and its PGs are created according to the specified pg_num; a normal state.

Consequence: none.

Solution: none needed; this is one of the normal states.
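
For example, the PGs of a newly created pool pass through this state briefly (the pool name and PG counts below are illustrative):

ceph osd pool create testpool 128 128
ceph pg stat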

Peering

Meaning: the PG is peering: the OSDs that store the PG's replicas are reaching agreement on the state of its objects and metadata.

Cause: occurs while the PG is being created; the OSDs that store the placement group's replicas agree on the state of the objects and metadata in it.

Consequence: none.

Solution: none needed; this is one of the normal states.
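
If a PG seems to stay in peering for an unusually long time, its peering progress can be inspected directly (the PG ID 2.1a is illustrative):

ceph pg 2.1a query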

Activating

Meaning: peering has finished; the PG persists the peering result, waits for all its OSDs to synchronize, and tries to enter the active state.

Cause: the PG's preparation step before entering active.

Consequence: if the PG is stuck in this state for a long time, it cannot be read or written, which in turn affects the availability of the whole pool.

Solution (a command sketch follows this list):

Stop all OSDs that hold the PG.

Back up the PG's data with ceph-objectstore-tool.

Use ceph-objectstore-tool to remove the empty PG on the primary OSD (do not delete it manually).

Import the data again with ceph-objectstore-tool.

Manually restore ceph ownership and permissions on the PG directory.

Finally, restart the OSDs.
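
A minimal sketch of that procedure, assuming the affected PG is 2.1a and one of its OSDs is osd.3 with the default data path (the IDs, paths, and file name are illustrative, and ceph-objectstore-tool flags vary slightly between releases):

systemctl stop ceph-osd@3
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.1a --op export --file /tmp/pg.2.1a.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.1a --op remove --force
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.1a --op import --file /tmp/pg.2.1a.export
chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
systemctl start ceph-osd@3

The export is repeated on every OSD that holds the PG; the remove and import are done on the primary, following the list above.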

Active

Meaning: the PG is active and can serve reads and writes.

Cause: normal state.

Consequence: none.

Solution: none needed; this is one of the normal states.

Backfilling

Meaning: the PG is backfilling.

Cause: usually an OSD has gone offline (no heartbeat response for more than 5 minutes) and Ceph is finding a new OSD onto which to copy a full replica of the data.

Consequence: when this state appears, it usually means an OSD has died or gone offline and should be checked.

Solution: in most cases Ceph completes the backfill automatically. If the backfill cannot complete, the PG enters the backfill-toofull state.
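
While Ceph handles this automatically, backfill progress can be watched with standard commands (a minimal sketch):

ceph -s
ceph pg dump pgs_brief | grep backfill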

Backfill-toofull

Meaning: backfill is on hold because the target OSDs are too full.

Cause: usually the remaining OSDs do not have enough capacity to backfill the data from the missing OSD.

Consequence: the pool cannot be written to, and reads and writes hang.

Solution:

Check the OSD capacities for a serious imbalance and manually move data off the over-full OSDs (reweight); see the command sketch at the end of this section. If the cluster as a whole is nearfull, expand the physical capacity as soon as possible.

Emergency capacity expansion (the best way is to add OSDs and increase capacity). As a stopgap, the full thresholds can be raised temporarily:

Pause OSD reads and writes:

ceph osd pause

Tell the mons and OSDs to raise the full threshold:

ceph tell mon.* injectargs "--mon-osd-full-ratio 0.96"
ceph tell osd.* injectargs "--mon-osd-full-ratio 0.96"

Tell the PGs to raise the full threshold:

ceph pg set_full_ratio 0.96

Resume OSD reads and writes:

ceph osd unpause
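
For the imbalance check and manual reweight mentioned in the first step above, a minimal sketch (the 110% threshold is illustrative):

ceph osd df tree
ceph osd test-reweight-by-utilization 110
ceph osd reweight-by-utilization 110

test-reweight-by-utilization previews the changes without applying them.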

Backfill-wait

Meaning: the PG is waiting for the backfill operation to begin.

Cause: an OSD went offline (the author did not observe this state directly; it may pass too quickly to be seen).

Consequence: none in itself; in theory the PG next enters the backfilling state to backfill its data.

Solution: none needed as long as the backfill proceeds normally.

Incomplete

Meaning: during peering it was found that the replicas cannot reach agreement on the data state.

Cause: while the PG is selecting the authoritative log, the authoritative log cannot be completed, or the logic comparing the authoritative log with the local log fails.

Consequence: usually the PG cannot finish being created and gets stuck in the creating+incomplete state, which in turn makes the pool unusable.

Solution:

First confirm that osd_allow_recovery_below_min_size is true, that the number of replicas is reasonable, and that the number of OSDs selected by the crushmap matches the pool's configuration. If all of these are normal, try the recovery procedure below (a command sketch follows this list).

Stop every OSD corresponding to each incomplete PG.

Use ceph-objectstore-tool to mark the PG complete on the OSD.

Then restart the OSDs.
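
A minimal sketch of that recovery procedure, assuming the incomplete PG is 2.1a and it lives on osd.12 with the default data path (the IDs and path are illustrative):

systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.1a --op mark-complete
systemctl start ceph-osd@12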

Inconsistent

Meaning: the replica data is inconsistent.

Cause: a replica of the data was lost for an unknown reason.

Consequence: inconsistent replica data reduces data safety.

Solution (a command sketch follows):

Repair the data with ceph pg repair; this usually restores the PG to normal. If it cannot be restored:

First raise osd_max_scrubs on the OSDs holding the three replicas, run ceph pg repair again, and then set osd_max_scrubs back to 1.
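
A minimal sketch, assuming the inconsistent PG is 2.1a and its replicas are on osd.3, osd.7 and osd.9 (the IDs and the temporary osd_max_scrubs value are illustrative):

ceph pg repair 2.1a

If that does not resolve it, temporarily raise the scrub limit on those OSDs, repair again, then set it back:

ceph tell osd.3 injectargs '--osd_max_scrubs 3'    (repeat for osd.7 and osd.9)
ceph pg repair 2.1a
ceph tell osd.3 injectargs '--osd_max_scrubs 1'    (repeat for osd.7 and osd.9)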

Peered

Meaning: the PG has completed peering but cannot find enough replicas to serve reads and writes (not even min_size can be satisfied).

Cause: multiple OSDs have failed, so the number of active replicas has fallen below min_size.
