2025-04-06 Update From: SLTechnology News&Howtos > Servers
Shulou (Shulou.com) 05/31 Report --
This article, the last in a series on RGW bucket shard design and optimization, walks through recovering an OSD whose bucket-index omap has grown too large, using a real incident as the example.
Recovering an OSD service with an oversized omap
When the omap of the OSD holding a bucket index grows too large, any exception that crashes the OSD process turns into on-the-spot firefighting: the OSD service has to be restored as quickly as possible. That incident is the subject of this article.
First determine the omap size of the affected OSD. A large omap forces the OSD to spend a lot of time and resources loading leveldb data at startup, which can prevent the OSD from starting at all (suicide timeout). Starting such an OSD also consumes a very large amount of memory, so be sure to reserve enough: here the machine had about 40 GB of physical RAM, and swap is no substitute.
root@demo:/# du -sh /var/lib/osd/ceph-214/current/omap/
22G     /var/lib/osd/ceph-214/current/omap/
2017-08-11 11:52:46.601938 7f298ae2e700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f2980894700' had suicide timed out after 180
2017-08-11 11:52:46.605728 7f298ae2e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f298ae2e700 time 2017-08-11 11:52:46.601952
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
Raise the OSD suicide timeout before starting the OSD service.
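As a rough worked example of the memory reservation above, assuming the observation later in this article (an OSD with ~16 GB of omap needed ~37 GB of RAM, roughly 2.3x) scales linearly; the ratio is an extrapolation from this one incident, not a rule:

```shell
# Hypothetical sizing check: scale the observed ~2.3x omap-to-RAM ratio
# to the 22 GB omap measured by the du command above.
omap_gb=22
need_gb=$((omap_gb * 23 / 10))   # integer approximation of omap_gb * 2.3
echo "omap=${omap_gb}G, reserve at least ${need_gb}G of RAM before restarting"
```

With the 22 GB omap measured above this suggests reserving about 50 GB, consistent with the ~40 GB machine in this incident being near its limit.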
Add the following to ceph.conf on the failed node:
[osd]
debug osd = 20                                    # raise the debug level
[osd.214]
...
filestore_op_thread_suicide_timeout = 7000        # raise the timeout so the OSD is not killed by the suicide check during startup
Watch the log:
tailf /home/ceph/log/ceph-osd.214.log
Start the service:
/etc/init.d/ceph start osd.214
Also launch top in another session to watch the process's resource consumption. At present, an OSD with about 16 GB of omap needs roughly 37 GB of memory, and during recovery the OSD process uses a very large amount of memory and CPU. (The original article showed a screenshot of top output here.)
Releasing memory at the right moment
When the following record appears in the log, you can free the memory (or you can leave this until the very end):
2017-08-11 15:08:14.551305 7f2b3fcab900  0 osd.214 29425 load_pgs opened ... pgs
The command to free memory is as follows:
ceph tell osd.214 heap release
Monitoring during OSD service recovery
After the above operations the OSD continues to recover omap data, and the whole process takes quite a while. You can open watch ceph -s in another terminal to follow it. The recovery rate is typically around 14 MB/s, which gives the estimation formula below.
Recovery time (in seconds) = total omap size (in MB) / 14. Note: the total omap size is the figure obtained from the earlier du command.
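The formula above can be checked with the numbers from this incident (22 GB of omap from the du command, the observed ~14 MB/s rate):

```shell
# Worked example of the recovery-time estimate, using figures from this article.
omap_mb=$((22 * 1024))        # 22 GB of omap expressed in MB
rate=14                       # observed recovery rate, MB per second
secs=$((omap_mb / rate))
echo "estimated recovery: ${secs}s (~$((secs / 60)) minutes)"
```

So the 22 GB omap here implies roughly 27 minutes of recovery, which matches the article's description of a "relatively long" process.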
The logs during the recovery are as follows
2017-08-11 15:11:25.049357 7f2a3b327700 10 osd.214 pg_epoch: 29450 pg[76.2b6( v 29425'5676261 lc 29425'5676261 (29296'5672800,29425'5676261] local-les=29449 n=4 ec=20531 les/c 29449/29447 29448/29448/28171) [70,23,214] r=2 lpr=29448 pi=20532-29447/35 luod=0'0 crt=29425'5676261 lcod 0'0 active m=1] handle_message: 0x651131200
2017-08-11 15:11:25.049380 7f2a3b327700 10 osd.214 pg_epoch: 29450 pg[76.2b6( ... active m=1] handle_push ObjectRecoveryInfo(6f648ab6/.dir.hxs1.55076.1.6/head//76@29425'5676261, copy_subset: [], clone_subset: {}), ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:false, omap_recovered_to:0_00001948372.1948372.3, omap_complete:false)
2017-08-11 15:11:25.049400 7f2a3b327700 10 osd.214 pg_epoch: 29450 pg[76.2b6( ... active m=1] submit_push_data: Creating oid 6f648ab6/.dir.hxs1.55076.1.6/head//76 in the temp collection
2017-08-11 15:11:25.123153 7f2a3b327700 10 osd.214 29450 dequeue_op 0x651131200 finish
2017-08-11 15:11:25.138155 7f2b357a1700  5 osd.214 29450 tick
2017-08-11 15:11:25.138186 7f2b357a1700 20 osd.214 29450 scrub_should_schedule should run between 0 - 24 now 15 = yes
2017-08-11 15:11:25.138210 7f2b357a1700 20 osd.214 29450 scrub_should_schedule loadavg 3.34 >= max, load too high
2017-08-11 15:11:25.138221 7f2b357a1700 20 osd.214 29450 sched_scrub load_is_low=0
2017-08-11 15:11:25.138223 7f2b357a1700 10 osd.214 29450 sched_scrub 76.2a9 high load at 2017-08-10 ... : 99109.8 < max (604800 seconds)
2017-08-11 15:11:25.138235 7f2b357a1700 20 osd.214 29450 sched_scrub done
2017-08-11 15:11:25.138239 7f2b357a1700 10 osd.214 29450 do_waiters -- start
2017-08-11 15:11:25.138... 7f2b357a1700 10 osd.214 29450 do_waiters -- finish
2017-08-11 15:11:25.163988 7f2aaef77700 20 osd.214 29450 share_map_peer 0x66b4e0260 already has epoch 29450
2017-08-11 15:11:25.164042 7f2ab077a700 20 osd.214 29450 share_map_peer 0x66b4e0260 already has epoch 29450
2017-08-11 15:11:25.268001 7f2aaef77700 20 osd.214 29450 share_map_peer 0x66b657a20 already has epoch 29450
2017-08-11 15:11:25.268075 7f2ab077a700 20 osd.214 29450 share_map_peer 0x66b657a20 already has epoch 29450
When the OSD's PGs return to a normal state, you can move on to the wrap-up steps below.
Wrap-up work
Free the memory
After the OSD finishes recovering its data, CPU usage drops, but the memory is not released automatically, so be sure to free it with the same command as before (ceph tell osd.214 heap release).
Adjust the log level:
ceph tell osd.214 injectargs "--debug_osd=0/5"
Delete the temporary entries added to ceph.conf earlier.
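A minimal sketch of that cleanup, run here against a scratch copy rather than the real ceph.conf; the section and key names match the fragment added during the incident, while the sed patterns and temp-file handling are this sketch's own:

```shell
# Build a scratch copy containing the temporary recovery settings.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[osd]
debug osd = 20
[osd.214]
filestore_op_thread_suicide_timeout = 7000
EOF
# Remove the two lines that were added only for the recovery.
sed -i '/^debug osd = 20$/d; /^filestore_op_thread_suicide_timeout/d' "$conf"
result=$(cat "$conf")
echo "$result"
rm -f "$conf"
```

On the real node you would edit /etc/ceph/ceph.conf (or wherever your cluster keeps it) the same way, leaving the [osd] and [osd.214] sections' other contents untouched.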
With this, all three articles in the bucket shard series are complete.
© 2024 shulou.com SLNews company. All rights reserved.