Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Stupid allocator for BlueStore source code analysis

2025-03-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

Preface

After introducing BlueStore's BitMap allocator, we know that the advantage of the new version of Bitmap allocator is that it uses contiguous memory space to hit CPU Cache as much as possible to improve allocator performance. Here we take a look at the Stupid allocator based on the interval tree (similar to the Linux Buddy memory management algorithm) and compare its advantages and disadvantages.

Directory partner algorithm data structure initialization insert delete space allocation space recovery analysis partner algorithm

In order to respond to requests quickly, improve memory utilization and reduce external memory fragmentation as much as possible, Linux memory management algorithm introduces the partner system algorithm Buddy-System. The algorithm groups all free pages into 11 linked lists, each of which contains 1, 2, 4, 8, 16, 32, 64, 128,256,512 and 1024 consecutive page frame blocks. the physical address of the first memory page of each page frame block is an integral multiple of the block size. The characteristics of partners are that the two blocks are of the same size, the addresses of the two blocks are continuous, and the physical address of the first page frame of the first block is an integral multiple of the total size of the two blocks (both belong to the same large block, blocks 1 and 2 are partners, and blocks 3 and 4 are partners, but blocks 2 and 3 are not partners). Specific memory allocation and memory release can be self-Google.

Advantages:

A better solution to the problem of external fragments can not be completely solved. Designed for large memory allocation, continuous memory can be allocated quickly.

Disadvantages:

The requirements of the merger are so strict that only the blocks that meet the partnership can be merged. Only one page in a continuous piece of memory is occupied, so that the whole memory does not have the conditions for merging. The page continuity of the algorithm is poor, and DMA may fail to apply for large chunks of continuous physical memory space, so CMA (Contiguous Memory Allocator, continuous memory allocator) is required. Waste of space can be solved through slab, kmem_cache and so on. Data structure

Stupid allocator uses interval tree to organize data structure and manage Extent (offset, length) efficiently.

Class StupidAllocator: public Allocator {CephContext* cct; / / Mutex std::mutex lock; / / Total size of free space int64_t num_free; / / the location of the last allocated space uint64_t last_alloc; / / interval tree array. When initializing, the length of the free array is 10, that is, there are ten interval trees std::vector free. / / extent: offset, length typedef mempool::bluestore_alloc::pool_allocator

< pair>

Allocator_t; / / orderly btree map, according to the order to store extent. Typedef btree::btree_map interval_set_map_t; / / interval tree, the main operations are insert, erase and so on. Typedef interval_set interval_set_t;}

The subscript of each interval tree is index (0-9), and the space size of index (1-9) is [2 ^ (index-1) * bdev_block_size, 2 ^ (index) * bdev_block_size).

Free [0]: [0,4k) free [1]: [4k, 8k) free [2]: [8k, 16k) free [3]: [16k, 32k) free [4]: [32k, 64k) free [5]: [64k, 128k) free [6]: [128k, 256k) free [7]: [256k, 512k) free [8]: [512k, 1024k) free [9]: [1024k, 2048k) initialization

After initializing the Stupid allocator, the caller adds or removes free space to the Allocator.

/ / increase free space void StupidAllocator::init_add_free (uint64_t offset, uint64_t length) {std::lock_guard l (lock); / / insert free space _ insert_free (offset, length) into free; / / update free space size num_free + = length } / / Delete free space void StupidAllocator::init_rm_free (uint64_t offset, uint64_t length) {std::lock_guard l (lock); interval_set_t rm; rm.insert (offset, length); for (unsigned I = 0; I

< free.size() && !rm.empty(); ++i) { interval_set_t overlap; overlap.intersection_of(rm, free[i]); // 删除相应空间 if (!overlap.empty()) { free[i].subtract(overlap); rm.subtract(overlap); } } num_free -= length; // 更新可用空间}插入删除 区间树实现代码: https://github.com/ceph/ceph/blob/master/src/include/interval_set.h insert函数代码: https://github.com/ceph/ceph/blob/master/src/include/interval_set.h#L445 erase函数代码: https://github.com/ceph/ceph/blob/master/src/include/interval_set.h#L516 最核心的实现是向区间树中插入以及删除区间,代码如下: 区间树插入Extent// 根据区间的长度,选取将要存放的区间树,长度越大,bin值越大。unsigned StupidAllocator::_choose_bin(uint64_t orig_len){ uint64_t len = orig_len / cct->

_ conf- > bdev_block_size; / / cbits = (sizeof (v) * 8)-_ _ builtin_clzll (v) / / _ _ builtin_clzll returns the number of leading zeros / / the result of cbits is the subscript of the highest bit 1 (starting from 0). The larger the len, the greater the value int bin = std::min (int) cbits (len), (int) free.size ()-1; return bin } void StupidAllocator::_insert_free (uint64_t off, uint64_t len) {/ / calculate which interval tree the free space belongs to: unsigned bin = _ choose_bin (len); while (true) {/ / insert the interval tree free[ bin] .insert (off, len, & off, & len); unsigned newbin = _ choose_bin (len) After if (newbin = = bin) break; / / inserts the data, the interval may be merged, resulting in an increase in interval length, and the bin may be adjusted. At this time, you need to delete the old one, and then insert the new bin / / interval. There are two cases of merging: one is merging in front of the original interval. But merged behind the original interval. Free [bin] .erase (off, len); bin = newbin;}}

Reviewing the first section of the partner algorithm, there is a difference between the two ways of merging:

Partner algorithm requirements are relatively strict, refer to the first section. Stupid Extent merge is relatively loose, as long as it satisfies that two Extent spaces are continuous. Delete Extent from interval tree

It is relatively simple to delete the Extent in the interval tree. Delete the incoming Extent in the original Extent, and then calculate whether the final Extent falls into another interval tree. If so, delete it from the interval tree and add a new interval tree.

Space allocation

The function for space allocation is defined as follows:

Allocate (uint64_t want_size, uint64_t alloc_unit, uint64_t max_alloc_size, int64_t hint,PExtentVector* extents); allocate_int (uint64_t want_size, uint64_t alloc_unit, int64_t hint, uint64_t* offset, uint32_t* length)

Where hint is a very important parameter, indicating that the assigned starting address should be larger than the value of hint as far as possible.

The core process consists of four layer 2 for loops: priority is given to allocating contiguous space greater than or equal to want_size from the hint address to the high-level interval tree, and if not, priority is given to allocating contiguous space greater than or equal to alloc_unit from the hint address to the lower-level interval tree (the length will be greater than alloc_unit).

The simple space allocation diagram is as follows:

The detailed flow chart of space allocation is as follows:

Space recovery

The function released by the space is defined as follows:

Release (const interval_set & release_set)

The process is simple: first add a lock, and then call _ insert_free to insert it into the corresponding interval tree, which will involve the merging of adjacent free space, but will lead to the problem of allocating space debris.

Pros and cons analysis CPU Cache

The bottom layer of Stupid uses BtreeMap to store a series of Extent, and the memory is not necessarily continuous. At the same time, when allocating space to traverse the interval tree, although the Extent in the interval tree is ordered, because the memory is not necessarily continuous or the memory span of the two adjacent Extent may be very large, it will cause the CPU-Cache to pre-read the next Extent, thus can not make good use of CPU-Cache.

The Bitmap allocator initializes three layers at the time of BlueStore initialization, and the size is fixed, and the allocation space is allocated sequentially, so that the function of CPU-Cache can be fully utilized. So as to improve the performance of the distributor.

Pseudo space debris

The Extent-based Stupid allocator has the problem of pseudo-space debris (the physical space is continuous, but not in the allocator):

After 6 times of 4K allocation and 6 times of disordered 4K release, a 24K continuous space may become 8K + 4K + 8K + 4K.

Because of the same size as the surrounding blocks, two 4K intervals fall into different interval trees, which makes it difficult to be merged, and the 24K continuous space becomes four discontinuous spaces.

Because the Bitmap allocator allocates all three layers of memory at initialization, and the three layers are orderly and the allocation space is sequentially traversed, it is OK to set the corresponding bits when releasing the space, which does not affect the continuity, so this problem does not exist.

According to the performance comparison experiment of the author of Bitmap, the Bitmap allocator is better than the Stupid. When the Bitmap is stable, the default allocator of BlueStore can be set to Bitmap.

Author: Shi Mingya

If you register Didi Yun now, you can get a 30-yuan Didi coupon without a threshold.

50% discount for newly purchased cloud services in January, 4.5% discount in March and 60% discount in June.

Didi Yun messenger recruits, recommending up to 50% rebate

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 248

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report