This article walks through how the slab cache is implemented in Linux memory management, covering the main data structures and the allocation and release paths.
The Linux kernel uses an approach that originated in Solaris and has long been used in embedded systems: memory is allocated as objects grouped by size, a scheme known as the slab cache.
The goal of memory management is to provide a way for memory to be shared among users with a variety of needs. A memory management scheme should achieve the following two goals:
Minimize the time required to manage memory
Maximize the memory available for general use (minimize management overhead)
Memory management is ultimately a game of tradeoffs. You can develop an algorithm that uses little memory for bookkeeping but takes more time to manage the available memory, or one that manages memory quickly but uses more memory for bookkeeping. In the end, the requirements of the particular application drive the choice.
One memory management method uses a heap-based allocation strategy. In this approach, a large chunk of memory, called the heap, is used to provide memory for user-defined purposes. When users need a piece of memory, they request that a certain amount be allocated to them. The heap manager looks at the available memory (using a particular algorithm) and returns a block. Algorithms used in this search include first-fit (the first block found in the heap that satisfies the request) and best-fit (the most suitably sized block in the heap that satisfies the request). When the user is finished with the memory, it is returned to the heap.
The fundamental problem with heap-based allocation is fragmentation. Because blocks are allocated and returned in different orders and at different times, holes are left in the heap, and managing the free memory effectively takes time. This approach is usually memory-efficient (it allocates roughly what is needed) but spends more time managing the heap.
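To make the first-fit idea concrete, here is a minimal user-space sketch (the names block, free_list and first_fit are hypothetical, not taken from any particular allocator): it walks a singly linked free list and returns the first block big enough for the request.

#include <stddef.h>

struct block {
    size_t size;
    struct block *next;
};

static struct block *free_list;

/* return the first free block large enough for the request, or NULL */
static struct block *first_fit(size_t size)
{
    struct block **prev = &free_list;
    struct block *b;

    for (b = free_list; b != NULL; prev = &b->next, b = b->next) {
        if (b->size >= size) {
            *prev = b->next;   /* unlink the block from the free list */
            return b;
        }
    }
    return NULL;               /* no block satisfies the request */
}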
Another method, called buddy memory allocation, is a faster technique that divides memory into partitions whose sizes are powers of 2 and satisfies requests with a best-fit approach. When the user frees a block, its buddy block is checked to see whether it has also been freed; if so, the two blocks are merged to minimize fragmentation. This algorithm is more time-efficient, but the power-of-2 rounding wastes memory.
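The sketch below illustrates the two mechanical pieces of buddy allocation (the helper names are assumptions for illustration, not kernel code): a request is rounded up to a power-of-two order, and the buddy of a block is found by flipping a single address bit, which is what makes detecting and merging freed neighbours cheap.

#include <stddef.h>

/* round a request up to the smallest power-of-two multiple of min_block */
static unsigned int request_order(size_t bytes, size_t min_block)
{
    unsigned int order = 0;
    size_t size = min_block;

    while (size < bytes) {
        size <<= 1;
        order++;
    }
    return order;
}

/* a block of size (min_block << order) at a given offset has its buddy at
 * the offset with the bit for that size flipped */
static size_t buddy_of(size_t offset, unsigned int order, size_t min_block)
{
    return offset ^ (min_block << order);
}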
Slab caching
The slab allocator used by Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system. Bonwick's allocator revolves around object caching. In the kernel, a great deal of memory is allocated for a limited set of object types, such as file descriptors and other common structures. Bonwick found that the time spent initializing an ordinary kernel object exceeded the time spent allocating and freeing it. He therefore concluded that, instead of freeing memory back to a global pool, memory should be kept in a state initialized for its intended purpose. For example, if memory is allocated for a mutex, the mutex initialization function (mutex_init) needs to run only once, when the memory is first handed out; subsequent allocations of that memory can skip it, because the object has remained in the desired state since it was last freed.
The Linux slab allocator uses this and other ideas to build a memory allocator that is efficient in both space and time.
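As a concrete illustration of the object-cache idea, the sketch below creates a dedicated cache for a hypothetical struct my_obj whose constructor initializes an embedded mutex once, so later allocations hand back already-initialized objects. The kmem_cache_create() signature shown (taking both a constructor and a destructor pointer) matches the 2.6-era API this article describes; newer kernels use a different signature.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/mutex.h>

struct my_obj {               /* hypothetical object type */
    struct mutex lock;
    int value;
};

static struct kmem_cache *my_cache;

/* runs once per object, when its memory first enters the cache */
static void my_obj_ctor(void *obj, struct kmem_cache *cachep, unsigned long flags)
{
    struct my_obj *o = obj;

    mutex_init(&o->lock);
}

static void my_use(void)
{
    /* the object comes back already constructed; no mutex_init() needed here */
    struct my_obj *o = kmem_cache_alloc(my_cache, GFP_KERNEL);

    if (o)
        kmem_cache_free(my_cache, o);  /* returned to the cache still initialized */
}

static int __init my_init(void)
{
    my_cache = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
                                 0, SLAB_HWCACHE_ALIGN, my_obj_ctor, NULL);
    if (!my_cache)
        return -ENOMEM;
    my_use();
    return 0;
}
module_init(my_init);
MODULE_LICENSE("GPL");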
Figure 1 shows the high-level organization of the slab structures. At the highest level is cache_chain, a linked list of slab caches. It supports best-fit style lookups: the list can be traversed to find the cache whose object size best suits the required allocation. Each element of cache_chain is a reference to a kmem_cache structure (called a cache), which manages a pool of objects of a given size.
Figure 1. Main structure of slab allocator
Each cache contains lists of slabs, where a slab is a contiguous block of memory (typically one or more pages). There are three kinds of slab list:
slabs_full: fully allocated slabs
slabs_partial: partially allocated slabs
slabs_empty: empty slabs, with no objects allocated
Note: slabs on the slabs_empty list are the primary candidates for reaping, the process through which a slab's memory is returned to the system for other uses.
Each slab in a slab list is a contiguous block of memory (one or more contiguous pages) that is divided into objects. These objects are the basic units allocated from and freed to a particular cache. Note that the slab is the smallest unit the slab allocator operates on, so when a cache needs to grow, it grows by at least one slab. Typically, each slab holds multiple objects.
Because objects are allocated from and freed back to slabs, an individual slab can move between the slab lists. For example, when the last free object in a slab is used, the slab moves from slabs_partial to slabs_full. When a full slab has one of its objects freed, it moves from slabs_full to slabs_partial. When all of its objects have been freed, it moves from slabs_partial to slabs_empty.
1. Slab-related data structures
1) slab cache descriptor
struct kmem_cache {
    struct array_cache *array[NR_CPUS]; /* per-CPU local cache of free slab objects, for efficiency */
    /* 2) Cache tunables. Protected by cache_chain_mutex */
    unsigned int batchcount;            /* number of objects moved into or out of a local cache in one batch */
    unsigned int limit;                 /* maximum number of free objects a local cache may hold */
    unsigned int shared;
    unsigned int buffer_size;           /* size of each object in the cache */
    struct kmem_list3 *nodelists[MAX_NUMNODES]; /* per-node lists of free, partially free and full slabs */
    unsigned int flags;                 /* constant flags */
    unsigned int num;                   /* number of objects per slab */
    unsigned int gfporder;              /* order of contiguous page frames per slab */
    gfp_t gfpflags;                     /* flags passed to the buddy system when allocating page frames */
    size_t colour;                      /* number of colours used by the slabs */
    unsigned int colour_off;            /* slab colour offset */
    struct kmem_cache *slabp_cache;     /* cache holding the slab descriptors; NULL when the descriptors are stored inside the slab */
    unsigned int slab_size;             /* size of a single slab */
    void (*ctor)(void *, struct kmem_cache *, unsigned long); /* object constructor */
    void (*dtor)(void *, struct kmem_cache *, unsigned long); /* object destructor */
    const char *name;                   /* name of the slab cache */
    struct list_head next;              /* links this cache into the cache_chain list */
};
2) slab descriptor
struct slab {
    struct list_head list;    /* links the slab into slabs_full, slabs_partial or slabs_free */
    unsigned long colouroff;  /* colour offset of the first object in the slab */
    void *s_mem;              /* address of the first object in the slab */
    unsigned int inuse;       /* number of objects currently in use */
    kmem_bufctl_t free;       /* index of the next free object to use */
    unsigned short nodeid;    /* memory node the slab belongs to */
};
Slab descriptors may be stored in two places:
External slab descriptor: stored outside the slab, in a general-purpose cache pointed to by the slabp_cache field of the cache descriptor.
Internal slab descriptor: stored inside the slab, at the beginning of the first page frame allocated to the slab.
3) slab list descriptor
struct kmem_list3 {
    struct list_head slabs_partial; /* slabs with some objects in use */
    struct list_head slabs_full;    /* slabs with all objects in use */
    struct list_head slabs_free;    /* slabs containing only free objects */
    unsigned long free_objects;     /* number of free objects in the cache on this node */
    unsigned int free_limit;
    unsigned int colour_next;       /* per-node cache colouring */
    spinlock_t list_lock;
    struct array_cache *shared;     /* local cache shared by all CPUs on this node */
    struct array_cache **alien;     /* caches for objects belonging to other nodes */
    unsigned long next_reap;        /* used by the slab page-reclaim (reaping) algorithm */
    int free_touched;               /* used by the slab page-reclaim (reaping) algorithm */
};
4) slab object descriptor
typedef unsigned int kmem_bufctl_t;
Each object has an object descriptor of type kmem_bufctl_t, and each slab descriptor is followed by an array of kmem_bufctl_t entries, one per object in the slab. Each entry records the index of the next free object, so the array indices form a linked list of free objects. Like slab descriptors, object descriptors come in internal and external variants:
Internal object descriptors: stored inside the slab, just before the objects they describe.
External object descriptors: stored outside the slab, in a general-purpose cache pointed to by the slabp_cache field of the cache descriptor.
The organization chart between the slab descriptor and the slab object is shown in the following figure:
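To see how the index array works in practice, here is a simplified sketch modelled on the 2.6-era mm/slab.c helpers (the real slab_get_obj() also takes a node id and performs extra debug checks): the bufctl array sits right after the slab descriptor, slabp->free is the index of the first free object, and each array entry holds the index of the next free one.

/* the object-descriptor array lives right after the slab descriptor */
static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
{
    return (kmem_bufctl_t *)(slabp + 1);
}

/* object idx lives at s_mem + idx * buffer_size */
static inline void *index_to_obj(struct kmem_cache *cachep,
                                 struct slab *slabp, unsigned int idx)
{
    return (char *)slabp->s_mem + cachep->buffer_size * idx;
}

/* pop the head of the free-object index list */
static void *slab_get_obj(struct kmem_cache *cachep, struct slab *slabp)
{
    void *objp = index_to_obj(cachep, slabp, slabp->free);

    slabp->inuse++;
    /* the entry for the old head holds the index of the next free object */
    slabp->free = slab_bufctl(slabp)[slabp->free];
    return objp;
}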
2. Slab local object cache
To better support multiprocessor systems, reduce spin-lock contention, and make better use of the hardware caches, each cache in the slab allocator contains a per-CPU data structure called the slab local cache, which is an array of pointers to recently freed objects. Most object allocations and frees therefore touch only the local pointer array, reducing contention; the slab structures are involved only when the local array overflows or underflows. The relevant data structure is as follows:
struct array_cache {
    unsigned int avail;      /* number of objects available in the local cache; also the index of the first free slot in entry[] */
    unsigned int limit;      /* size of the local cache */
    unsigned int batchcount; /* number of objects moved when filling or draining the local cache */
    unsigned int touched;    /* set to 1 when the local cache has been used recently */
    spinlock_t lock;
    void *entry[0];          /* the array of free-object pointers itself */
};
In addition, on multi-CPU systems there is a shared local cache, whose address is stored in the lists.shared field of the cache descriptor. This shared cache is common to all CPUs on the node and makes it easy to move objects from one local cache to another.
3. Slab memory coloring
For example, with a 32-byte cache line, bytes 0-31 are read from or written to memory in one transfer, and bytes 32-63 in another.
Furthermore, which cache location a memory address maps to is not arbitrary:
Cache address 0 corresponds to memory addresses 0, 32, 64, ...
Cache address 1 corresponds to memory addresses 1, 33, 65, ...
...
A slab always occupies an integral number of pages, so the low 12 bits of its starting address are zero; every slab therefore starts at an address that maps to the same cache location (cache 0). Since the objects in two slabs of the same cache are the same size, objects at the same position in two slabs map to the same cache lines. If two such objects are accessed frequently, they keep evicting each other from the cache, which reduces efficiency.
Colouring leaves an extra cache line's worth of space at the beginning of the second slab, so the cache lines used by corresponding objects in the two slabs are staggered; even if two objects at the same position are accessed frequently, they no longer compete for the same cache line.
This method has its limits, though: accesses to objects in different slabs are fairly random, and there is no guarantee about which two objects end up sharing a cache line.
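For reference, this is roughly how the 2.6-era cache_grow() picks a colour for each newly grown slab: successive slabs get successive multiples of colour_off as their starting offset, wrapping around after cachep->colour colours. The helper name next_colour_offset below is invented for illustration; only the logic mirrors the kernel's colouring step.

static size_t next_colour_offset(struct kmem_cache *cachep, struct kmem_list3 *l3)
{
    size_t offset = l3->colour_next;    /* colour index for this new slab */

    l3->colour_next++;                  /* the next slab gets the next colour */
    if (l3->colour_next >= cachep->colour)
        l3->colour_next = 0;            /* wrap after the last colour */

    return offset * cachep->colour_off; /* colour_off is typically one cache line */
}
/* the slab's first object then starts at this offset from the slab base, so
 * objects at the same position in consecutive slabs land on different cache lines */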
4. Slab memory allocation
When kernel code allocates an object from a slab cache with kmem_cache_alloc(), the work is ultimately done by ____cache_alloc(). The corresponding source is annotated below:
static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
    void *objp;
    struct array_cache *ac;

    check_irq_off();
    /* get the slab local cache of the CPU this code is running on */
    ac = cpu_cache_get(cachep);
    /* are there free objects in the local cache? */
    if (likely(ac->avail)) {
        STATS_INC_ALLOCHIT(cachep);
        /* mark the local cache as recently used */
        ac->touched = 1;
        /* take the last unused object from the pointer array */
        objp = ac->entry[--ac->avail];
    } else {
        STATS_INC_ALLOCMISS(cachep);
        /* no free objects in the local cache, so refill it */
        objp = cache_alloc_refill(cachep, flags);
    }
    return objp;
}

cache_alloc_refill() refills the local cache and is the heart of slab allocation. The code is annotated as follows:

static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
{
    int batchcount;
    struct kmem_list3 *l3;
    struct array_cache *ac;

    check_irq_off();
    /* get the local cache of the current CPU */
    ac = cpu_cache_get(cachep);
retry:
    /* batchcount is how many free objects to move into the local cache */
    batchcount = ac->batchcount;
    if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
        batchcount = BATCHREFILL_LIMIT;
    }
    /* get the kmem_list3 of the local memory node; each node keeps its own
     * free, partially free and full slab lists, and a CPU accesses its own
     * node's memory fastest */
    l3 = cachep->nodelists[numa_node_id()];

    BUG_ON(ac->avail > 0 || !l3);
    spin_lock(&l3->list_lock);

    /* first try to refill the local cache from the node's shared cache; on a
     * NUMA system the objects taken from it also belong to this node */
    if (l3->shared && transfer_objects(ac, l3->shared, batchcount))
        goto alloc_done;

    while (batchcount > 0) {
        struct list_head *entry;
        struct slab *slabp;

        /* allocate from partially free slabs first and keep the completely
         * free slabs intact, because free slabs can be reclaimed when memory
         * runs short */
        entry = l3->slabs_partial.next;
        /* no partially free slab: fall back to the completely free slabs */
        if (entry == &l3->slabs_partial) {
            l3->free_touched = 1;
            entry = l3->slabs_free.next;
            /* no free slab either: a new slab must be allocated for the cache */
            if (entry == &l3->slabs_free)
                goto must_grow;
        }

        slabp = list_entry(entry, struct slab, list);
        check_slabp(cachep, slabp);
        check_spinlock_acquired(cachep);
        /* take free objects from this slab until it has none left or enough
         * objects have been transferred */
        while (slabp->inuse < cachep->num && batchcount--) {
            STATS_INC_ALLOCED(cachep);
            STATS_INC_ACTIVE(cachep);
            STATS_SET_HIGH(cachep);

            /* grab one free object */
            ac->entry[ac->avail++] = slab_get_obj(cachep, slabp,
                                                  numa_node_id());
        }
        check_slabp(cachep, slabp);

        /* move slabp to correct slabp list: */
        list_del(&slabp->list);
        /* if the slab has no free objects left, put it on slabs_full */
        if (slabp->free == BUFCTL_END)
            list_add(&slabp->list, &l3->slabs_full);
        else
            list_add(&slabp->list, &l3->slabs_partial);
    }

must_grow:
    l3->free_objects -= ac->avail;
alloc_done:
    spin_unlock(&l3->list_lock);

    /* still no objects: allocate a new slab for the cache */
    if (unlikely(!ac->avail)) {
        int x;
        x = cache_grow(cachep, flags, numa_node_id());

        /* cache_grow can reenable interrupts, then ac could change. */
        ac = cpu_cache_get(cachep);
        if (!x && ac->avail == 0)  /* no objects in sight? abort */
            return NULL;

        if (!ac->avail)            /* objects refilled by interrupt? */
            goto retry;
    }
    ac->touched = 1;
    /* return the last free object in the local cache */
    return ac->entry[--ac->avail];
}
5. Release of slab memory
Slab memory is released through kmem_cache_free(), and the main work happens in __cache_free(). The code is annotated below:
static inline void __cache_free(struct kmem_cache *cachep, void *objp)
{
    struct array_cache *ac = cpu_cache_get(cachep);

    check_irq_off();
    objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));

    if (cache_free_alien(cachep, objp))
        return;

    /* the local cache is not full yet, so just stash the free object there */
    if (likely(ac->avail < ac->limit)) {
        STATS_INC_FREEHIT(cachep);
        ac->entry[ac->avail++] = objp;
        return;
    } else {
        /* cache_flusharray() moves a batch of free objects from the local
         * cache back to their slabs, making room for this one */
        cache_flusharray(cachep, ac);
        ac->entry[ac->avail++] = objp;
    }
}

cache_flusharray() returns a batch of objects from the local cache to the slab lists:

static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
{
    int batchcount;
    struct kmem_list3 *l3;
    int node = numa_node_id();

    /* batchcount free objects are returned to the slabs at a time */
    batchcount = ac->batchcount;
    check_irq_off();
    /* get the kmem_list3 of this memory node, which holds the node's slab lists */
    l3 = cachep->nodelists[node];
    spin_lock(&l3->list_lock);

    /* first hand objects over to the node's shared cache; objects in the
     * shared cache are only handed out to CPUs on this node, which keeps
     * memory accesses local and efficient */
    if (l3->shared) {
        struct array_cache *shared_array = l3->shared;
        int max = shared_array->limit - shared_array->avail;
        if (max) {
            if (batchcount > max)
                batchcount = max;
            /* copy batchcount pointers out of the local cache */
            memcpy(&(shared_array->entry[shared_array->avail]),
                   ac->entry, sizeof(void *) * batchcount);
            shared_array->avail += batchcount;
            goto free_done;
        }
    }

    /* no room in the shared cache: give the objects back to their slabs */
    free_block(cachep, ac->entry, batchcount, node);
free_done:
    spin_unlock(&l3->list_lock);
    ac->avail -= batchcount;
    /* slide the remaining free objects to the front of the array */
    memmove(ac->entry, &(ac->entry[batchcount]),
            sizeof(void *) * ac->avail);
}

free_block() puts each object back into its slab and moves the slab to the appropriate list:

static void free_block(struct kmem_cache *cachep, void **objpp,
                       int nr_objects, int node)
{
    int i;
    struct kmem_list3 *l3;

    for (i = 0; i < nr_objects; i++) {
        void *objp = objpp[i];
        struct slab *slabp;

        /* find the page the object lives in, then the slab that owns the
         * page (page->lru.prev records the slab the page belongs to) */
        slabp = virt_to_slab(objp);
        l3 = cachep->nodelists[node];
        list_del(&slabp->list);
        check_spinlock_acquired_node(cachep, node);
        check_slabp(cachep, slabp);
        /* put the object back into its slab */
        slab_put_obj(cachep, slabp, objp, node);
        STATS_DEC_ACTIVE(cachep);
        l3->free_objects++;
        check_slabp(cachep, slabp);

        /* fixup slab chains */
        if (slabp->inuse == 0) {
            /* every object in this slab is free again */
            if (l3->free_objects > l3->free_limit) {
                /* the node holds too many free objects: destroy the slab
                 * and give its pages back to the system */
                l3->free_objects -= cachep->num;
                slab_destroy(cachep, slabp);
            } else {
                /* keep it on the completely free list */
                list_add(&slabp->list, &l3->slabs_free);
            }
        } else {
            /* still partially used: put it on the partially free list */
            list_add_tail(&slabp->list, &l3->slabs_partial);
        }
    }
}

This concludes the walkthrough of how the slab cache is implemented in Linux memory management.