
How does Netty allocate memory?


This article focuses on how Netty implements memory allocation. The ideas involved are simple and practical, so interested readers may wish to follow along as we learn how Netty allocates memory.

Classification of data containers in Netty

When it comes to storing data, we have to talk about memory allocation. Memory can be divided into heap memory and off-heap memory according to where it is stored, and into pooled and unpooled memory according to whether a region is pre-allocated and reused. The interfaces implementing these divisions in Netty are as follows.

According to the underlying storage space:

Heap buffer: HeapBuffer

Direct buffer: DirectBuffer

According to whether it is pooled:

Pooled: PooledBuffer

Non-pooled: UnpooledBuffer

Pooled direct memory (PooledDirectByteBuf) is used by default and is managed mainly by PoolArena. Note that Netty does not expose these classes directly as API; instead, the allocation-related operations of their Unsafe variants are what is exposed externally, through the allocator interface.
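As a quick illustration of the public entry point (assuming Netty 4.x with the netty-buffer dependency on the classpath), here is a minimal sketch of requesting and releasing a pooled direct buffer:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;

import java.nio.charset.StandardCharsets;

public class PooledAllocationExample {
    public static void main(String[] args) {
        // The default pooled allocator; internally it delegates to PoolArena.
        ByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

        // Request a pooled direct (off-heap) buffer with 256 bytes of initial capacity.
        ByteBuf buf = alloc.directBuffer(256);
        try {
            buf.writeBytes("hello".getBytes(StandardCharsets.UTF_8));
        } finally {
            // release() returns the memory to the pool instead of freeing it outright.
            buf.release();
        }
    }
}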

What is pooling?

Ordinarily, requesting memory means checking where the current memory has a free block that fits the current data, and storing the data in that block if one exists.

So what pooling does is this: since a memory address has to be found for the data on every request anyway, a region of memory is requested up front as dedicated space, and the pool itself takes charge of allocating and recycling within it.

The problem pooling solves is memory fragmentation.

Internal fragmentation: the requested address space is larger than the memory the actual data uses.

For example, if a fixed 1M block is reserved as the memory used by a thread, but the thread occupies at most 0.5M at a time, then 0.5M is wasted on each use. If such space is not effectively reclaimed for a long time, memory holes are inevitable.

External fragmentation: even after adjacent free blocks are merged, there is not enough contiguous space to satisfy an allocation.

For example, suppose a 20-byte block and a 13-byte block of contiguous memory can be reclaimed, and a 48-byte piece of data now needs to be stored. The two blocks add up to only 33 bytes of space, which certainly cannot be used.

How to implement a memory pool?

① A linked list of free memory addresses

The simplest approach is a linked list that maintains the addresses of the currently free memory blocks: a block is removed from the list when it is used and inserted back at the appropriate position when it is released.

This method is simple to implement, but searching the list and maintaining the free memory are still costly, so it is not a good fit.

② fixed-length memory space allocation

Maintain two lists: one of unallocated memory blocks and one of allocated ones. Every block has the same size, and if one block is not enough, several blocks are merged together.

The disadvantage of this approach is that it wastes a certain amount of memory, but for particular scenarios it works without problems.

③ multi-segment fixed-length pool allocation

Building on the fixed-length allocation above, the space is partitioned according to different object sizes (e.g. up to 64K), allocating multiple fixed-size memory pools, one per size class.

Every time memory is requested, the pool corresponding to the current object's size is consulted to see whether it has any space left.
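A minimal sketch of this multi-pool idea in Java (the size classes here are illustrative, not the ones Netty uses):

import java.util.ArrayDeque;
import java.util.Deque;

// A toy multi-size-class pool: one free list of fixed-size byte[] blocks per size class.
public class SizeClassPool {
    private static final int[] SIZE_CLASSES = {64, 256, 1024, 4096, 16384, 65536};
    private final Deque<byte[]>[] freeLists;

    @SuppressWarnings("unchecked")
    public SizeClassPool() {
        freeLists = (Deque<byte[]>[]) new Deque[SIZE_CLASSES.length];
        for (int i = 0; i < freeLists.length; i++) {
            freeLists[i] = new ArrayDeque<>();
        }
    }

    // Round the request up to the smallest size class that fits.
    private int classIndex(int size) {
        for (int i = 0; i < SIZE_CLASSES.length; i++) {
            if (size <= SIZE_CLASSES[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("request too large for pool: " + size);
    }

    public byte[] allocate(int size) {
        int idx = classIndex(size);
        byte[] block = freeLists[idx].pollFirst();
        // If this class's pool is empty, fall back to a fresh allocation.
        return block != null ? block : new byte[SIZE_CLASSES[idx]];
    }

    public void release(byte[] block) {
        // Return the block to the free list of its exact size class.
        freeLists[classIndex(block.length)].addFirst(block);
    }
}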

Linux itself supports dynamic memory allocation and release through the functions malloc/free. malloc is short for memory allocation, i.e. dynamic memory allocation: it requests a contiguous block of memory of the specified size and returns the address of the allocated region as a void*.

The implementation process of malloc/free:

Free storage is organized as a free linked list ordered by increasing address, with each block recording its length, a pointer to the next block, and a pointer to its own storage. (Because some memory in a program may not be requested through malloc, the space malloc manages is not necessarily contiguous.)

When an allocation request arrives, malloc scans the free list until it finds a sufficiently large block (first fit). (This is why malloc calls do not all take exactly the same amount of time.)

If the block exactly matches the requested size, it is removed from the list and returned to the user. If the block is too large, it is split in two: the tail part is handed to the user and the remainder stays in the free list (with its header information updated). So malloc always allocates a contiguous piece of memory.

On release, the free list is first searched for the position where the released block should be inserted. If either neighbour of the released block is a free block, the two are merged into a larger block to reduce memory fragmentation.
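To make the scan / split / merge behaviour concrete, here is a toy first-fit allocator over a single region, sketched in Java with integer offsets standing in for pointers. It illustrates the scheme described above, not the real malloc implementation:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FirstFitAllocator {
    // A free block covering [offset, offset + length) within the region.
    private static final class Block {
        int offset;
        int length;
        Block(int offset, int length) { this.offset = offset; this.length = length; }
    }

    // The free list, kept sorted by increasing address.
    private final List<Block> freeList = new ArrayList<>();

    public FirstFitAllocator(int regionSize) {
        freeList.add(new Block(0, regionSize));
    }

    // Scan the free list and take the first block that is large enough (first fit).
    public int allocate(int size) {
        Iterator<Block> it = freeList.iterator();
        while (it.hasNext()) {
            Block b = it.next();
            if (b.length == size) {
                it.remove();                  // exact fit: remove the whole block
                return b.offset;
            }
            if (b.length > size) {
                b.length -= size;             // too large: split, hand out the tail
                return b.offset + b.length;
            }
        }
        return -1;                            // no block is large enough
    }

    // Insert the released block at its address-ordered position, then merge it
    // with adjacent free blocks to reduce external fragmentation.
    public void free(int offset, int size) {
        int i = 0;
        while (i < freeList.size() && freeList.get(i).offset < offset) {
            i++;
        }
        freeList.add(i, new Block(offset, size));
        // Merge with the following block if the two are adjacent.
        if (i + 1 < freeList.size()) {
            Block cur = freeList.get(i);
            Block next = freeList.get(i + 1);
            if (cur.offset + cur.length == next.offset) {
                cur.length += next.length;
                freeList.remove(i + 1);
            }
        }
        // Merge with the preceding block if the two are adjacent.
        if (i > 0) {
            Block prev = freeList.get(i - 1);
            Block cur = freeList.get(i);
            if (prev.offset + prev.length == cur.offset) {
                prev.length += cur.length;
                freeList.remove(i);
            }
        }
    }
}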

Memory allocation in Netty

Netty adopts the ideas of jemalloc, a concurrent malloc algorithm that originated in FreeBSD.

jemalloc relies on multiple Arenas (allocators) to allocate memory. A running application has a fixed number of Arenas, by default related to the number of processors.

The reason for having multiple Arenas is that contention between threads during allocation is unavoidable and can seriously hurt allocation efficiency. To ease thread contention under high concurrency, Netty lets users create multiple allocators (Arenas) to split the locks and improve allocation efficiency.

When a thread allocates or reclaims memory for the first time, it is bound to a fixed Arena. Threads select their Arena round-robin, that is, sequentially and in turn.

Each thread keeps its own Arena and cache-pool information, which reduces contention and improves access efficiency. An Arena divides its memory into many Chunks for management, and a Chunk in turn holds Pages, from which individual requests are served.
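A toy sketch of binding each thread to one of a fixed set of arenas round-robin, as described above (the names here are hypothetical; this is not Netty's internal PoolThreadLocalCache):

import java.util.concurrent.atomic.AtomicInteger;

public class ArenaBinding {
    static final class Arena {
        final int id;
        Arena(int id) { this.id = id; }
    }

    // Arena count tied to the number of processors, as described above.
    static final int NUM_ARENAS = Runtime.getRuntime().availableProcessors() * 2;
    static final Arena[] ARENAS = new Arena[NUM_ARENAS];
    static final AtomicInteger NEXT = new AtomicInteger();

    static {
        for (int i = 0; i < NUM_ARENAS; i++) {
            ARENAS[i] = new Arena(i);
        }
    }

    // Each thread is assigned its arena once, the first time it touches the pool;
    // assignment proceeds sequentially, thread by thread (round-robin).
    static final ThreadLocal<Arena> THREAD_ARENA = ThreadLocal.withInitial(
            () -> ARENAS[Math.floorMod(NEXT.getAndIncrement(), NUM_ARENAS)]);

    public static void main(String[] args) {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " -> arena " + THREAD_ARENA.get().id);
        for (int i = 0; i < 4; i++) {
            new Thread(task, "worker-" + i).start();
        }
    }
}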

When memory is requested, the allocation is classified into one of the following four specifications, each corresponding to a different size range and handled differently:

Tiny: represents a block of memory with a size of 0-512B.

Small: represents a block of memory the size of which is in 512B-8K.

Normal: represents a block of memory the size of which is in 8K-16M.

Huge: represents a block of memory larger than 16M.
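Expressed as a rough sketch, the classification above looks like this (the boundaries are as listed; Netty's real code works on normalized capacities):

public class SizeSpec {
    // Classify a request size into the four specifications listed above.
    static String sizeClassOf(int size) {
        if (size < 512) {
            return "Tiny";                   // 0 - 512B
        }
        if (size < 8 * 1024) {
            return "Small";                  // 512B - 8K
        }
        if (size <= 16 * 1024 * 1024) {
            return "Normal";                 // 8K - 16M
        }
        return "Huge";                       // > 16M
    }
}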

Finer-grained units are defined in each block to allocate data:

Chunk: a Chunk is a 16M block, the unit in which Netty requests memory from the operating system. All subsequent memory allocations are performed inside Chunks.

Page: memory inside a Chunk is allocated in units of Pages, each 8K in size. When we need 16K of space, Netty finds two Pages in a Chunk to satisfy the allocation.

Subpage and element: an element is a unit smaller than a Page. When we request less than 8K of memory, Netty allocates in units of elements. An element has no fixed size; it is determined by the user's request.

Netty manages elements through Subpages, which are converted from Pages. When we need 1K of space, Netty turns a Page into a Subpage and then divides the Subpage into eight 1K elements for allocation.

Memory allocation in Chunk

Threads allocate memory mainly from two places: PoolThreadCache and the Arena. A PoolThreadCache is exclusive to one thread, while an Arena is shared by several threads.

When memory is requested for the first time, Netty allocates a block of memory (a Chunk) for the user; this part of the work is done by the Arena.

When the user releases memory after use, the split-off blocks are cached in the PoolThreadCache according to their size. The next time memory is requested, the PoolThreadCache is searched first.

Chunk, Page, Subpage, and element are all concepts in Arena, and Arena's job is to carve out blocks of memory of the right size from a whole block of memory.

The largest unit of memory in Arena is Chunk, which is the unit in which Netty requests memory from the operating system.

After a 16M Chunk has been requested, its interior is divided into 2048 Pages of 8K each. When a user requests more than 8K of memory from Netty, memory is allocated in the form of Pages.

Internally, a Chunk manages its Pages with the buddy algorithm, implemented as a complete balanced binary tree:

The memory managed by all child nodes in the tree also belongs to their parent node. When we request 16K of memory, we search downward from the root node for an available node, all the way to layer 10.

So how do we determine whether a node is available? Netty stores a value in each node indicating the shallowest layer below it at which an unallocated node can still be found.

For example, if a layer-9 node holds the value 9, then the node itself and all child nodes below it are unallocated.

If a layer-9 node holds the value 10, the node itself can no longer be allocated, but layer 10 below it still has allocatable child nodes.

If a layer-9 node holds the value 12, the depth of the allocatable node would exceed the total depth of the tree, which means the node and all child nodes below it are unallocatable.

(The original article includes a figure here illustrating the allocation process.)
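To make these node values concrete, here is a simplified sketch of such a buddy tree in Java. It mirrors the memoryMap semantics described above, but the class, field, and method names are illustrative, not Netty's actual PoolChunk code:

public class BuddyTree {
    private final int maxOrder;       // depth of the leaf layer, e.g. 11 for 2048 pages
    private final byte[] memoryMap;   // memoryMap[id] = shallowest free depth under node id
    private final byte[] depthMap;    // the original depth of each node

    public BuddyTree(int maxOrder) {
        this.maxOrder = maxOrder;
        int nodes = 1 << (maxOrder + 1);
        memoryMap = new byte[nodes];
        depthMap = new byte[nodes];
        for (int d = 0; d <= maxOrder; d++) {
            for (int id = 1 << d; id < 1 << (d + 1); id++) {
                memoryMap[id] = (byte) d; // initially every node is free at its own depth
                depthMap[id] = (byte) d;
            }
        }
    }

    // Allocate one node at depth d (e.g. d = maxOrder - 1 for a 16K block
    // when pages are 8K). Returns the node id, or -1 if nothing fits.
    public int allocateNode(int d) {
        int id = 1;
        if (memoryMap[id] > d) {
            return -1; // even the root cannot provide a block at this depth
        }
        // Walk down from the root, always into a child that still has
        // a free node at depth <= d.
        while (depthMap[id] < d) {
            id <<= 1;                  // try the left child first
            if (memoryMap[id] > d) {
                id ^= 1;               // otherwise take the right sibling
            }
        }
        memoryMap[id] = (byte) (maxOrder + 1); // mark the node fully allocated
        updateParents(id);
        return id;
    }

    // After allocating, each parent's value becomes the minimum of its
    // children, i.e. the shallowest depth still allocatable below it.
    private void updateParents(int id) {
        while (id > 1) {
            id >>= 1;
            byte left = memoryMap[id << 1];
            byte right = memoryMap[(id << 1) ^ 1];
            memoryMap[id] = left < right ? left : right;
        }
    }
}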

For small allocations (less than 4096 bytes), a Page is further refined into smaller units: Subpages.

There are two main categories of Subpage by size:

Tiny: below 512 bytes. The minimum unit is 16 bytes and sizes are aligned to 16 bytes, giving the interval [16, 512) and hence 32 cases.

Small: at or above 512 bytes there are four cases: 512, 1024, 2048 and 4096.

Within a PoolSubpage, a bitmap is used directly to manage the free space (there is no need to find k contiguous slots), so allocation and release are very simple.

When a small memory block is requested for the first time, a free Page must be obtained and converted into a PoolSubpage; the Page is marked as occupied, and the PoolSubpage is saved into the PoolSubpage pool.

That way, the next request does not need to obtain a free Page; it simply looks in the pool. Since Netty has 36 PoolSubpage size classes, the PoolSubpage pool is represented by 36 PoolSubpage linked lists.
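As a toy illustration of the bitmap idea (this is not Netty's PoolSubpage; the sizes and names are made up for the example):

public class BitmapSubpage {
    private final int elemSize;
    private final int maxElems;
    private final long[] bitmap;

    public BitmapSubpage(int pageSize, int elemSize) {
        this.elemSize = elemSize;
        this.maxElems = pageSize / elemSize;          // e.g. 8192 / 1024 = 8 elements
        this.bitmap = new long[(maxElems + 63) / 64]; // one bit per element
    }

    // Find the first clear bit, set it, and return the element index (-1 if full).
    public int allocate() {
        for (int i = 0; i < maxElems; i++) {
            int word = i >>> 6;
            int bit = i & 63;
            if ((bitmap[word] & (1L << bit)) == 0) {
                bitmap[word] |= 1L << bit;
                return i; // the offset within the page is i * elemSize
            }
        }
        return -1;
    }

    // Freeing an element is just clearing its bit.
    public void free(int elemIdx) {
        bitmap[elemIdx >>> 6] &= ~(1L << (elemIdx & 63));
    }
}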

Because a single PoolChunk is only 16M, which is far from enough, there will be many PoolChunks; these PoolChunks form a linked list, which is in turn held by a PoolChunkList.

Let's first analyse how memory is allocated from the memory allocator, PoolArena. The Arena's job is to coordinate carving a suitably sized block for the current data out of a whole region of memory.

PoolArena is Netty's abstract memory-pool implementation; its subclasses are HeapArena and DirectArena.

HeapArena serves heap memory (heap buffers) and DirectArena serves off-heap direct memory (direct buffers); apart from the memory they operate on (byte[] versus ByteBuffer), the two are identical.

Structurally, PoolArena consists of three sub-memory pools:

tinySubpagePools

smallSubpagePools

A series of PoolChunkList

Both tinySubpagePools and smallSubpagePools are arrays of PoolSubpage with lengths of 32 and 4, respectively.

PoolChunkList is a container in which a series of PoolChunk objects can be stored, and Netty divides PoolChunkList into different levels of containers according to memory usage.

abstract class PoolArena<T> implements PoolArenaMetric {

    enum SizeClass { Tiny, Small, Normal }

    // Length of the tinySubpagePools array. Since adjacent elements of
    // tinySubpagePools differ in size by 16 bytes, the array length is
    // 512 >>> 4 = 32.
    static final int numTinySubpagePools = 512 >>> 4;

    // The allocator this PoolArena belongs to.
    final PooledByteBufAllocator parent;

    // Maximum height of the binary tree of Page nodes in a PoolChunk. Default 11.
    private final int maxOrder;

    // Page size, default 8K.
    final int pageSize;

    // To which power of two the leaf node size (8KB) corresponds; default 13.
    // Its main use is that, when computing which tree layer a target size
    // belongs to, the layer can be derived quickly from the difference
    // between the size and pageShifts.
    final int pageShifts;

    // Default 16MB.
    final int chunkSize;

    // Since a PoolSubpage is 8KB = 8192 bytes, this field is
    // -8192 => 1111 1111 1111 1111 1110 0000 0000 0000.
    // To decide whether a target size is smaller than 8KB, AND it with this
    // mask: a result of 0 means the size is below 8KB, so the request should
    // first be tried against tinySubpagePools or smallSubpagePools.
    final int subpageOverflowMask;

    // Length of the smallSubpagePools array. Default 4.
    final int numSmallSubpagePools;

    // tinySubpagePools: allocates memory smaller than 512 bytes.
    private final PoolSubpage<T>[] tinySubpagePools;

    // smallSubpagePools: allocates memory of at least 512 bytes but less than pageSize.
    private final PoolSubpage<T>[] smallSubpagePools;

    // PoolChunkLists storing chunks grouped by memory utilization:
    private final PoolChunkList<T> q050;  // chunks with 50-100% utilization
    private final PoolChunkList<T> q025;  // chunks with 25-75% utilization
    private final PoolChunkList<T> q000;  // chunks with 1-50% utilization
    private final PoolChunkList<T> qInit; // chunks with 0-25% utilization
    private final PoolChunkList<T> q075;  // chunks with 75-100% utilization
    private final PoolChunkList<T> q100;  // chunks with 100% utilization

    // Heap memory (heap buffers).
    static final class HeapArena extends PoolArena<byte[]> { /* ... */ }

    // Out-of-heap direct memory (direct buffers).
    static final class DirectArena extends PoolArena<ByteBuffer> { /* ... */ }
}

As shown above, a PoolArena is a large region of memory consisting of multiple PoolChunks, and each PoolChunk is in turn made up of multiple Pages.

When the memory to be allocated is smaller than a Page, a PoolSubpage is used to serve the request, in order to save memory.

To maximize the utilization of PoolChunk space, PoolArena divides its PoolChunks into six categories according to how much of their space is used:

QInit: chunk with storage memory utilization 0-25%

Q000: chunk with storage memory utilization of 1-50%

Q025: chunk with storage memory utilization of 25-75%

Q050: chunk with 50-100% storage memory utilization

Q075: chunk with storage memory utilization of 75-100%

Q100: chunk with 100% storage memory utilization

PoolArena maintains a doubly linked list of PoolChunkLists, and each PoolChunkList maintains a doubly linked list of PoolChunks.

When allocating memory, PoolArena finds a suitable PoolChunk in PoolChunkList and allocates a piece of memory from PoolChunk.

Here's how PoolArena allocates memory:

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
    // Normalize the requested capacity (to a power of two, or a multiple of 16 for tiny sizes).
    final int normCapacity = normalizeCapacity(reqCapacity);
    // If the target capacity is below 8KB, serve it as tiny or small.
    if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
        int tableIdx;
        PoolSubpage<T>[] table;
        // Sizes below 512 bytes are tiny.
        boolean tiny = isTiny(normCapacity);
        if (tiny) { // < 512
            // Try the current thread's tiny cache first.
            if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            // The thread cache could not serve the request, so fall back to
            // tinySubpagePools. tinyIdx() computes which element of the
            // tinySubpagePools array the target size maps to.
            tableIdx = tinyIdx(normCapacity);
            table = tinySubpagePools;
        } else {
            // The target size is between 512B and 8KB, so try smallSubpagePools.
            // First try the current thread's small cache; on success return directly.
            if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            tableIdx = smallIdx(normCapacity);
            table = smallSubpagePools;
        }

        // Fetch the head node of the target subpage list.
        final PoolSubpage<T> head = table[tableIdx];

        // Note that head is locked, and inside the synchronized block we test
        // s != head. s != head means the list contains a PoolSubpage with room
        // left: an exhausted PoolSubpage is removed from the list, so whenever
        // s != head the allocate() call below is guaranteed to obtain the
        // required memory block.
        synchronized (head) {
            final PoolSubpage<T> s = head.next;
            // If a subpage has already been allocated from, take this branch;
            // if the list has only been initialized, skip it.
            if (s != head) {
                assert s.doNotDestroy && s.elemSize == normCapacity;
                // Allocate from the PoolSubpage.
                long handle = s.allocate();
                assert handle >= 0;
                // Initializing the PooledByteBuf assigns its position in the
                // region; no memory has actually been allocated at this point.
                s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
                // Update the counters for tiny/small requests.
                if (tiny) {
                    allocationsTiny.increment();
                } else {
                    allocationsSmall.increment();
                }
                return;
            }
        }
        // No block could be obtained from the target PoolSubpage list,
        // so try to allocate from a PoolChunk instead.
        allocateNormal(buf, reqCapacity, normCapacity);
        return;
    }
    if (normCapacity <= chunkSize) {
        // The target size is between 8KB and 16MB: allocate from the PoolChunkLists.
        allocateNormal(buf, reqCapacity, normCapacity);
    } else {
        // Above 16MB the allocation is not managed by the memory pool.
        allocateHuge(buf, reqCapacity);
    }
}

private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
    // Hand the request to the PoolChunkLists in the order q050 → q025 → q000 → qInit → q075.
    if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
        q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
        q075.allocate(buf, reqCapacity, normCapacity)) {
        return;
    }
    // No existing chunk can serve the request: create a new PoolChunk,
    // allocate the target memory from it, and add it to qInit.
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    long handle = c.allocate(normCapacity);
    assert handle > 0;
    c.initBuf(buf, handle, reqCapacity);
    qInit.add(c);
}

First, the allocation request is handed to the PoolChunkLists in the order q050 → q025 → q000 → qInit → q075. If the memory can be obtained from one of these PoolChunkLists, it is returned directly.

If that fails, a new PoolChunk is created, the target memory is allocated from it, and the PoolChunk is added to qInit.

As mentioned above, a Chunk is the largest unit in which Netty requests memory blocks from the operating system; each Chunk is 16M.

PoolChunk maintains a complete balanced binary tree through the memoryMap array, whose node values mark allocation state, and uses it to manage allocation and reclamation of the underlying memory; the memory managed by all child nodes also belongs to their parent node.

How PoolChunk maintains this complete balanced binary tree is not covered here; if you are interested, take a look at the source code.

For memory release, PoolArena distinguishes two cases: pooled and unpooled. An unpooled memory block is destroyed directly, while a pooled block is added to the current thread's cache.

The source code of the free() method is as follows:

void free(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity, PoolThreadCache cache) {
    // If the chunk is unpooled, destroy the target memory block directly
    // and update the related statistics.
    if (chunk.unpooled) {
        int size = chunk.chunkSize();
        destroyChunk(chunk);
        activeBytesHuge.add(-size);
        deallocationsHuge.increment();
    } else {
        // If it is pooled, first determine its size class (tiny, small or normal),
        // then hand the block to the current thread's cache. If it is added
        // successfully, return directly.
        SizeClass sizeClass = sizeClass(normCapacity);
        if (cache != null && cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
            return;
        }
        // If the current thread's cache is full, return the block to the
        // shared pool for processing.
        freeChunk(chunk, handle, sizeClass, nioBuffer);
    }
}

At this point, you should have a deeper understanding of how Netty implements memory allocation. You might as well try it out in practice!
