
How to understand Netty memory management


This article focuses on how to understand Netty's memory management; interested readers may wish to take a look. The material is presented simply and practically, so let's work through it together.

Preface

Netty owes its popularity to its ease of use and its high performance.

As a communication framework, the first thing it must deliver is high I/O performance.

Many readers know that Netty speeds up I/O by using Direct Memory, which reduces memory copies between kernel space and user space. However, frequently requesting Direct Memory from the operating system and releasing it after use is itself a performance cost. For this reason, Netty implements its own memory management mechanism: when memory is needed, it requests one large block from the operating system in a single call and then splits that block into smaller pieces on demand; when a piece is released, Netty is in no hurry to return it to the OS and instead reclaims it for later reuse.

This memory management mechanism can manage not only Direct Memory but also Heap Memory.
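As a quick illustration, the same pooled allocator hands out both kinds of buffers (a minimal sketch using Netty's public allocator API):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledBuffers {
    public static void main(String[] args) {
        // The same pooled allocator serves both heap and direct buffers.
        ByteBuf heap = PooledByteBufAllocator.DEFAULT.heapBuffer(256);
        ByteBuf direct = PooledByteBufAllocator.DEFAULT.directBuffer(256);

        System.out.println(heap.hasArray());   // true: backed by a byte[]
        System.out.println(direct.isDirect()); // true: backed by off-heap memory

        // release() hands the memory back to the pool, not to the OS.
        heap.release();
        direct.release();
    }
}
```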

ByteBuf -- the end consumer of memory

Here, I would like to stress that ByteBuf and memory are two distinct concepts that must be understood separately.

A ByteBuf is an object; it must be given a piece of memory before it can work.

Memory can be loosely understood as memory obtained from the operating system, although inside the JVM the requested memory still needs a carrier: heap memory is carried by a byte[], while Direct Memory is carried by NIO's ByteBuffer (which is why Java's ability to use Direct Memory is provided by the NIO package in the JDK).
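A minimal sketch of the two carriers at the plain JDK level, with no Netty involved:

```java
import java.nio.ByteBuffer;

public class MemoryCarriers {
    public static void main(String[] args) {
        // Heap memory: carried by a byte[] on the JVM heap.
        byte[] heap = new byte[8192];

        // Direct memory: carried by an NIO ByteBuffer allocated off-heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(8192);

        System.out.println(heap.length);        // 8192
        System.out.println(direct.isDirect());  // true
    }
}
```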

The reason for stressing these two concepts is that Netty's memory pool (its memory management mechanism) is concerned with allocating and recycling memory, whereas recycling the ByteBuf objects themselves is a separate technique called object pooling (implemented via Recycler).

Although the two are always used together, they are two independent mechanisms. When you create a ByteBuf, it may happen that the ByteBuf object is recycled while the memory is newly requested from the operating system; it may equally happen that the ByteBuf object is newly created while the memory is recycled.

That is because the creation process can be divided into three steps (see the sketch after this list):

Obtain a ByteBuf instance (which may be newly created or taken from the object pool)

Request memory from Netty's memory management mechanism (which may be newly requested from the operating system or previously reclaimed)

Hand the requested memory to the ByteBuf for use
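A toy, self-contained sketch of those three steps; the two pools below are invented for illustration and are not Netty's real types (Netty uses Recycler and PoolArena):

```java
import java.util.ArrayDeque;

public class CreationSteps {
    static final ArrayDeque<int[]> objectPool = new ArrayDeque<>();  // stands in for the Recycler
    static final ArrayDeque<byte[]> memoryPool = new ArrayDeque<>(); // stands in for the memory pool

    public static void main(String[] args) {
        // Step 1: obtain an instance -- recycled if the object pool has one.
        int[] bufState = objectPool.isEmpty() ? new int[2] : objectPool.pop();

        // Step 2: obtain memory -- reclaimed if the memory pool has some,
        // otherwise "newly requested" (here: freshly allocated).
        byte[] memory = memoryPool.isEmpty() ? new byte[8192] : memoryPool.pop();

        // Step 3: hand the memory to the instance for use.
        bufState[0] = 0;             // writer index
        bufState[1] = memory.length; // capacity

        // On release, each resource returns to its own pool independently,
        // which is why one can be "recycled" while the other is "new".
        memoryPool.push(memory);
        objectPool.push(bufState);
    }
}
```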

This article focuses only on the memory management mechanism, so it will not go into the object pooling mechanism in any depth.

Classes related to memory management in Netty

There are many classes related to memory management in Netty. Inside the framework, PoolArena, PoolChunkList, PoolChunk, PoolSubpage, and others each manage a block or group of memory.

Externally, ByteBufAllocator is provided for users to work with.

Next, we will introduce each of these classes, and then trace the process of memory allocation and recycling through ByteBufAllocator.

For the sake of space and readability, this article does not go through large amounts of code in detail, introducing only what is necessary.

For comments on the code, see the netty project on my GitHub.

PoolChunk -- the smallest unit of memory Netty requests from the OS

As mentioned above, to reduce the frequency of memory requests to the operating system, Netty requests a fairly large block of memory in one go, then manages that block and hands parts of it to memory consumers (that is, ByteBuf) as needed. This block is a PoolChunk, and its size is determined by ChunkSize (16MB by default, meaning 16MB is requested from the OS at a time).
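The chunk size is derived from two tunables: chunkSize = pageSize << maxOrder. A minimal sketch using one of PooledByteBufAllocator's public constructors (the defaults are shown; the available constructors vary somewhat across Netty versions):

```java
import io.netty.buffer.PooledByteBufAllocator;

public class ChunkGeometry {
    public static void main(String[] args) {
        int pageSize = 8192; // 8KB Pages
        int maxOrder = 11;   // tree depth: chunkSize = pageSize << maxOrder

        // 8192 << 11 = 16MB requested from the OS per chunk.
        PooledByteBufAllocator alloc = new PooledByteBufAllocator(
                true /* preferDirect */, 2 /* nHeapArena */, 2 /* nDirectArena */,
                pageSize, maxOrder);

        System.out.println(alloc.metric().chunkSize()); // 16777216
    }
}
```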

Page -- the smallest unit of memory managed by PoolChunk

The smallest unit of memory that PoolChunk manages is called a Page, whose size is PageSize (8KB by default). In other words, all memory requested from a PoolChunk is allocated in units of Pages (one or more Pages).

When allocating, PoolChunk consults its internal records to find a position with enough Pages to satisfy the request, and allocates that memory to the user.

How PoolChunk manages Pages

We already know that PoolChunk organizes memory in Page units, and allocates memory in Page units as well.

So how does PoolChunk strike a balance between allocation efficiency (finding allocable memory as quickly as possible, and guaranteeing that the allocated memory is contiguous) and memory utilization (wasting as little memory as possible, making the best use of what it has)?

Netty adopts the ideas of jemalloc.

First, PoolChunk organizes its internal memory as a complete binary tree. Taking the default ChunkSize of 16MB and PageSize of 8KB as an example, a PoolChunk can be divided into 2048 Pages. If you treat those 2048 Pages as the leaf nodes, you get a tree of depth 11 (2^11 = 2048).

Each leaf node manages one Page, so a parent node manages two Pages (it has two leaf children), and so on up the tree: the root manages all the Pages of the PoolChunk (every leaf is its descendant), and in general each node manages all the Pages under the leaves of the subtree rooted at that node.

The advantage of this layout is that when memory is needed, the place to allocate it can be found quickly (just walk down from the top to a node that manages the required amount of memory and allocate that node's memory), and the allocated memory is contiguous (as long as the Pages corresponding to adjacent leaves are contiguous).

For example, the node numbered 512 manages four Pages, Page0 through Page3 (beneath it are the four leaves 2048, 2049, 2050, and 2051).

The node numbered 1024 manages two Pages, Page0 and Page1 (its leaves 2048 and 2049 correspond to Page0 and Page1).

When 32KB of memory is needed, it suffices to allocate node 512 (once 512 is allocated, all of its descendants are considered unallocatable). When 16KB is needed, it suffices to allocate node 1024 (once node 1024 is allocated, the leaves 2048 and 2049 below it may no longer be allocated).

After learning about the memory management mechanism within PoolChunk, the reader may have several questions:

How does PoolChunk mark a node as allocated?

When a node is allocated, how is the allocatable memory of its ancestors updated? That is, once node 2048 is allocated, a later request for 16KB must not be served from node 1024, because node 1024 now has only 8KB available.

To solve these two problems, PoolChunk internally maintains two variables: byte[] memoryMap and byte[] depthMap.

The two arrays have the same length, equal to the number of nodes in the tree + 1, because the root is placed at index 1. The positional relationship between parent and child nodes in the array is:

assuming the index of a parent is i, the indices of its children are 2i and 2i+1

Representing a binary tree with an array like this may remind you of the heap data structure.
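In code, that indexing scheme looks like this (a small illustrative snippet, not Netty's source):

```java
public class TreeIndexing {
    public static void main(String[] args) {
        int id = 512;               // any node in the tree; the root is 1
        int parent = id >>> 1;      // 256
        int left = id << 1;         // 1024
        int right = (id << 1) + 1;  // 1025
        // A node's depth is floor(log2(id)): 9 here, since 2^9 = 512.
        int depth = 31 - Integer.numberOfLeadingZeros(id);

        System.out.printf("parent=%d left=%d right=%d depth=%d%n",
                parent, left, right, depth);
    }
}
```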

We already know that both arrays describe a binary tree and that each array element can be viewed as a node of that tree. So let's look at what the value of each element means.

For depthMap, the value is the layer of the tree on which the node sits. For example, depthMap[1] == 0 because index 1 is the root, and depthMap[2] == depthMap[3] == 1, meaning those two nodes are on the next layer down (the leaves end up on layer 11). Because the shape of the tree never changes once it is built, the values in depthMap never change after initialization.

For memoryMap, the value records the shallowest layer (the layer closest to the root) at which a complete block of memory can still be allocated under that node.

This may be a little awkward to grasp, so let's continue with the earlier example.

Initially, when no memory has been allocated, every node can allocate the complete block belonging to its own layer (that is, memoryMap starts out identical to depthMap). Once one of a node's children is allocated, the complete memory the parent can still provide shrinks ("complete memory" means a contiguous block managed by one node, not the total remaining memory under that node); memory allocation and recycling both update the values of the affected nodes in memoryMap.

For example, after node 2048 is allocated, the largest complete block node 1024 can still provide (originally 16KB) drops to that of node 2049, its right child (8KB). In other words, node 1024's capability has degraded to that of the layer on which node 2049 sits.

This degradation may propagate to all of the node's ancestors.

At this point, the complete block that node 512 can allocate is 16KB, not the 24KB that remains under it (note also that allocations are made in powers of two: even if a consumer really needs, say, 21KB, Netty's memory management hands out 32KB outright).

But that does not mean the other 8KB managed by node 512 is wasted: it can still be handed out when someone requests 8KB.
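To make the two questions above concrete, here is a simplified, runnable sketch of the allocation walk. It is modeled on the descent Netty's PoolChunk performs over memoryMap, but written from scratch for illustration:

```java
public class BuddySketch {
    static final int MAX_ORDER = 11;             // leaves on layer 11 => 2048 Pages
    static final byte UNUSABLE = MAX_ORDER + 1;  // marks a fully allocated node
    static final byte[] depthMap = new byte[1 << (MAX_ORDER + 1)];
    static final byte[] memoryMap = new byte[1 << (MAX_ORDER + 1)];

    public static void main(String[] args) {
        for (int id = 1; id < memoryMap.length; id++) {
            byte d = (byte) (31 - Integer.numberOfLeadingZeros(id)); // floor(log2(id))
            depthMap[id] = d;
            memoryMap[id] = d;                   // initially identical to depthMap
        }
        System.out.println(allocateNode(11)); // 2048: the first 8KB Page
        System.out.println(allocateNode(10)); // 1025: 1024 is skipped because 2048 is taken
        System.out.println(allocateNode(11)); // 2049: the Page next to the first one
    }

    // Allocate one complete block on layer d (d=11 -> 8KB, d=10 -> 16KB, ...).
    static int allocateNode(int d) {
        if (memoryMap[1] > d) return -1;         // no contiguous block this large is left
        int id = 1;
        while (depthMap[id] < d) {               // walk down to the target layer
            id <<= 1;                            // try the left child first
            if (memoryMap[id] > d) id ^= 1;      // left subtree can't serve it: go right
        }
        memoryMap[id] = UNUSABLE;                // mark this node as allocated
        while (id > 1) {                         // degrade ancestors to the min of children
            id >>>= 1;
            memoryMap[id] = (byte) Math.min(memoryMap[id << 1], memoryMap[(id << 1) + 1]);
        }
        return id;
    }
}
```

Freeing a block is the mirror image: restore the node's memoryMap entry to its depthMap value and recompute the ancestors.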

Let's walk through PoolChunk's allocation process (the original article illustrates it with diagrams, where value denotes a node's entry in memoryMap and depth its entry in depthMap).

For the first allocation, the applicant actually needs 6KB of memory, which is rounded up to one 8KB Page (a single leaf node):

The consequence of this allocation is that the memoryMap values of all the leaf's ancestors are pushed one layer down.

After that, the applicant requests 12KB of memory, which is rounded up to 16KB (a layer-10 node):

Since node 1024 can no longer provide the required complete block while node 512 still can, node 512 passes the request to its right child, node 1025, which succeeds.

What has been described above is the allocation process; recovery is simply the reverse: after a block is reclaimed, the memoryMap values of the corresponding node and its ancestors are restored. We won't go into further detail here.

PoolChunkList -- the manager of PoolChunks

Inside a PoolChunkList is a linked list of PoolChunks. Normally, all the PoolChunks in one PoolChunkList have a usage ratio (allocated memory / ChunkSize) within the same range.

Each PoolChunkList has its own minimum and maximum usage bounds; the PoolChunkLists themselves also form a linked list, with the lists covering lower usage ranges sitting earlier in the chain.

As memory is allocated from and returned to a PoolChunk, its usage changes, and the PoolChunk is moved back and forth along the chain of PoolChunkLists into whichever list's range it currently fits.

The advantage is that PoolChunks with low usage are used for allocation first, which keeps the utilization of each PoolChunk high and avoids wasting memory.
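For reference, a PoolArena chains six such lists. The sketch below mirrors the usage bounds used in Netty's PoolArena (treat the exact percentages as illustrative, since they can differ between Netty versions):

```java
public class ChunkListChain {
    public static void main(String[] args) {
        // name, minUsage%, maxUsage% -- newly created chunks start in qInit.
        String[][] chain = {
            { "qInit", "MIN", "25" },
            { "q000", "1", "50" },
            { "q025", "25", "75" },
            { "q050", "50", "100" },
            { "q075", "75", "100" },
            { "q100", "100", "MAX" },
        };
        for (String[] list : chain) {
            System.out.printf("%-5s usage in [%s%%, %s%%)%n", list[0], list[1], list[2]);
        }
    }
}
```

A chunk whose usage rises above its list's maximum moves toward q100; one whose usage falls below the minimum moves back toward q000.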

PoolSubpage -- the manager of small memory

The smallest block PoolChunk manages is a Page (8KB by default), so when the memory we need is comparatively small, allocating a whole Page would clearly waste memory.

PoolSubpage is the manager for this kind of small memory.

Small memory means memory smaller than one Page, and it is divided into Tiny and Small: Tiny is memory below 512B, while Small is 512B to 4096B. Memory blocks of at least one Page are called Normal, and blocks larger than one Chunk are called Huge.

Tiny and Small are further subdivided by exact size.

Tiny is divided into 16B, 32B, 48B, ..., 496B (increasing in multiples of 16), 31 size classes in total.

Small has four size classes: 512B, 1024B, 2048B, and 4096B.
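A sketch of how a requested size maps onto one of these classes (written for illustration; Netty's own capacity-normalization code performs equivalent rounding):

```java
public class SizeClasses {
    // Round a requested size up to its size class.
    static int normalize(int size) {
        if (size < 512) {
            return (size + 15) & ~15; // tiny: next multiple of 16 (16, 32, ..., 496)
        }
        int n = size - 1;             // small (and normal): next power of two
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        System.out.println(normalize(20));   // 32: a tiny class
        System.out.println(normalize(500));  // 512: rounds up into the smallest small class
        System.out.println(normalize(3000)); // 4096: the largest small class
    }
}
```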

A PoolSubpage first requests one Page of memory from a PoolChunk, then divides that Page into equal-sized blocks according to its size class (each PoolSubpage manages only one size class; for example, a PoolSubpage managing 16B divides its Page into 512 blocks of 16B each).

Because each PoolSubpage serves only one size class, PoolSubpages of the same class are organized into a linked list, while different classes are stored in different places.

This fixed-size-class property also means a PoolSubpage does not need PoolChunk's complete-binary-tree scheme to manage its memory (a PoolSubpage managing 16B only ever has to allocate 16B; a request for 32B must be handled by a PoolSubpage managing 32B). A long[] bitmap (which can be thought of as an array of bits) is enough to record which of the managed blocks have been allocated (one bit per block).

The implementation is much simpler.
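Here is a self-contained sketch of that bitmap bookkeeping, one bit per block (illustrative only, not Netty's implementation):

```java
public class SubpageBitmap {
    static final int PAGE_SIZE = 8192;
    static final int ELEM_SIZE = 16;                     // this subpage's size class
    static final int NUM_BLOCKS = PAGE_SIZE / ELEM_SIZE; // 512 blocks
    static final long[] bitmap = new long[NUM_BLOCKS / 64];

    // Find the first free block, mark it allocated, and return its index.
    static int allocate() {
        for (int i = 0; i < bitmap.length; i++) {
            if (~bitmap[i] != 0) {                       // some bit in this word is free
                int j = Long.numberOfTrailingZeros(~bitmap[i]);
                bitmap[i] |= 1L << j;
                return (i << 6) + j;                     // i * 64 + j
            }
        }
        return -1;                                       // the subpage is full
    }

    static void free(int block) {
        bitmap[block >>> 6] &= ~(1L << (block & 63));
    }

    public static void main(String[] args) {
        System.out.println(allocate()); // 0 -> bytes [0, 16) of the Page
        System.out.println(allocate()); // 1 -> bytes [16, 32)
        free(0);
        System.out.println(allocate()); // 0 again: the freed block is reused
    }
}
```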

PoolArena -- the coordinator of memory management

PoolArena is the coordinator of memory management.

Inside it sits the linked list of PoolChunkLists described above (the lists divided by the usage of the PoolChunks they manage).

In addition, it holds two arrays of PoolSubpages: PoolSubpage[] tinySubpagePools and PoolSubpage[] smallSubpagePools.

By default, tinySubpagePools covers the 31 Tiny size classes (PoolSubpages of different classes are stored at their corresponding array index, and PoolSubpages of the same class form a linked list at that index).

Likewise, smallSubpagePools has length 4 by default, storing the PoolSubpages for 512B, 1024B, 2048B, and 4096B.

Based on the amount of memory requested, PoolArena decides whether to serve the request from a PoolChunk or from a PoolSubpage of the corresponding size class.

It is worth noting that allocating through a PoolArena is subject to contention, so at the critical points PoolArena uses synchronized to stay thread-safe.

Netty mitigates this contention to some extent by creating multiple PoolArenas, letting threads use different PoolArenas as much as possible.

PoolThreadCache -- a thread-local cache that reduces contention in memory allocation

Contention on PoolArena is unavoidable. Besides creating multiple PoolArenas to reduce it, Netty also lets a thread cache the memory it releases instead of returning it to the PoolArena immediately.

The cached memory is stored in PoolThreadCache, which is a thread-local variable; it is therefore thread-safe, and accessing it requires no locking.

Inside PoolThreadCache are cache pools (arrays) of MemoryRegionCache, again tiered into Tiny, Small, and Normal (Huge is not cached, because caching Huge blocks is not cost-effective).

The Tiny and Small tiers are partitioned the same way as PoolSubpage's size classes, while Normal has a parameter controlling which sizes are cached (for example one Page, two Pages, four Pages, and so on); Normal blocks outside the cached sizes are not cached and are returned directly to the PoolArena.

Looking inside MemoryRegionCache, there is a queue, and all the nodes in one queue can be seen as memory blocks of the same size class previously used by the thread. It also has a size attribute that caps the queue's length (when the queue is full, blocks of that class are no longer cached and are returned directly to the PoolArena).

When a thread needs memory, it first finds the cache pool (array) of the appropriate tier in its own PoolThreadCache, then locates the MemoryRegionCache of the right size class in that array, and finally takes a memory block off the queue and allocates it.
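A toy version of that structure, one bounded queue per size class and one cache per thread (illustrative only; Netty's real classes are richer):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RegionCacheSketch {
    static final class RegionCache {
        private final Queue<byte[]> queue = new ArrayDeque<>();
        private final int maxEntries;

        RegionCache(int maxEntries) { this.maxEntries = maxEntries; }

        // Called on release: cache the block unless the queue is already full.
        boolean offer(byte[] block) {
            return queue.size() < maxEntries && queue.offer(block);
        }

        // Called on allocation: reuse a cached block if one exists.
        byte[] poll() { return queue.poll(); }
    }

    // One cache per thread, so no locking is needed (mirrors PoolThreadCache).
    static final ThreadLocal<RegionCache> CACHE_16B =
            ThreadLocal.withInitial(() -> new RegionCache(512));

    public static void main(String[] args) {
        RegionCache cache = CACHE_16B.get();
        cache.offer(new byte[16]);                 // release: the block is cached
        System.out.println(cache.poll() != null);  // true: allocation hits the cache
        System.out.println(cache.poll() != null);  // false: the cache is empty again
    }
}
```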

Overview of Netty's memory structure and the steps of a PooledByteBufAllocator memory request

Having absorbed all of the concepts above, it helps to see the whole structure at once; the original article summarizes it in a diagram that details the Heap Memory side, and Direct Memory is organized similarly.

Finally, with PooledByteBufAllocator as the entry point, let's walk through the memory request process from start to finish (a usage example follows the list):

PooledByteBufAllocator.newHeapBuffer() starts the memory request

Obtain the thread-local PoolThreadCache and the PoolArena bound to the current thread

Allocate through the PoolArena: first obtain a ByteBuf object (which may be recycled from the object pool or newly created), then start the memory allocation

Determine the tier of the request and first try to find a cached block of the same size class in PoolThreadCache; if there is none, allocate from the PoolArena

For Normal-tier memory, find a suitable PoolChunk in the chain of PoolChunkLists and allocate the memory from it; if there is none, first request a new PoolChunk from the OS, and then have the PoolChunk allocate the corresponding Pages

For Tiny- and Small-tier memory, allocate from the PoolSubpage pool of the corresponding size class; if no PoolSubpage is available, fall back to step 5 to obtain a Page for a new PoolSubpage to manage
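From the user's point of view, all of the machinery above hides behind a couple of calls (standard Netty API; release() is what sends the memory back to the pool or the thread-local cache):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class AllocateAndRelease {
    public static void main(String[] args) {
        // Entry point: this triggers the whole allocation chain described above.
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.heapBuffer(6 * 1024); // asks for 6KB
        try {
            buf.writeBytes(new byte[1024]);     // use the buffer
            System.out.println(buf.capacity()); // 6144, though the pool reserved a full 8KB Page
        } finally {
            buf.release(); // back to the pool (or the thread-local cache), not the OS
        }
    }
}
```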

At this point, I believe you have a deeper understanding of how Netty manages memory. The best way to consolidate it is to try these ideas out in practice.
