
How does the kafka sending client ensure infrequent GC in high concurrency scenarios

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Many readers are not clear about how the Kafka producer client avoids frequent GC in high-concurrency scenarios. To help with this, the following article explains the mechanism in detail; I hope you get something out of it.

I have recently been reading the Kafka source code, and its client buffer pool technique is genuinely elegant.

Note: the source code discussed below is from Kafka version 2.2.2.

Background

When our application calls the Kafka producer client to send messages, the client groups messages belonging to the same topic partition into a batch; what is actually sent to the Kafka broker is the batch, not individual messages.

The benefit is obvious: the client and the broker communicate over the network, so sending in batches reduces the per-request network overhead and improves throughput.
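As a side note, batching on the producer is tuned mainly through two real Kafka settings, batch.size and linger.ms. The snippet below is only a minimal sketch of the configuration keys; the linger value is an illustrative example, not a recommendation:

```java
import java.util.Properties;

// The two producer settings that control batching. batch.size is the target
// batch size in bytes per partition (16384 is the Kafka default); linger.ms
// is how long the producer waits for more records before sending a partial batch.
public class BatchingConfig {
    public static Properties batchingProps() {
        Properties props = new Properties();
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");   // example value; the default is 0
        return props;
    }
}
```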

How the memory behind these batches is managed is worth discussing. Some may say: isn't that easy? Allocate a block of memory when you need it, and release it after the send completes.

Kafka's client is written in Java (most of the newer code is Java). The naive solution is to allocate a new buffer each time one is needed and, when finished, set the reference to null so the JVM GC can reclaim it.

There seems to be nothing wrong with that. However, under high concurrency it makes GC run frequently. GC involves stop-the-world pauses, and even though modern collectors keep those pauses very short, they can still become a performance bottleneck in production.

The designers of Kafka naturally took this into account. Let's see how Kafka actually manages batch memory.

How the buffer pool works

The Kafka client uses a buffer pool: it pre-allocates real blocks of memory and keeps them in the pool.

Each batch corresponds to a block of memory from the buffer pool. After the messages are sent and the batch is no longer in use, the memory block is returned to the pool.

Does that sound familiar? Yes, pooling techniques such as database connection pools and thread pools are based on much the same principle: by reusing resources, the overhead of creation and destruction is reduced and execution efficiency improves.

The code is the best documentation, so let's look at the source.

We will proceed top-down: first show where the buffer pool is used, then dive into the pool itself for further analysis.

The code discussed below has been trimmed to keep only the parts relevant to this analysis.

RecordAccumulator manages queues of batches. Its append method calls the allocate method of BufferPool (held in the accumulator's free field) to request a block of memory (a ByteBuffer), wraps that empty memory as a batch, and appends it to the tail of the queue.

When the messages have been sent and the batch is no longer needed, RecordAccumulator calls its deallocate method, which in turn calls BufferPool's deallocate to return the memory.
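The flow described above can be sketched as follows. This is a simplified illustration, not Kafka's actual RecordAccumulator; the Pool interface and the names here are hypothetical stand-ins for BufferPool's allocate/deallocate pair:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for the append/deallocate flow in RecordAccumulator.
public class AccumulatorSketch {
    // Hypothetical minimal pool interface mirroring BufferPool's two operations.
    interface Pool {
        ByteBuffer allocate(int size);
        void deallocate(ByteBuffer buffer);
    }

    static class Batch {
        final ByteBuffer buffer;
        Batch(ByteBuffer buffer) { this.buffer = buffer; }
    }

    private final Pool pool;
    private final Deque<Batch> queue = new ArrayDeque<>();

    AccumulatorSketch(Pool pool) { this.pool = pool; }

    // append: borrow memory from the pool, wrap it as a batch, enqueue it.
    Batch append(int size) {
        Batch batch = new Batch(pool.allocate(size));
        queue.addLast(batch);
        return batch;
    }

    // After the batch has been sent, return its memory to the pool.
    void complete(Batch batch) {
        queue.remove(batch);
        pool.deallocate(batch.buffer);
    }

    int pending() { return queue.size(); }
}
```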

Clearly, BufferPool is the class that manages the buffer pool, and it is the focus of today's discussion. Let's first look at how it allocates a block of memory.

First of all, the whole method body runs under a lock, so memory allocation is safe under concurrency.

The logic is: when the requested size equals poolableSize, the buffer is taken from the pool. poolableSize can be understood as the page size of the buffer pool, the basic unit of allocation. Taking from the pool simply means polling a ByteBuffer from the free queue and returning it.

If the requested size is not exactly poolableSize, the request is served from the non-pooled region. When needed, free buffers are drained from the pool and their capacity merged into the non-pooled region. The field nonPooledAvailableMemory tracks how much non-pooled memory is available. Allocating from the non-pooled region means calling ByteBuffer.allocate to allocate real JVM memory.

Pooled buffers are rarely reclaimed by GC, because they are reused. Non-pooled buffers are discarded after use and left for the GC to reclaim.
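Putting the two paths together, the allocation logic can be sketched like this. This is a simplified model with invented names; the real BufferPool.allocate additionally blocks on Condition objects when memory is exhausted, which is omitted here:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of BufferPool.allocate: pooled path for poolableSize
// requests, non-pooled path (real JVM allocation) for everything else.
public class BufferPoolAllocSketch {
    private final int poolableSize;
    private long nonPooledAvailableMemory;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    BufferPoolAllocSketch(long totalMemory, int poolableSize) {
        this.poolableSize = poolableSize;
        this.nonPooledAvailableMemory = totalMemory;
    }

    ByteBuffer allocate(int size) {
        lock.lock();
        try {
            // Fast path: a request of exactly one "page" is served from the free queue.
            if (size == poolableSize && !free.isEmpty())
                return free.pollFirst();
            // Otherwise, drain free pooled buffers into the non-pooled budget...
            while (nonPooledAvailableMemory < size && !free.isEmpty())
                nonPooledAvailableMemory += free.pollLast().capacity();
            if (nonPooledAvailableMemory < size)
                throw new IllegalStateException("not enough memory");
            // ...and allocate real JVM memory for the request.
            nonPooledAvailableMemory -= size;
            return ByteBuffer.allocate(size);
        } finally {
            lock.unlock();
        }
    }

    // Used by the release side (not shown here) to return a pooled page.
    void returnPooled(ByteBuffer buffer) {
        lock.lock();
        try {
            buffer.clear();
            free.addLast(buffer);
        } finally {
            lock.unlock();
        }
    }
}
```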

Let's take a look at the code that releases a batch.

It is very simple and splits into two cases: a poolableSize-sized buffer is returned directly to the free queue, while any other size simply increases the non-pooled available memory. In both cases, the first element in the waiting queue is then notified.
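The release side can be sketched similarly. Again this is a simplified model with invented names; the real BufferPool keeps a deque of Condition objects for threads blocked waiting on memory, and here that queue is simply empty:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of BufferPool.deallocate: poolable-sized buffers go back
// into the free queue for reuse, other sizes just raise the non-pooled
// budget; in both cases the first waiting thread (if any) is signalled.
public class BufferPoolFreeSketch {
    private final int poolableSize;
    private long nonPooledAvailableMemory;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final Deque<Condition> waiters = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    BufferPoolFreeSketch(int poolableSize) { this.poolableSize = poolableSize; }

    void deallocate(ByteBuffer buffer, int size) {
        lock.lock();
        try {
            if (size == poolableSize && size == buffer.capacity()) {
                buffer.clear();                    // reuse: back into the free queue
                free.addLast(buffer);
            } else {
                nonPooledAvailableMemory += size;  // let GC take the buffer itself
            }
            Condition first = waiters.peekFirst();
            if (first != null)
                first.signal();                    // wake the longest-waiting allocator
        } finally {
            lock.unlock();
        }
    }

    int freeCount() { return free.size(); }
    long nonPooled() { return nonPooledAvailableMemory; }
}
```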


© 2024 shulou.com SLNews company. All rights reserved.
