What is the relationship between Kafka, page cache, and buffer cache?


This article looks at the relationship between Kafka and the Linux page cache and buffer cache. It is quite practical, so it is shared here in the hope that you will take something away after reading it.

Preface

A soul-searching question about Kafka: why is it so fast?

In other words, how does it achieve such high throughput and such low latency?

Many articles have answered this question; here we focus on just one part of the answer: the use of page cache. First, let's take a brief look at page cache in Linux (and meet buffer cache along the way).

Page cache & buffer cache

Run the free command and notice the two columns named buffers and cached, as well as the row named "-/+ buffers/cache".

~ free -m
             total       used       free     shared    buffers     cached
Mem:        128956      96440      32515          0       5368      39900
-/+ buffers/cache:      51172      77784
Swap:        16002          0      16001

The cached column shows current page cache usage, and the buffers column shows current buffer cache usage. In one sentence: page cache caches the page data of files, while buffer cache caches the block data of block devices such as disks. A page is a logical concept, so page cache sits at the same level as the file system; a block is a physical concept, so buffer cache sits at the same level as the block device driver.
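A quick sanity check on the output above: the "-/+ buffers/cache" row simply moves the two caches from the "used" side to the "free" side of the ledger:

used - buffers - cached = 96440 - 5368 - 39900 = 51172 MB (memory really used by processes)
free + buffers + cached = 32515 + 5368 + 39900 = 77783 ≈ 77784 MB (memory available once the caches are reclaimed; the 1 MB gap is rounding in free -m)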

Page cache and buffer cache share a common purpose: speeding up data I/O. On a write, data goes into the cache first, the written pages are marked dirty, and they are flushed to external storage later; this is the write-back caching policy (the alternative, write-through, is not what Linux uses). On a read, the cache is checked first; on a miss the data is read from external storage and then added to the cache. The operating system aggressively uses all free memory as page cache and buffer cache, and evicts cache pages with algorithms such as LRU when memory runs short.

Before Linux kernel 2.4, page cache and buffer cache were completely separate. However, most block devices are disks, and most data on disks is organized by a file system, so a great deal of data was cached twice and memory was wasted. After kernel 2.4 the two caches were roughly merged: if a page of a file is loaded into page cache, buffer cache only needs to keep pointers into that page. Only blocks that do not back a file, or accesses that bypass the file system and hit the device directly (for example the dd command), actually end up in buffer cache. So when we talk about page cache today, we basically mean page cache and buffer cache together; the rest of this article makes no distinction and simply says page cache.

The following figure roughly shows a possible page cache structure on a 32-bit Linux system, where the block size is 1 KB and the page size is 4 KB.

In page cache, each file is a radix tree (essentially a multi-way search tree) whose leaves point to pages. Given an offset within the file, the corresponding page can be located quickly.
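To make the offset-to-page mapping concrete, here is a minimal Java sketch (assuming the common 4 KB page size) of how a file offset splits into a page index, which is the key the kernel looks up in the file's radix tree, plus an offset inside that page:

public class PageIndexSketch {
    // Typical x86 Linux page size; assumed here purely for illustration.
    static final int PAGE_SHIFT = 12;               // 4 KB = 2^12 bytes
    static final long PAGE_SIZE = 1L << PAGE_SHIFT;

    public static void main(String[] args) {
        long fileOffset = 10_000_000L;               // arbitrary offset within a file
        long pageIndex = fileOffset >>> PAGE_SHIFT;  // which page of the file
        long offsetInPage = fileOffset & (PAGE_SIZE - 1);
        // The kernel uses pageIndex as the key into the file's radix tree
        // to find (or allocate) the corresponding page in page cache.
        System.out.println("page index = " + pageIndex + ", offset in page = " + offsetInPage);
    }
}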

With that background, we can bring Kafka into the picture.

The use of page cache by Kafka

Why does Kafka rely on page cache instead of managing a cache itself? The reasons are as follows:

In the JVM everything is an object, and storing data as objects brings the so-called object overhead, wasting space.

If the cache lived in the JVM, it would be subject to GC, and an overly large heap slows down GC and reduces throughput.

Once the process crashes, all self-managed cache data is lost.

The relationship between Kafka's three major components (broker, producer, consumer) and page cache is shown in the following diagram.

When a producer produces messages, the broker writes the data at an offset using the pwrite() system call [corresponding to the FileChannel.write() API in Java NIO], and everything lands in page cache first. When a consumer consumes messages, the broker uses the sendfile() system call [corresponding to the FileChannel.transferTo() API] to move the data from page cache to the broker's socket buffer with zero copy, and from there onto the network.
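As a rough illustration (not Kafka's actual code), the sketch below uses the Java NIO calls named above: a positional FileChannel.write() for the append path and FileChannel.transferTo() for the zero-copy send path. The file name demo.log and the localhost:9999 endpoint are made up for the example, and the transfer part assumes something is already listening on that port.

import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class PageCacheIoSketch {
    public static void main(String[] args) throws Exception {
        // Append a "message" to a log file: the write lands in page cache first,
        // and the kernel flushes it to disk later (write-back).
        try (FileChannel log = new RandomAccessFile("demo.log", "rw").getChannel()) {
            ByteBuffer msg = ByteBuffer.wrap("hello kafka\n".getBytes(StandardCharsets.UTF_8));
            log.write(msg, log.size());   // positional write, pwrite() under the hood on Linux
        }

        // Send the file to a socket with zero copy: transferTo() maps to sendfile(),
        // so data moves from page cache to the socket buffer without a detour through user space.
        try (FileChannel log = new RandomAccessFile("demo.log", "r").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = log.size();
            while (remaining > 0) {
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}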

Not shown in the figure is leader-follower synchronization, which works the same way as consumption: as long as the follower is in the ISR, data can be transferred via the same zero-copy mechanism from the page cache of the leader's broker to the follower's broker.

Meanwhile, the data in page cache is written back to disk as the kernel's flusher threads are scheduled or when sync()/fsync() is called, so even if the process crashes there is no need to worry about data loss. In addition, if a message the consumer wants is not in page cache, it is read from disk, and some adjacent blocks are read ahead into page cache to speed up the next read.
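Kafka itself normally leaves flushing to the kernel (it also exposes broker settings such as log.flush.interval.messages and log.flush.interval.ms to force flushes, although relying on replication is the usual advice). If you want to trigger the fsync() path from Java, FileChannel.force() is the counterpart; below is a minimal sketch reusing the made-up demo.log file from the previous example:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class FsyncSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel log = new RandomAccessFile("demo.log", "rw").getChannel()) {
            log.write(ByteBuffer.wrap("flush me\n".getBytes(StandardCharsets.UTF_8)), log.size());
            // force(true) maps to fsync(): block until this file's dirty pages
            // (and its metadata) have been written back from page cache to disk.
            log.force(true);
        }
    }
}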

From this we can draw an important conclusion: if the production rate of the Kafka producer and the consumption rate of the consumer are close, the whole produce-consume path can be completed almost entirely by reading and writing the broker's page cache, with very little disk access. Moreover, when Kafka persists messages to the partition files of each topic, it only appends sequentially, which makes full use of the disk's fast sequential access and is highly efficient.

Notes and related parameters

For a cluster dedicated to running Kafka, the first thing to note is to give Kafka an appropriately sized (that is, not too large) JVM heap. From the analysis above, Kafka's performance has little to do with heap memory, while its appetite for page cache is huge. Experience suggests that 5~8 GB of heap is sufficient for Kafka; leaving the rest of system memory as page cache space maximizes page cache efficiency.
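For example (the figure is illustrative only, within the 5~8 GB range above), the broker heap can be pinned via the KAFKA_HEAP_OPTS environment variable that Kafka's startup scripts read:

KAFKA_HEAP_OPTS="-Xms6g -Xmx6g" bin/kafka-server-start.sh config/server.properties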

Another issue deserving special attention is lagging consumers, i.e. consumers whose consumption rate is slow and clearly falls behind. The data they want to read is most likely no longer in the broker's page cache, so they add a lot of unnecessary disk reads. Worse, the "cold" data read by a lagging consumer still enters page cache, polluting the "hot" data that normal consumers are reading and degrading their performance. This problem is particularly important in a production environment.

As mentioned earlier, the data in page cache is written back to disk as the kernel's flusher threads are scheduled. The following four parameters control this behavior and can be tuned if necessary.

/proc/sys/vm/dirty_writeback_centisecs: the interval of the flusher check, in units of 0.01 seconds. The default is 500, i.e. 5 seconds. Each check is handled according to the logic controlled by the following three parameters.

/proc/sys/vm/dirty_expire_centisecs: if a page in page cache has been marked dirty for longer than this value, it is flushed straight to disk. The unit is 0.01 seconds; the default is 3000, i.e. 30 seconds.

/proc/sys/vm/dirty_background_ratio: if the total size of dirty pages exceeds this percentage, flusher threads are scheduled to write back to disk asynchronously in the background, without blocking the current write() call. The default is 10 (percent).

/proc/sys/vm/dirty_ratio: if the total size of dirty pages exceeds this percentage, write() calls in all processes are blocked and each process is forced to write its own dirty data back to disk. The default is 20 (percent).

As you can see, parameters 2 and 3 offer the most room for tuning; try to avoid hitting the threshold of parameter 4, which is very expensive.
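These values are exposed as one-line text files under /proc/sys/vm/ and can be inspected (or, with root, changed) there. A minimal Java sketch to print the current values on a Linux machine (Java 11+ for Files.readString):

import java.nio.file.Files;
import java.nio.file.Path;

public class DirtyParamsSketch {
    public static void main(String[] args) throws Exception {
        String[] params = {
            "dirty_writeback_centisecs",
            "dirty_expire_centisecs",
            "dirty_background_ratio",
            "dirty_ratio"
        };
        for (String p : params) {
            // Each parameter is a one-line text file under /proc/sys/vm/
            String value = Files.readString(Path.of("/proc/sys/vm/" + p)).trim();
            System.out.println(p + " = " + value);
        }
    }
}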

That covers the relationship between Kafka, page cache, and buffer cache. These are points you may well see or use in everyday work; hopefully this article has helped you learn a little more about them.
