Why is disk-based Kafka so fast?

2026-05-08 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31 report

This article explains in detail why Kafka is so fast even though it stores its data on disk. I hope it gives you a working understanding of the topic.

Kafka is a ubiquitous message middleware in the big data field. It is widely used in real-time data pipelines within enterprises and helps them build their own stream computing applications. Although Kafka stores data on disk, it offers high throughput and low latency, with throughput often quoted in the range of tens of thousands to tens of millions of messages per second. The reasons are worth exploring.

Zero copy

This section covers the optimization Kafka makes on the consumer side using the "zero-copy" mechanism of the Linux operating system. First, consider the usual transfer path of data from a file to a socket network connection:

The operating system reads data from disk into the Page Cache in kernel space

The application reads the data from the Page Cache into a user-space buffer

The application writes the data from the user-space buffer back into kernel space, into the socket buffer

The operating system copies the data from the socket buffer to the NIC buffer, from which it is sent over the network
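The steps above can be sketched in a few lines of Python. This is only an illustration of the conventional path, not Kafka's own code: the function name `copy_via_user_space` and the `chunk_size` parameter are hypothetical, chosen for this example. Every chunk is pulled into a user-space bytes object and then pushed back into the kernel.

```python
import os
import socket
import tempfile


def copy_via_user_space(path: str, sock: socket.socket, chunk_size: int = 64 * 1024) -> int:
    """Send a file over a socket the conventional way (hypothetical helper).

    Returns the number of bytes sent.
    """
    sent_total = 0
    # buffering=0 disables Python's own buffering, so each read() below
    # corresponds to one read(2) system call.
    with open(path, "rb", buffering=0) as f:
        while True:
            # Copy: kernel Page Cache -> user-space bytes object.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Copy: user-space bytes object -> kernel socket buffer.
            sock.sendall(chunk)
            sent_total += len(chunk)
    return sent_total
```

The copies from disk into the Page Cache and from the socket buffer to the NIC happen inside the kernel and are not visible in the code; the two copies that are visible are the ones that cross the user/kernel boundary.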

This conventional path involves 4 copy operations and 2 system calls (a read and a write, each of which switches between user mode and kernel mode on entry and again on return), so it is inefficient. The "zero-copy" mechanism of the Linux operating system uses the sendfile system call, which lets the operating system send data from the Page Cache to the network directly, without passing it through a user-space buffer. Only the final copy, into the NIC buffer, is still required.

With this "zero-copy" mechanism, the Page Cache combined with sendfile, the performance of Kafka's consumer side improves considerably. This is also why disk I/O sometimes stays low even while consumers are continuously consuming data: at those times the operating system's cache is serving the data.
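As a rough illustration of the sendfile path described above, the following Python sketch hands the transfer to the kernel through `os.sendfile`, which wraps the sendfile system call on Linux and other Unix systems. It is not Kafka's implementation (Kafka runs on the JVM, where the equivalent facility is `FileChannel.transferTo`), and the function name `send_file_zero_copy` is hypothetical.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a socket with sendfile (hypothetical helper).

    Returns the number of bytes sent.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # The kernel moves the bytes from the file (Page Cache) to the
            # socket; they never pass through a Python-level buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total, size - sent_total)
            if sent == 0:
                # The file ended earlier than expected (e.g. it was truncated).
                break
            sent_total += sent
    return sent_total
```

Note that the loop contains no read call and no data buffer at all: the application only tells the kernel which file range to send. How many copies the kernel still performs internally depends on the kernel version and the network hardware.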

That concludes the explanation of why disk-based Kafka is so fast. I hope the content above is helpful to you.
