Why is Kafka so fast?

2025-07-06 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01 report

This article explains why Kafka is so fast, covering both how messages are written to the cluster and how they are read back.

In system design, message middleware is used for asynchronous service calls, system decoupling, and smoothing traffic peaks. Commonly used message middleware such as RabbitMQ, ActiveMQ and Alibaba's RocketMQ each have their own advantages, but in terms of throughput Kafka is one of the best. A single-machine comparison published by community users is shown below:

[Figure: Performance comparison]

Why is Kafka so fast?

Publish and subscribe model

A common publish and subscribe model is shown in the following figure:

[Figure: Publish and subscribe model]

Taking Kafka as an example, the producer generates messages and pushes them to the Kafka cluster, and the consumer actively pulls data from the Kafka cluster. The advantage of this model is that the consumption rate is controlled entirely by the consumers. The Kafka cluster acts like a reservoir: it prevents consumers from being overwhelmed when producers generate messages faster than consumers can process them.
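The push/pull split described above can be illustrated with a minimal in-memory sketch. The `Broker` class and the `consume_in_small_batches` helper are hypothetical names made up for this illustration and are not part of any Kafka client API; the point is only that the producer can burst while the consumer decides how much to take per pull.

```python
class Broker:
    """Toy in-memory stand-in for a Kafka cluster (hypothetical, not a Kafka API)."""

    def __init__(self):
        self.messages = []

    def push(self, message):
        # Producers push: the broker simply buffers whatever arrives.
        self.messages.append(message)

    def pull(self, offset, max_messages):
        # Consumers pull: they choose where to start and how much to take.
        return self.messages[offset:offset + max_messages]


def consume_in_small_batches(broker, max_messages):
    """Drain the broker at the consumer's own pace, one pull per round."""
    offset = 0
    rounds = []
    while True:
        batch = broker.pull(offset, max_messages)
        if not batch:
            break
        rounds.append(batch)
        offset += len(batch)
    return rounds


broker = Broker()
for i in range(10):  # a burst from a fast producer
    broker.push(f"msg-{i}")

# A slow consumer takes at most three messages per pull; nothing is lost.
rounds = consume_in_small_batches(broker, max_messages=3)
print(len(rounds), "pulls were needed")
```

Because the broker only buffers, the producer's burst never forces work onto the consumer; the consumer's `max_messages` setting alone determines its load.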

Kafka's speed can be analyzed from two aspects: how quickly messages generated by producers are written to the cluster, and how quickly consumers read messages from the cluster.

Fast writes

Write speed comes mainly from two techniques: sequential writes and MMFile.

Sequential write

Kafka stores messages on the hard disk, which is generally considered slow to read and write, so why is Kafka fast? When people say disk I/O is slow, they usually mean random reads and writes: every random access requires the disk to seek to a physical location, which is very time-consuming. Sequential reads and writes are much faster. As shown in the following figure:

[Figure: Sequential write]

Each time Kafka receives a new message, it appends the message to the tail of the log, so messages are stored sequentially. Once stored, a message cannot be modified or deleted individually; old data is removed only according to the configured retention policy.

Consumers also read sequentially. Each consumer keeps an offset that records the position of the message it has consumed up to. As shown in the following figure:

[Figure: Sequential consumption]
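The append-only storage and per-consumer offsets described above can be sketched as follows. This is a toy model under stated assumptions, not Kafka's actual segment format: the `AppendOnlyLog` and `Consumer` classes, the 4-byte length prefix, and the file name are all made up for illustration.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy log: every record is appended at the tail of a single file."""

    def __init__(self, path):
        self.path = path
        self.positions = []  # positions[i] = byte position of record i
        self.tail = 0        # byte position where the next record will start
        open(path, "wb").close()  # start from an empty file

    def append(self, payload):
        # Append mode always writes at the end of the file: no seeking around.
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(payload)) + payload)
        self.positions.append(self.tail)
        self.tail += 4 + len(payload)
        return len(self.positions) - 1  # logical offset of the new record

    def read(self, offset):
        with open(self.path, "rb") as f:
            f.seek(self.positions[offset])
            (length,) = struct.unpack(">I", f.read(4))
            return f.read(length)

    def __len__(self):
        return len(self.positions)


class Consumer:
    """Each consumer tracks its own offset; the log itself is never changed."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        if self.offset >= len(self.log):
            return None  # caught up with the tail
        record = self.log.read(self.offset)
        self.offset += 1
        return record


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendOnlyLog(os.path.join(tmp, "partition-0.log"))
        for payload in (b"first", b"second", b"third"):
            log.append(payload)
        fast, slow = Consumer(log), Consumer(log)
        fast_seen = [fast.poll(), fast.poll(), fast.poll()]
        slow_seen = [slow.poll()]
        return fast_seen, slow_seen, fast.offset, slow.offset
```

Calling `demo()` shows two consumers reading the same log at different speeds: each one only advances its own offset, so neither affects what the other sees.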

MMFile

MMFile refers to memory-mapped files (Memory Mapped Files). The operating system uses memory mapping to bridge the gap between memory and disk speeds. Memory is divided into pages, and each page is mapped to a region of a file on disk. Because memory is much smaller than disk, only some pages are resident at any time, and the operating system decides which ones to keep using a replacement algorithm such as first-in, first-out (FIFO) or least recently used (LRU). The operating system flushes modified pages back to disk at an appropriate time.

Why does this approach improve write efficiency? For safety, CPU execution is divided into kernel mode and user mode, and only kernel mode can operate I/O devices. Memory is correspondingly divided into kernel space and user space. Normally, data in memory is written to disk in the following steps:

[Figure: Write operation]

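The data is copied from user space to kernel space and then written from kernel space to the I/O device. With a memory-mapped file, the application writes directly into the mapped pages, which saves the overhead of copying data from user space to kernel space. In Kafka itself, memory mapping is applied to the index files of log segments, while message data is appended through ordinary file writes and absorbed by the operating system's page cache.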
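As a rough illustration of writing through a memory mapping, the sketch below uses Python's standard `mmap` module. The function name `write_via_mmap`, the file name, and the 4096-byte mapping size are assumptions chosen for the example; Kafka does this in Java, so this only shows the general technique.

```python
import mmap
import os
import tempfile


def write_via_mmap(path, payload, size=4096):
    """Write payload into a file through a memory mapping instead of write()."""
    if len(payload) > size:
        raise ValueError("payload does not fit in the mapped region")
    # The mapped region must already exist on disk, so pre-size the file.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as mapped:
            # A plain memory assignment: the bytes land in pages shared with
            # the kernel, without an explicit write() call for the data.
            mapped[:len(payload)] = payload
            # Ask the OS to write the dirty pages back now; without this,
            # the OS would flush them at a time of its own choosing.
            mapped.flush()


def demo():
    payload = b"hello, kafka"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mapped.bin")
        write_via_mmap(path, payload)
        with open(path, "rb") as f:
            return f.read(len(payload)), os.path.getsize(path)
```

The trade-off is durability: data that has been assigned into the mapping but not yet flushed can be lost if the machine crashes, which is why the explicit `flush()` call is shown.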

Fast reads

Read speed comes mainly from zero-copy (Zero Copy) technology. The steps for writing data from memory to disk were described above; reading data from disk and sending it to a consumer follows the same path in the opposite direction. The traditional process is as follows:

[Figure: Read operation]

Data is read from disk into kernel space, copied from kernel space to user space, copied back into the kernel's socket buffer, and finally sent over the network to the consumer.

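Linux provides the sendfile system call, which transfers data from the kernel's page cache directly to the socket without passing it through user space. This saves both the copy from kernel space to user space and the copy back from user space to kernel space. This is the so-called zero-copy technique.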
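A minimal sketch of the same idea in Python is shown below. It relies on the standard `socket.sendfile` method, which uses `os.sendfile` where the platform provides it and otherwise falls back to ordinary `send` calls. The helper name `send_file_zero_copy`, the file name, and the use of a local socket pair in place of a real network connection are assumptions made for the example.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a socket without reading it into user space."""
    with open(path, "rb") as f:
        # Returns the total number of bytes sent.
        return sock.sendfile(f)


def demo():
    payload = b"kafka-record-" * 64
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.log")
        with open(path, "wb") as f:
            f.write(payload)

        # A connected socket pair stands in for the broker-to-consumer link.
        broker_side, consumer_side = socket.socketpair()
        with broker_side, consumer_side:
            sent = send_file_zero_copy(path, broker_side)
            broker_side.shutdown(socket.SHUT_WR)  # signal end of stream

            received = b""
            while True:
                chunk = consumer_side.recv(4096)
                if not chunk:
                    break
                received += chunk
    return sent, received == payload
```

Note that the sending side never holds the file contents in a Python `bytes` object; only the receiving side, which plays the consumer, materializes the data.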

Beyond the read and write paths themselves, there is another important reason for Kafka's high throughput.

Batch data compression

Instead of compressing each message individually, Kafka compresses a batch of messages and sends them together. The batch typically stays compressed end to end and is handed to the consumer as is, and the consumer decompresses it.
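The benefit of compressing a batch rather than single messages can be seen with a small experiment. The sketch below uses `zlib` from the Python standard library as a stand-in codec, and the sample messages and newline framing are invented for the example; Kafka itself offers codecs such as gzip, snappy, lz4 and zstd.

```python
import json
import zlib


def compress_individually(messages):
    """Compress every message on its own."""
    return [zlib.compress(m) for m in messages]


def compress_as_batch(messages):
    """Join the messages with newlines and compress them as one unit."""
    return zlib.compress(b"\n".join(messages))


def decompress_batch(blob):
    """Consumer side: decompress once, then split back into messages."""
    return zlib.decompress(blob).split(b"\n")


# Similar small messages, as is common in event streams.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "url": "/products/42"}).encode("utf-8")
    for i in range(100)
]

individual_total = sum(len(c) for c in compress_individually(messages))
batch_total = len(compress_as_batch(messages))

print("compressed one by one:", individual_total, "bytes")
print("compressed as a batch:", batch_total, "bytes")
```

Small messages compress poorly on their own because each one carries fixed codec overhead and offers little repetition. Within a batch, the compressor can exploit the repeated field names and values across messages.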

This is why Kafka is so fast: sequential writes, memory-mapped files, zero copy, and batch compression.
