
What is the principle of Kafka's high performance throughput?


This article explains the principles behind Kafka's high-performance throughput. The content is straightforward and easy to follow.

As an open source messaging system, Kafka is widely used for data buffering, asynchronous communication, log aggregation, system decoupling, and so on. Compared with other common messaging systems such as RocketMQ, Kafka provides most of the expected functionality while delivering outstanding read and write performance.

This article briefly analyzes Kafka's performance. First, a quick introduction to Kafka's architecture and the terminology involved:

Topic: the logical concept used to group Messages; a Topic can be distributed across multiple Brokers.

Partition: the basis for scale-out and parallelization in Kafka, where each Topic is split into at least one Partition.

Offset: the sequence number of a Message within its Partition; the numbering does not span Partitions.

Consumer: used to extract / consume Message from Broker.

Producer: used to send / produce Message to Broker.

Replication: Kafka supports redundant backup of Messages at the Partition level; each Partition can be configured with at least 1 Replica (when there is only 1 Replica, it is just the Partition itself).

Leader: each Partition's set of Replicas elects a unique Leader, and all read and write requests are handled by the Leader. The other Replicas pull data updates from the Leader and apply them locally, a process similar to the familiar Binlog replication in MySQL.

Broker: in Kafka, the Broker accepts requests from Producers and Consumers and persists Messages to local disk. Within each Cluster, one Broker is elected as the Controller, which is responsible for Partition Leader elections, coordinating Partition migration, and so on.

ISR (In-Sync Replica): a subset of the Replicas, representing the set of Replicas that are currently alive and able to "catch up" with the Leader. Since both reads and writes go to the Leader first, a Replica that pulls data from the Leader through the synchronization mechanism generally lags behind the Leader somewhat (both in time and in number of messages); any Replica whose lag exceeds the threshold is kicked out of the ISR. Each Partition maintains its own independent ISR.

These are almost all the terms you may encounter when using Kafka, and they are also all of its core concepts and components; in terms of design, Kafka is remarkably concise. The rest of this article focuses on Kafka's excellent throughput and introduces, one by one, the "tricks" used in its design and implementation.
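To make these terms concrete, here is a minimal sketch using the Kafka Java client. It is illustrative only: the broker address, topic name, and key are placeholders, and the API shown is the newer Java producer rather than the 0.8-era client discussed later.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TermsDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // a Broker in the Cluster
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Messages with the same Key are routed to the same Partition of the Topic.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "user-42", "hello");
            RecordMetadata meta = producer.send(record).get();
            // The Broker assigns an Offset within the chosen Partition.
            System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
        }
    }
}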

Broker

Unlike in-memory message queues such as Redis and MemcacheQ, Kafka is designed to write all Messages to low-speed, large-capacity hard disks in exchange for greater storage capacity. In practice, using hard disks does not cost Kafka much performance: it "behaves well" while taking a "shortcut".

First of all, the reason for saying Kafka "behaves well" is that it only performs Sequential I/O on disk; given the access pattern of a messaging system, this is not a problem. Regarding disk I/O performance, here is a set of test data given officially by Kafka (RAID-5, 7200rpm):

Sequence I/O: 600MB/s

Random I/O: 100KB/s

Therefore, by restricting itself to Sequential I/O, Kafka avoids the performance penalty that slow disk access might otherwise impose.

Next, let's talk about how Kafka takes its "shortcut".

First, Kafka relies heavily on the PageCache provided by the underlying operating system. When a write occurs in the upper layer, the OS simply writes the data into PageCache and marks the Page as Dirty. When a read occurs, PageCache is checked first; only on a page fault is disk I/O scheduled, after which the required data is returned. In effect, PageCache uses as much free memory as possible as a disk cache, and since reclaiming PageCache is cheap when other processes ask for memory, modern operating systems are happy to use free memory this way.
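To make this write path concrete, here is a small illustrative Java sketch (not Kafka source code; the file name is a placeholder): a FileChannel write returns once the bytes sit in PageCache, and only an explicit force() asks the OS to flush them to disk.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PageCacheDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Paths.get("demo.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap("message payload".getBytes());
            ch.write(buf);      // lands in PageCache; the Page is marked Dirty
            // ch.force(true);  // explicit flush to disk; Kafka leaves this to the OS and Replicas
        }
    }
}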

Using PageCache also avoids caching data inside the JVM. The JVM gives us powerful GC capabilities, but it also introduces problems that do not suit Kafka's design:

If the cache is managed within Heap, the GC thread of JVM will frequently scan the Heap space, resulting in unnecessary overhead. If the Heap is too large, performing a Full GC will be a great challenge to the availability of the system.

All objects in the JVM will inevitably carry an Object Overhead, which will reduce the effective space utilization of memory.

Everything held in an In-Process Cache also has a copy in the OS PageCache, so keeping data only in PageCache at least doubles the available cache space.

If Kafka restarts, any In-Process Cache is lost, while the PageCache managed by the OS remains usable.

PageCache is only the first step; Kafka also uses Sendfile to further optimize performance. Before explaining Sendfile, let's first look at the traditional network I/O path, which generally consists of the following four steps:

The OS reads the data from the hard disk into PageCache in kernel space.

The user process copies the data from kernel space into user space.

The user process then writes the data to the Socket, and the data flows into the Socket Buffer in kernel space.

The OS then copies the data from the Socket Buffer to the network card's buffer, completing one transmission.

The whole process involves four copies of the data plus repeated context switches and system calls, and copying the same data back and forth between the kernel Buffer and the user Buffer is clearly inefficient. Steps two and three are unnecessary: the copy can be completed entirely inside the kernel. This is exactly the problem Sendfile solves; after the Sendfile optimization, the data moves directly from PageCache to the Socket Buffer within the kernel, without ever passing through user space.
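In the JVM this kernel-only path is exposed through FileChannel.transferTo, which on Linux can map to sendfile(2). Below is a minimal illustrative sketch of the idea; the file name, host, and port are placeholders.

import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SendfileDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = FileChannel.open(Paths.get("segment.log"),
                                                 StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("consumer-host", 9000))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // Data flows PageCache -> Socket Buffer inside the kernel; no user-space copy.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}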

From the above introduction, it is not hard to see that Kafka's design intent is to complete data exchange in memory wherever possible, both between the messaging system and the outside world and in its internal interaction with the underlying operating system. If the production and consumption rates of Producers and Consumers are well matched, the data exchange can happen with zero disk I/O. This is why I say Kafka's use of "hard disks" does not cost much performance. Below are some metrics I collected from a production environment.

(20 Brokers, 75 Partitions per Broker, 110k msg/s)

At this time, the cluster has only write operations and no reads. The roughly 10M/s of Send traffic is generated by Replication between Partitions. Comparing the recv and writ rates shows that disk writes are asynchronous and batched, and the underlying OS may also optimize the order of disk writes. When Read Requests come in, there are two cases. The first is exchanging data in memory.

Send traffic rises from an average of 10M/s to an average of 60M/s, while disk Read stays under 50KB/s. The effect of PageCache in reducing disk I/O is very obvious.

The second case is reading older data that was written to disk some time ago and has long since been swapped out of memory.

Other metrics remain the same, while disk Read soars to 40+MB/s. At this point all the data is served from the hard disk (for sequential reads, the OS layer prefetches data into PageCache). There are still no performance problems.

Tips

Kafka officially does not recommend forcing disk flushes via log.flush.interval.messages and log.flush.interval.ms on the Broker side; the argument is that data reliability should be guaranteed by Replicas, and forcing a Flush to disk hurts overall performance.

Performance can be tuned via /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.

If the dirty page ratio exceeds the first threshold, pdflush is started to begin flushing Dirty PageCache.

If the dirty page ratio exceeds the second threshold, all writes are blocked until the Flush completes.

Depending on business requirements, dirty_background_ratio can be lowered and dirty_ratio raised appropriately.
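As a sketch only, with illustrative values that must be tuned to your own workload, these two knobs can be set through sysctl (add the lines to /etc/sysctl.conf and apply them with sysctl -p):

vm.dirty_background_ratio = 5    # start background flushing of Dirty PageCache earlier
vm.dirty_ratio = 60              # allow more dirty pages before writers are blocked for Flush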

Partition

Partition is the foundation that allows Kafka to scale out, provide high concurrency, and implement Replication.

Scalability. First, Kafka allows Partitions to be moved freely between Brokers in the cluster to balance out possible data skew. Second, Partitions support custom partitioning algorithms, for example routing all Messages with the same Key to the same Partition. The Leader can also be migrated among the In-Sync Replicas. Since all read and write requests for a Partition are handled only by its Leader, Kafka tries to spread Leaders evenly across the nodes of the cluster to avoid concentrating network traffic.
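For illustration, here is a hedged sketch of a Key-based partitioner using the newer Java client's Partitioner interface (the 0.8-era Scala producer configured partitioners differently); the class name and registration line are hypothetical.

import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class KeyHashPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Messages without a Key go to Partition 0 here; the same Key always maps to the same Partition.
        if (keyBytes == null) {
            return 0;
        }
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
// Hypothetical registration: props.put("partitioner.class", KeyHashPartitioner.class.getName());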

Concurrency. Any one Partition can only be consumed by one Consumer within a Consumer Group at any time (conversely, a Consumer can consume multiple Partitions at the same time). Kafka's very simple Offset mechanism minimizes the interaction between Broker and Consumer, so Kafka does not, like some other message queues, see performance degrade as the number of downstream Consumers grows. In addition, if multiple Consumers happen to consume data that is close together in time, a very high PageCache hit rate is achieved, so Kafka can serve highly concurrent reads very efficiently, in practice basically reaching the limit of a single machine's network card.
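A minimal sketch of a group Consumer, assuming a recent version of the rewritten Java consumer (mentioned near the end of this article; API details differ across versions). The group id, topic, and broker address are placeholders; members of one group split the Topic's Partitions, with each Partition read by one member at a time.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");   // members of one group share the Partitions
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                                  r.partition(), r.offset(), r.value());
            }
        }
    }
}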

However, more Partitions is not always better. The more Partitions there are, the more each Broker holds on average. Consider a Broker going down (Network Failure, Full GC): the Controller must re-elect a Leader for every Partition on the failed Broker. Assuming each Partition election takes 10ms, with 500 Partitions on that Broker the elections take about 5 seconds, during which any read or write to those Partitions will trigger LeaderNotAvailableException.

Worse, if the failed Broker happens to be the Controller of the whole cluster, the first step is to re-elect another Broker as Controller. The newly appointed Controller must fetch the Meta information of every Partition from Zookeeper, at roughly 3-5ms each, so with 10000 Partitions this alone takes 30-50 seconds. And remember, that is only the time to restart the Controller; the Leader election time mentioned above still has to be added on top.

In addition, on the Broker side, a Buffer mechanism is used for both Producers and Consumers. The Buffer size is configured uniformly, and the number of Buffers equals the number of Partitions. If there are too many Partitions, these Buffers consume too much memory on the Producer and Consumer side.

Tips

Allocate the number of Partitions up front as far as possible. Although Partitions can be added dynamically later, doing so risks breaking the mapping between Message Keys and Partitions.

The number of Replicas should not be too large; if conditions permit, spread a Partition's Replicas across different Racks as much as possible.

Make every effort to ensure that every Broker stop is a Clean Shutdown; otherwise, beyond a long service recovery time, you may also face data corruption or other strange problems.

Producer

Kafka's development team has said that the entire Producer was rewritten in Java in version 0.8, and performance is said to have improved greatly. I have not compared or measured it myself, so no data comparison is given here; the further reading at the end of this article mentions a set of benchmark comparisons that I think are good, and interested readers can try them.

In fact, the Producer-side optimizations in most message systems are relatively simple and boil down to the same ideas: merging scattered small messages into batches, turning synchronous sends into asynchronous ones, and so on.

Kafka supports MessageSet natively: multiple Messages are automatically grouped together and sent out as one batch, so the RTT of each round trip is amortized across the batch. And while assembling the MessageSet, the data can be reordered, turning bursty random writes into steadier linear writes.
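As a hedged sketch, this batching behavior is tunable through producer configuration. The property names below are those of the newer Java producer (the 0.8-era Scala producer used knobs such as producer.type and batch.num.messages instead), and props stands for a producer Properties object like the one in the earlier sketch.

// Collect up to batch.size bytes per Partition, or wait up to linger.ms, before sending,
// so many small Messages share one request and one RTT.
props.put("batch.size", 16384);  // bytes per in-memory batch (per Partition)
props.put("linger.ms", 5);       // small delay to let more records join the batch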

In addition, it is worth highlighting that the Producer supports End-to-End compression. The data is compressed locally and transmitted over the network in compressed form; the Broker generally does not decompress it (unless Deep-Iteration is specified), and the Message is only decompressed on the Consumer side after it is consumed.

Of course, users can also choose to compress and decompress at the application layer (after all, Kafka currently supports only a limited set of compression algorithms, just GZIP and Snappy), but doing so actually reduces efficiency: Kafka's End-to-End compression works best together with MessageSet, and application-layer compression separates the two. The reason is a basic principle of compression: "the more repeated data, the higher the compression ratio." Regardless of what the message bodies contain or how many there are, in most cases a larger amount of input data yields a better compression ratio.
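For illustration, End-to-End compression is a single producer setting in the newer Java client (the 0.8-era name was compression.codec); props is the same Properties object as in the sketches above.

// Batches are compressed as a whole on the Producer, stored compressed on the Broker,
// and only decompressed on the Consumer.
props.put("compression.type", "gzip");   // or "snappy"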

However, Kafka's adoption of MessageSet also compromises usability to some extent. Every time data is sent, the Producer returns from send() and considers the data sent, but in most cases the Messages are still sitting in an in-memory MessageSet and have not yet gone out on the network. If the Producer crashes at that moment, the data is lost.

To solve this problem, Kafka 0.8 borrows the ack mechanism from networking. If performance matters most and some Message loss is tolerable, set request.required.acks=0 to turn off acks and send at full speed. If sent messages need to be acknowledged, set request.required.acks to 1 or -1. So what is the difference between 1 and -1? This goes back to the earlier discussion of the number of Replicas. With 1, the Message only needs to be received and acknowledged by the Leader; the other Replicas pull it asynchronously without immediate confirmation, which guarantees reliability without being too inefficient. With -1, the Message must be committed to all Replicas in the Partition's ISR before the ack is returned; sending is more reliable, but the latency of the whole process grows with the number of Replicas. Tune this according to your requirements.
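A hedged sketch of the three acknowledgement levels: the newer Java producer spells the property acks, while the 0.8-era name used in this article is request.required.acks; props is the same Properties object as in the earlier sketches.

props.put("acks", "0");      // no Broker acknowledgement: fastest, Messages may be lost
// props.put("acks", "1");   // the Leader has written the Message; Replicas catch up asynchronously
// props.put("acks", "all"); // equivalent to -1: every Replica in the ISR must have the Message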

Tips

Do not configure too many threads in the Producer, especially when it is used for Mirroring or Migration; too many threads aggravate message disorder within the target cluster's Partitions (a concern if your application is sensitive to message order).

In version 0.8 the default for request.required.acks is 0 (same as in 0.7).

Consumer

The design of the Consumer side is generally quite routine.

With Consumer Group, you can support both producer-consumer and queue access modes.

The Consumer API comes in two flavors: High level and Low level. The former relies heavily on Zookeeper, so it performs worse and offers little freedom, but it is extremely hassle-free. The latter does not rely on Zookeeper and is better in both freedom and performance, but all exceptions (Leader migration, Offset out of bounds, Broker downtime, etc.) and Offset maintenance must be handled by the application itself.

Keep an eye on the upcoming 0.9 Release: the developers have rewritten a new Consumer in Java, merged the two sets of APIs, and removed the dependence on Zookeeper. Performance is said to have improved greatly.

Thank you for reading. That is the principle behind Kafka's high-performance throughput; to deepen your understanding, the specifics are best verified in practice.
