Shulou (Shulou.com), 06/01 report. Updated 2025-02-23.
This article explains how to choose an appropriate number of partitions for a Kafka topic.
Deciding how many partitions a topic should have is a recurring problem, and it is often unclear how to set the number or how to evaluate it. Likewise, someone may ask how many partitions a particular business topic in the current Kafka cluster has, how many it actually needs, or how to choose an appropriate number.
1. Combine business scenarios with non-business conditions
So how should we choose the appropriate number of partitions?
It depends on the specific business case.
At an early stage, however, we can make a rough estimate from the following conditions: the actual business scenario (total message volume, message production and consumption frequency, required throughput, and so on), software conditions, hardware conditions, and load conditions.
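One way to turn required throughput into a first estimate is a rule of thumb that is widely used in the Kafka community but is not part of this article: measure the throughput a single partition can sustain on the producer side and on the consumer side, then take the larger of the two partition counts needed to reach the target. The sketch below assumes those per-partition figures have already been measured; the function name and the sample numbers are illustrative only.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rough lower bound on the partitions needed to reach a target throughput.

    All three inputs are assumed to be measured or required values in MB/s.
    """
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")
    # Partitions needed so that producers can reach the target.
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    # Partitions needed so that consumers can keep up with the target.
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(for_producers, for_consumers)


# Illustrative numbers only: 100 MB/s target, 20 MB/s per partition when
# producing, 12.5 MB/s per partition when consuming.
print(estimate_partitions(100, 20, 12.5))  # 8
```

The result is only a starting point; the stress tests described in the next section are what confirm or correct it.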
2. Use the stress-testing tools to find the best number of partitions
Kafka ships with scripts for testing a Kafka cluster. We can use them to test the current hardware and find out how many partitions the current environment can support, so as to get as close as possible to the best configuration.
Producer performance test script: kafka-producer-perf-test.sh
Consumer performance test script: kafka-consumer-perf-test.sh
After creating a topic with a given number of partitions, we can vary parameters such as the total number of messages sent, the size of a single message, the target throughput, acks, and the number of consumer threads, and obtain a report from each stress test. The report includes data such as: the 50th, 90th, 95th and 99th percentile message processing latency, the average latency, the number of messages sent per second, the bytes and messages fetched per second, the total volume consumed, the rebalance time, and the throughput calculated by message count and by message size.
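As a sketch of how these parameters map onto the producer test script, the helper below assembles a `kafka-producer-perf-test.sh` command line without running it. The flags used (`--topic`, `--num-records`, `--record-size`, `--throughput`, `--producer-props`) are the script's standard options; the topic name, broker address and parameter values are hypothetical placeholders, and the helper function itself is not part of Kafka.

```python
import shlex


def producer_perf_command(topic, num_records, record_size, throughput, bootstrap_servers, acks):
    """Build (but do not run) a kafka-producer-perf-test.sh command line."""
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(num_records),   # total number of messages to send
        "--record-size", str(record_size),   # size of a single message in bytes
        "--throughput", str(throughput),     # target messages/sec; -1 disables throttling
        "--producer-props",
        f"bootstrap.servers={bootstrap_servers}",
        f"acks={acks}",
    ]


# Hypothetical topic name and broker address, for illustration only.
cmd = producer_perf_command("perf-test-6p", 1_000_000, 1024, -1, "localhost:9092", "1")
print(shlex.join(cmd))
```

Running the same command against topics that differ only in partition count gives comparable reports for each candidate value.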
Increasing the number of partitions appropriately can improve the throughput, but when a certain threshold is exceeded, the throughput will decrease accordingly.
3. Higher throughput is not always related to the number of partitions
For Kafka producers, data can be written to the partitions in parallel. For Kafka consumers, each partition can be consumed by only one consumer thread within a consumer group, so the consumption parallelism of a consumer group is bounded by the number of partitions. In theory, then, more partitions should mean higher throughput.
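The consumer-side limit can be illustrated with a small sketch. It assumes a single consumer group reading one topic; the function name is made up for this example.

```python
def consumer_parallelism(num_partitions, num_consumers):
    """Return (active, idle) consumer counts for one consumer group on one topic."""
    # A partition is assigned to at most one consumer in the group,
    # so no more consumers than partitions can be doing work.
    active = min(num_partitions, num_consumers)
    idle = max(0, num_consumers - num_partitions)
    return active, idle


for consumers in (3, 6, 8):
    active, idle = consumer_parallelism(6, consumers)
    print(f"{consumers} consumers on 6 partitions: {active} active, {idle} idle")
```

With 6 partitions, the seventh and eighth consumers receive no partitions, so on paper only adding partitions would let them contribute.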
But is this really the case?
Kafka's throughput as a message broker does not depend on partitioning alone.
The throughput of message writing (production) also depends on message size, message compression, the sending mode (synchronous or asynchronous), the acknowledgement setting (acks), the replication factor, and so on.
Similarly, the throughput of message consumption depends on how fast the business logic processes each message.
4. The number of partitions is related to the operating system
The number of partitions cannot be increased indefinitely, because partitions consume file descriptors and the number of file descriptors available to a process is limited.
In general, if you plan to set a large number of partitions, check carefully whether this would exceed the system's maximum number of file descriptors. Although the system configuration can be changed, it is better to avoid relying on that, since open file handles carry their own overhead.
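A rough pre-check might look like the sketch below. It rests on several assumptions that are not from this article: that a broker keeps about three files open per log segment (the log file plus its offset and time indexes), that the segment count per partition is known, and that the limit of interest is the soft `RLIMIT_NOFILE` value. It also reads the limit of the current Python process, which equals the broker's limit only if both run under the same user and settings.

```python
import resource  # standard library, available on Unix-like systems only


def estimated_open_files(partitions_on_broker, segments_per_partition, files_per_segment=3):
    """Very rough estimate of the files a broker keeps open for its logs.

    files_per_segment=3 is an assumption (log file, offset index, time index).
    """
    return partitions_on_broker * segments_per_partition * files_per_segment


def fits_within_limit(needed, soft_limit, headroom=0.5):
    """True if `needed` stays within `headroom` (a fraction) of the descriptor limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return needed <= soft_limit * headroom


soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
needed = estimated_open_files(partitions_on_broker=2000, segments_per_partition=10)
print(f"estimated open files: {needed}, soft limit: {soft_limit}, hard limit: {hard_limit}")
print("within limit with 50% headroom:", fits_within_limit(needed, soft_limit))
```

The headroom factor leaves room for network connections and other files, which also count against the same limit.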
5. Mind the message partitioning strategy
The partition a message is written to is decided by the partitioning strategy. By default, and in some custom strategies, the target partition is computed from the message key. We therefore need to consider whether an application that depends strongly on the key will be affected in its usage scenario.
For example, some applications only require messages within a partition to be ordered. Once the number of partitions is changed, the same key may map to a different partition, which can break this ordering.
Therefore, for such applications we generally try to configure a partition count up front that can meet the target throughput for the next two years.
If the application depends only weakly on the key, we can increase the number of partitions later as the situation requires.
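The effect of changing the partition count on key placement can be sketched as follows. This is an illustration only: CRC32 is used as a stand-in hash, whereas Kafka's default partitioner uses a murmur2 hash of the key bytes, so the actual partition numbers will differ. The key names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition by hashing it, modulo the partition count.

    Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys, for illustration only.
keys = [f"order-{i}" for i in range(1000)]
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 12")
```

Any key that moves has its older messages in one partition and its newer messages in another, so consumers no longer see that key's messages in a single ordered stream.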
6. The number of partitions will affect system availability
Kafka uses a multi-replica mechanism to achieve high availability and reliability for the cluster. Each partition has one or more replicas, the replicas of a partition are placed on different broker nodes, and only the leader replica serves requests.
All replicas in a Kafka cluster are managed automatically, and replica data is kept in sync to a certain extent. When a broker fails, every partition whose leader replica is on that broker becomes temporarily unavailable.
At that point a new leader is elected from the follower replicas, a process coordinated by the Kafka controller. The affected partitions remain unavailable until their election completes, so the more partitions there are, the longer the unavailability window.
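To get a feel for how the window grows, the following back-of-the-envelope sketch assumes that partition leaders are spread evenly across brokers and that the controller handles the elections one after another at a fixed cost per partition. Both are simplifying assumptions, and the 5 ms per election is a hypothetical figure rather than a measurement.

```python
def failover_window_ms(total_partitions, num_brokers, election_ms_per_partition):
    """Estimate how long it takes to re-elect leaders after one broker fails.

    Assumes evenly spread leaders and strictly sequential elections.
    """
    leaders_on_failed_broker = total_partitions / num_brokers
    return leaders_on_failed_broker * election_ms_per_partition


# Hypothetical cost of 5 ms per election on a 3-broker cluster.
for partitions in (300, 3000, 30000):
    window = failover_window_ms(partitions, 3, 5.0)
    print(f"{partitions} partitions: about {window / 1000:.1f} s until the last leader is elected")
```

Under these assumptions the window scales linearly with the partition count, which is the relationship the paragraph above describes.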
7. The more partitions, the more time operations take
The more partitions there are, the longer Kafka takes to start up and to shut down cleanly.
Likewise, the more partitions a topic has, the longer log cleanup and topic deletion take. This is noticeable in older versions and has been improved in newer ones.
8. Theoretical reference setting value of number of partitions
In general, the number of partitions can be set to an integer multiple of the number of broker nodes. For example, with 3 brokers the number of partitions can be set to 3, 6 or 9.
With a very large number of broker nodes (tens, hundreds or thousands), however, this rule is no longer appropriate. Clusters of that size are relatively rare outside companies at the scale of BAT (Baidu, Alibaba, Tencent). If necessary, factors such as rack placement can also be taken into account when choosing the number of partitions.
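The multiple-of-brokers rule for small clusters can be sketched like this. The leader distribution shown is an idealized round-robin spread, which is an assumption; the actual assignment is made by Kafka and can differ. Both function names are made up for this example.

```python
def candidate_partition_counts(num_brokers, max_multiple=3):
    """Partition counts that are integer multiples of the broker count."""
    return [num_brokers * m for m in range(1, max_multiple + 1)]


def leaders_per_broker(num_partitions, num_brokers):
    """Leader counts per broker under an idealized round-robin assignment."""
    base, extra = divmod(num_partitions, num_brokers)
    return [base + (1 if b < extra else 0) for b in range(num_brokers)]


print(candidate_partition_counts(3))  # [3, 6, 9]
print(leaders_per_broker(6, 3))       # [2, 2, 2]
print(leaders_per_broker(7, 3))       # [3, 2, 2]
```

A count that is a multiple of the broker count lets every broker lead the same number of partitions, while 7 partitions on 3 brokers leaves one broker with more load.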
9. Don't act blindly; analyze the actual situation
Finally, before increasing the number of partitions later on, check whether the change is necessary and reasonable. The author has seen this scenario: logs were consumed and written to Elasticsearch (ES), but messages were backing up badly, so the number of partitions was increased from 6 to 12. The backlog did not improve much and some things even got worse (for example, log data from the same log file was no longer contiguous, that is, it arrived out of order). Because Kafka does not support reducing the partition count of an existing topic, the only option in the end was to delete the topic and recreate it with the original number of partitions.
The main bottleneck of the system was the write capacity of ES: consumption was slow because of it, and that is what caused the massive backlog of log messages. It is therefore important to analyze the real problem first (bottlenecks and so on) and never to set the number of partitions arbitrarily or blindly.
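The reasoning behind this case can be expressed as a minimal sketch: end-to-end throughput is capped by the slowest stage. It assumes one consumer per partition, and all rates are hypothetical numbers chosen for illustration, not figures from the incident described above.

```python
def pipeline_throughput(num_partitions, per_consumer_msgs_s, sink_max_msgs_s):
    """End-to-end messages/sec, limited by the slower of consumption and the sink.

    Assumes one consumer per partition; all rates are illustrative.
    """
    consume_capacity = num_partitions * per_consumer_msgs_s
    return min(consume_capacity, sink_max_msgs_s)


# Hypothetical rates: each consumer could handle 5,000 msgs/s,
# but the downstream store accepts at most 20,000 msgs/s in total.
print(pipeline_throughput(6, 5_000, 20_000))   # 20000
print(pipeline_throughput(12, 5_000, 20_000))  # 20000
```

Doubling the partitions doubles the consumption capacity on paper, but the result does not change because the sink was already the limit.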
That covers how to choose an appropriate number of partitions for Kafka. Several of these points are likely to come up in day-to-day work.