This article mainly introduces the principle of Kafka's producer message partitioning mechanism: why Kafka partitions data, which partitioning strategies the producer supports, and how to implement a custom strategy. I hope it helps clear up any doubts you may have about how the producer decides which partition a message goes to.
Why partition?
Kafka has the concept of a Topic, a logical container that holds the real data. Each topic is in turn divided into several partitions, which means Kafka's message organization is actually a three-level structure: topic-partition-message. Every message under a topic is stored in exactly one partition, rather than being duplicated across multiple partitions. The picture on the official website shows this very clearly.
(Figure: Kafka's three-level structure: topic-partition-message.)
Looking at this picture, a couple of questions come to mind: why did Kafka choose such a design? Why use partitions instead of simply using multiple Topics? A small example of the structure is sketched below before we dig into the reasons.
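For a concrete feel of the topic-partition structure, the sketch below creates a topic with three partitions using Kafka's AdminClient. The topic name, broker address, and replication factor are illustrative values chosen for this example, not anything from the article:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

// Assumed to run inside a method that declares "throws Exception".
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
try (AdminClient admin = AdminClient.create(props)) {
    // An illustrative topic "order-events" with 3 partitions and replication factor 1;
    // every message sent to this topic will be stored in exactly one of its 3 partitions.
    admin.createTopics(Collections.singletonList(new NewTopic("order-events", 3, (short) 1)))
         .all()
         .get();
}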
The role of partitioning
In fact, the function of partitioning is to provide load balancing; in other words, the main reason for partitioning data is to achieve high scalability (Scalability) of the system.
Different partitions can be placed on different node machines, and data reads and writes are performed at the granularity of a partition, so each node can independently handle the read and write requests for its own partitions. We can also increase the throughput of the overall system by adding new node machines.
In fact, the idea of partitioning and partitioned databases was worked out by the pioneers of the field as early as the 1980s. For example, a database called Teradata introduced the concept of partitioning around that time.
Partitions go by different names in different distributed systems: a partition in Kafka, a shard in MongoDB and Elasticsearch, a Region in HBase, and a vnode in Cassandra.
On the surface their implementations may differ, but the underlying idea of partitioning has never changed.
In addition to providing the core function of load balancing, partitions can also be used to meet other business-level requirements, such as ordering messages at the business level.
The Partition Policy in Kafka
The partitioning strategy in Kafka is the algorithm that determines which partition the producer sends each message to.
Kafka provides a default partitioning strategy and also allows you to define a custom one:
Default partitioning policy
Custom partitioning policy
Default partitioning policy
Round-robin strategy
Random strategy (Randomness; obsolete)
Message key strategy (Key-ordering)
Geo-location strategy
Round-robin strategy
Also known as the Round-robin policy, this simply means sequential allocation.
For example, if a topic has three partitions, the first message is sent to partition 0, the second to partition 1, the third to partition 2, and so on; when the fourth message is produced, the cycle starts over and it is assigned to partition 0 again, as shown in the following figure.
If you do not specify the partitioner.class parameter, your producer program will "store" messages evenly across all partitions of the Topic in a round-robin manner.
The round-robin strategy has very good load-balancing behavior: it always ensures that messages are spread as evenly as possible across all partitions. By default it is the most reasonable partitioning strategy, and it is also one of the partitioning strategies we use most often.
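Following the style of the code snippets shown for the other strategies below, a round-robin partition method could be sketched roughly like this. This is only a minimal sketch, not Kafka's built-in implementation, and it assumes the partitioner keeps a per-instance counter field of its own:

// Assumed field on the partitioner instance (illustrative):
// private final AtomicInteger counter = new AtomicInteger(0);
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
// Advance the counter by one for every message and wrap it around the partition count;
// Math.floorMod keeps the result non-negative even if the counter overflows.
return Math.floorMod(counter.getAndIncrement(), partitions.size());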
Random strategy
Also known as the Randomness strategy: we randomly place each message on any one of the partitions, as shown in the following figure.
If you want to implement the random policy version of the partition method, it is simple and requires only two lines of code:
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
return ThreadLocalRandom.current().nextInt(partitions.size());
First calculate the total number of partitions of the Topic, and then randomly return a non-negative integer smaller than that number.
In essence, the random strategy also strives to spread data evenly across the partitions, but in practice it performs worse than the round-robin strategy, so if you are after an even distribution of data, it is better to use the round-robin strategy.
In fact, the random strategy was the partitioning strategy used by producers in old versions of Kafka; newer versions changed the default to round-robin.
Message key policy
Also known as the Key-ordering strategy. Kafka allows you to define a message key for each message, referred to as the Key for short.
This Key is very useful: it can be a string with a clear business meaning, such as a customer code, a department number, or a business ID, and it can also be used to carry message metadata.
In particular, back when Kafka did not yet support timestamps, in some scenarios engineers would encapsulate the message creation time directly in the Key.
Once message Keys are defined, you can ensure that all messages with the same Key go into the same partition. Since the messages within each partition are processed in order, this strategy is called the message key (Key-ordering) strategy, as shown in the following figure.
The partition method to implement this strategy is equally simple, requiring only the following two lines of code:
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
return Math.abs(key.hashCode()) % partitions.size();
First calculate the total number of partitions of the Topic, and then take the absolute value of the Key's hashCode modulo the number of partitions to get the target partition.
Kafka's default partitioning behavior: if a Key is specified, the message key strategy is used by default; if no Key is specified, the round-robin strategy is used.
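As a quick usage illustration (the topic name, broker address, and key below are made-up values for this example, not anything from the article), two records sent with the same Key will land in the same partition and therefore keep their relative order:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// Both records use the key "customer-001", so the default partitioner hashes them
// to the same partition and they are stored (and later read) in send order.
producer.send(new ProducerRecord<>("order-events", "customer-001", "order created"));
producer.send(new ProducerRecord<>("order-events", "customer-001", "order paid"));
producer.close();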
Geo-location strategy
The partitioning strategies above are fairly basic, but there is another reasonably common one: the so-called location-based partitioning strategy.
Of course, this strategy generally only applies to large-scale Kafka clusters, especially clusters that span cities, countries, or even continents.
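As an illustration only, suppose the leader broker of each partition is deployed in a known machine room and we have a helper method isInSouthChina(host) that recognizes hosts in the southern region (both the helper and the deployment layout are assumptions for this sketch, nothing Kafka provides). A location-aware partition method might then look like this:

List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
// Keep only the partitions whose leader broker sits in the southern machine room,
// then pick any one of them; fall back to partition 0 if none qualifies.
return partitions.stream()
        .filter(p -> isInSouthChina(p.leader().host()))
        .map(PartitionInfo::partition)
        .findAny()
        .orElse(0);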
Custom partitioning policy
Having covered the default partitioning strategies, let's talk about custom partitioning.
If you want to customize the partitioning strategy in Kafka, you need to explicitly configure the producer-side parameter partitioner.class.
How do you set this parameter? The method is simple: when writing your producer program, write a concrete class that implements the org.apache.kafka.clients.producer.Partitioner interface.
The interface itself is also simple, defining only two methods: partition() and close(). Usually you only need to implement the most important one, the partition method, shown in the following code.
/**
 * Compute the partition for the given record.
 *
 * @param topic      The topic name
 * @param key        The key to partition on (or null if no key)
 * @param keyBytes   The serialized key to partition on (or null if no key)
 * @param value      The value to partition on or null
 * @param valueBytes The serialized value to partition on or null
 * @param cluster    The current cluster metadata
 */
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);

/**
 * This is called when partitioner is closed.
 */
public void close();
Here, topic, key, keyBytes, value and valueBytes are all message data, while cluster is the cluster information (for example, how many topics and brokers there are in the current Kafka cluster).
Kafka gives you this much information so that you can make full use of it when partitioning a message, that is, when working out which partition it should be sent to.
As long as your implementation class defines the partition() method and you set the partitioner.class parameter to the fully qualified class name of your implementation, the producer program will partition messages according to your code's logic.
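Putting the pieces together, a complete custom partitioner might look like the sketch below. The class name com.example.MyPartitioner and its routing logic (hash keyed messages, send keyless ones to partition 0) are purely illustrative assumptions; note that configure() comes from the Configurable interface that Partitioner extends:

import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public class MyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        if (keyBytes == null) {
            return 0;  // no key: always use partition 0 (purely for demonstration)
        }
        // floorMod keeps the result non-negative even for extreme hash codes.
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    @Override
    public void configure(Map<String, ?> configs) { }  // no extra configuration needed here

    @Override
    public void close() { }  // nothing to clean up in this sketch
}

To enable it, point the producer-side parameter at the class, e.g. props.put("partitioner.class", "com.example.MyPartitioner");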
At this point, the study of the principle of Kafka's producer message partitioning mechanism is complete. I hope it has cleared up your doubts; pairing the theory with hands-on practice is the best way to learn, so go and try it!