
What is the Kafka rebalancing mechanism

Source: Shulou (Shulou.com), SLTechnology News&Howtos, Internet Technology, 06/02. Updated 2025-01-16.

This article explains what the Kafka rebalancing mechanism is, what triggers it, and how each of the partition assignment strategies behaves in practice.

Rebalancing is the mechanism by which Kafka redistributes partitions among the consumers of a consumer group when the group's membership or the set of partitions it subscribes to changes. Three situations generally trigger a rebalance:

A consumer is added to or removed from the consumer group, so the partitions it consumed have to be reassigned among the consumers in the group.

The set of topics a consumer subscribes to changes. For example, if the subscription is a regular expression such as test-* and a new topic test-user is created, all partitions of the new topic are assigned to the subscribing consumers, and a rebalance occurs.

A topic that a consumer subscribes to gains new partitions. The new partitions are assigned to the subscribing consumers, which triggers a rebalance.

Kafka provides three main partition assignment strategies for rebalancing: Round Robin, Range, and Sticky. Range is the default. The main differences between the three are:

Round Robin: all partitions are handed out to the consumers one at a time, in turn.

Range: for each topic, the number of partitions each consumer should receive is calculated first, and then a contiguous range of that many partitions is assigned to each consumer in order.

Sticky: a newer strategy (added in Kafka 0.11) with two main goals:

Distribute the partitions across the consumers as evenly as possible. Both Round Robin and Range can leave a few consumers holding too many partitions, which makes the consumption load uneven.

When a rebalance occurs, and subject to the previous goal, make every effort to keep the partitions of the consumers that are still alive where they are, rather than moving them to other consumers.
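The second goal can be made concrete by counting how many partitions change owner across a rebalance. The sketch below is only an illustration: the helper name `moved_partitions` and the three assignments are hypothetical values chosen for the example, not output from Kafka.

```python
def moved_partitions(before, after):
    """Count partitions that changed owner between two assignments,
    looking only at consumers present in both (the ones still alive)."""
    survivors = set(before) & set(after)
    owner_after = {p: c for c, parts in after.items() for p in parts}
    return sum(
        1
        for consumer in survivors
        for partition in before[consumer]
        if owner_after.get(partition) != consumer
    )


# Hypothetical assignment of eight partitions before consumer C1 leaves.
before = {
    "C0": ["t0-0", "t1-1", "t3-0"],
    "C1": ["t0-1", "t2-0", "t3-1"],
    "C2": ["t1-0", "t2-1"],
}
# Assumed result if everything is dealt out again in turn over C0 and C2.
reshuffled = {
    "C0": ["t0-0", "t1-0", "t2-0", "t3-0"],
    "C2": ["t0-1", "t1-1", "t2-1", "t3-1"],
}
# Assumed sticky-style result: survivors keep their partitions,
# only the partitions of C1 are handed out.
sticky = {
    "C0": ["t0-0", "t1-1", "t3-0", "t2-0"],
    "C2": ["t1-0", "t2-1", "t0-1", "t3-1"],
}

print(moved_partitions(before, reshuffled))  # 2
print(moved_partitions(before, sticky))      # 0
```

Both results are balanced at four partitions per consumer; the difference is that the reshuffled one moves two partitions between consumers that never went down.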

The rest of this article explains the basic principle behind each of the three strategies through examples.

1. Round Robin

The Round Robin strategy assigns all partitions by cycling through the consumers. Assume there are three topics, t0, t1, and t2, with 1, 2, and 3 partitions respectively, for a total of six partitions: t0-0, t1-0, t1-1, t2-0, t2-1, and t2-2. Assume there are three consumers, C0, C1, and C2, subscribed as follows: C0 subscribes to t0; C1 subscribes to t0 and t1; C2 subscribes to t0, t1, and t2. The partitions are then assigned in the following steps:

First, sort all partitions and all consumers lexicographically, that is, by the string order of their names. The six partitions and three consumers are sorted as follows:

Partitions: t0-0, t1-0, t1-1, t2-0, t2-1, t2-2
Consumers: C0, C1, C2

Then assign the six partitions to the three consumers in turn. If the current consumer does not subscribe to the topic of the current partition, move on to the next consumer:

Try to assign t0-0 to C0. C0 subscribes to t0, so the assignment succeeds.

Try to assign t1-0 to C1. C1 subscribes to t1, so the assignment succeeds.

Try to assign t1-1 to C2. C2 subscribes to t1, so the assignment succeeds.

Try to assign t2-0 to C0. C0 does not subscribe to t2, so move on to the next consumer.

Try to assign t2-0 to C1. C1 does not subscribe to t2, so move on to the next consumer.

Try to assign t2-0 to C2. C2 subscribes to t2, so the assignment succeeds.

Similarly, neither C0 nor C1 subscribes to t2, so t2-1 and t2-2 cannot be assigned to them and both end up with C2.

After all partitions have been assigned by these steps, the final assignment is:

C0: t0-0
C1: t1-0
C2: t1-1, t2-0, t2-1, t2-2

In short, the Round Robin strategy sorts all partitions and consumers lexicographically and then hands the partitions to the consumers in turn. If the current consumer does not subscribe to the topic of the current partition, the next consumer is tried, until every partition has been assigned. As the result above shows, when subscriptions differ this can leave the consumers with very different numbers of partitions (here one, one, and four), so the load on the consumers is uneven.
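The steps above can be sketched in a few lines of Python. This is a minimal simulation of the procedure as described in this article, not Kafka's own code; the function name `round_robin_assign` and the dictionary shapes are assumptions made for the example.

```python
def round_robin_assign(partition_counts, subscriptions):
    """partition_counts: topic -> number of partitions.
    subscriptions: consumer -> set of subscribed topics."""
    consumers = sorted(subscriptions)
    partitions = [
        (topic, index)
        for topic in sorted(partition_counts)
        for index in range(partition_counts[topic])
    ]
    assignment = {consumer: [] for consumer in consumers}
    cursor = 0  # position in the circular list of consumers
    for topic, index in partitions:
        # Skip topics nobody subscribes to, otherwise the loop below never ends.
        if not any(topic in subscriptions[c] for c in consumers):
            continue
        # Move past consumers that do not subscribe to this topic.
        while topic not in subscriptions[consumers[cursor % len(consumers)]]:
            cursor += 1
        assignment[consumers[cursor % len(consumers)]].append(f"{topic}-{index}")
        cursor += 1
    return assignment


result = round_robin_assign(
    {"t0": 1, "t1": 2, "t2": 3},
    {"C0": {"t0"}, "C1": {"t0", "t1"}, "C2": {"t0", "t1", "t2"}},
)
print(result)
# {'C0': ['t0-0'], 'C1': ['t1-0'], 'C2': ['t1-1', 't2-0', 't2-1', 't2-2']}
```

With identical subscriptions for every consumer the same function deals the partitions out evenly; the skew only appears when subscriptions differ, as in this example.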

2. Range

The Range strategy first calculates how many partitions each consumer should hold and then assigns that many consecutive partitions to the consumer. Assume there are two consumers, C0 and C1, and two topics, t0 and t1, each with three partitions, for a total of six partitions: t0-0, t0-1, t0-2, t1-0, t1-1, and t1-2. The Range strategy assigns the partitions as follows:

Note that the Range strategy works topic by topic. Taking t0 as an example, it first collects all partitions of t0 (t0-0, t0-1, and t0-2) and all consumers subscribed to the topic (C0 and C1), and sorts the partitions and the consumers lexicographically.

Then it divides the partitions evenly among the consumers. If the division leaves a remainder, the extra partitions go one each to the first consumers in the sorted order. With three partitions and two consumers, each consumer gets at least one partition, and 3 divided by 2 leaves a remainder of 1, so the one extra partition goes to the first consumer. As a result, C0 gets two partitions starting at partition 0 (t0-0 and t0-1), and C1 gets one partition starting at partition 2 (t0-2).

The remaining topics are assigned in the same way.

The final assignment of the six partitions is:

C0: t0-0, t0-1, t1-0, t1-1
C1: t0-2, t1-2

As you can see, the Range strategy iterates over the topics one at a time and splits the partitions of each topic evenly among the consumers subscribed to it. Because the remainder of every topic goes to the consumers that sort first, those consumers accumulate more partitions as the number of topics grows, and the load on the consumers becomes unbalanced.
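The per-topic division can be written down directly with `divmod`. The sketch below follows the description in this section; the function name `range_assign` and the input shapes are assumptions made for the example, and it is not Kafka's implementation.

```python
def range_assign(partition_counts, subscriptions):
    """partition_counts: topic -> number of partitions.
    subscriptions: consumer -> set of subscribed topics."""
    assignment = {consumer: [] for consumer in sorted(subscriptions)}
    for topic in sorted(partition_counts):
        members = sorted(c for c in subscriptions if topic in subscriptions[c])
        if not members:
            continue  # nobody subscribes to this topic
        # base partitions for everyone, one extra for the first `extra` consumers
        base, extra = divmod(partition_counts[topic], len(members))
        start = 0
        for position, consumer in enumerate(members):
            count = base + (1 if position < extra else 0)
            assignment[consumer] += [
                f"{topic}-{i}" for i in range(start, start + count)
            ]
            start += count
    return assignment


result = range_assign(
    {"t0": 3, "t1": 3},
    {"C0": {"t0", "t1"}, "C1": {"t0", "t1"}},
)
print(result)
# {'C0': ['t0-0', 't0-1', 't1-0', 't1-1'], 'C1': ['t0-2', 't1-2']}
```

Each topic on its own is split 2 to 1, which looks harmless, but the extra partition lands on C0 every time, so across two topics the split is already 4 to 2.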

3. Sticky

The Sticky strategy is a newer strategy. As the name implies, when partitions are redistributed it tries to leave every partition that is already assigned with the consumer that currently owns it. The precondition is that the consumers still end up with roughly the same number of partitions, which keeps the load on the consumers balanced. We explain this strategy in two cases: the initial assignment, and the reassignment when a consumer goes down.

3.1 Initial allocation

In the initial state, no partition has been assigned to any consumer. Assume there are three consumers, C0, C1, and C2, and three topics, t0, t1, and t2, with 1, 2, and 3 partitions respectively, so the partitions are t0-0, t1-0, t1-1, t2-0, t2-1, and t2-2. C0 subscribes to t0; C1 subscribes to t0 and t1; C2 subscribes to t0, t1, and t2. The partitions are assigned by the following rules:

First, sort all partitions: by the number of consumers that can consume the partition (that is, that subscribe to its topic) from low to high, and lexicographically by partition name when that number is the same. Because the topics have different numbers of subscribers (t2 has one, t1 has two, t0 has three), the six partitions are sorted as follows:

t2-0, t2-1, t2-2, t1-0, t1-1, t0-0

Then sort all consumers: by the number of partitions already assigned to the consumer from low to high, and lexicographically by name when two consumers hold the same number of partitions. Since none of the three consumers holds any partition initially, the result is simply their lexicographic order:

C0, C1, C2

Then go through the partitions one by one and assign each to a consumer. Two points need attention. First, the traversal does not continue with C1 after C0 has received a partition; for every partition, the consumers are traversed again from the beginning of the sorted list, and a consumer that does not subscribe to the partition's topic is skipped in favor of the next one. Second, the number of partitions held by each consumer changes during the process, and the change is reflected in the consumer ordering. For example, C0 is first at the beginning, but if a partition were assigned to C0, C0 would move to the end because it would then hold the most partitions. The six partitions are assigned as follows:

First, try to assign t2-0 to C0. C0 does not subscribe to t2, so the assignment fails; move on to the next consumer.

Then try to assign t2-0 to C1. C1 does not subscribe to t2, so the assignment fails; move on to the next consumer.

Then try to assign t2-0 to C2. C2 subscribes to t2, so the assignment succeeds. The number of partitions held by C2 has changed, and the consumers are now sorted as follows:

C0: (none)
C1: (none)
C2: t2-0

Only C2 subscribes to t2, so t2-1 and t2-2 are also assigned to C2. After t2-0, t2-1, and t2-2 have been assigned, the consumer ordering and the assignment are:

C0: (none)
C1: (none)
C2: t2-0, t2-1, t2-2

Next comes t1-0. Try to assign it to C0 first. C0 does not subscribe to t1, so the assignment fails; move on to the next consumer.

Then try to assign t1-0 to C1. C1 subscribes to t1, so the assignment succeeds, and the consumers and their partitions are:

C0: (none)
C1: t1-0
C2: t2-0, t2-1, t2-2

t1-1 is assigned next. Both C1 and C2 subscribe to t1, but C1 comes before C2 in the ordering, so the partition goes to C1:

C0: (none)
C1: t1-0, t1-1
C2: t2-0, t2-1, t2-2

Finally, try to assign t0-0 to C0. C0 subscribes to t0, so the assignment succeeds. The final result is:

C0: t0-0
C1: t1-0, t1-1
C2: t2-0, t2-1, t2-2

Note that the order of the consumers never changes in this example, but only because the ordering by partition count after each assignment happens to coincide with the initial order. Comparing this result with the Round Robin result for the same topics and subscriptions (one, one, and four partitions) shows that the Sticky strategy distributes the partitions more evenly (one, two, and three).
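The initial allocation walked through above can be simulated as follows. This is a simplified sketch of the procedure as this article describes it, with an assumed function name `sticky_initial_assign`; Kafka's actual Sticky assignor is considerably more involved.

```python
def sticky_initial_assign(partition_counts, subscriptions):
    """partition_counts: topic -> number of partitions.
    subscriptions: consumer -> set of subscribed topics."""

    def subscriber_count(topic):
        return sum(1 for topics in subscriptions.values() if topic in topics)

    # Partitions with the fewest possible consumers go first, then by name.
    partitions = sorted(
        (
            (topic, index)
            for topic, count in partition_counts.items()
            for index in range(count)
        ),
        key=lambda tp: (subscriber_count(tp[0]), tp[0], tp[1]),
    )
    assignment = {consumer: [] for consumer in subscriptions}
    for topic, index in partitions:
        # Re-rank the consumers for every partition: fewest partitions first,
        # ties broken by name.
        ranked = sorted(assignment, key=lambda c: (len(assignment[c]), c))
        for consumer in ranked:
            if topic in subscriptions[consumer]:
                assignment[consumer].append(f"{topic}-{index}")
                break
    return assignment


result = sticky_initial_assign(
    {"t0": 1, "t1": 2, "t2": 3},
    {"C0": {"t0"}, "C1": {"t0", "t1"}, "C2": {"t0", "t1", "t2"}},
)
print(result)
# {'C0': ['t0-0'], 'C1': ['t1-0', 't1-1'], 'C2': ['t2-0', 't2-1', 't2-2']}
```

Assigning the most constrained partitions first is what keeps t1-1 away from C2: by the time t1 is handled, C2 already holds three partitions and ranks last.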

3.2 Simulating consumer downtime

The final assignment in the previous example is too simple to make an outage interesting, so we use a different subscription setup here. We have three consumers, C0, C1, and C2, and four topics, t0, t1, t2, and t3, each with two partitions, so the partitions are t0-0, t0-1, t1-0, t1-1, t2-0, t2-1, t3-0, and t3-1. All three consumers subscribe to all four topics. Under the Sticky strategy, the initial assignment is as follows (readers can derive it with the procedure from the previous example):

C0: t0-0, t1-1, t3-0
C1: t0-1, t2-0, t3-1
C2: t1-0, t2-1

Now assume that C1 goes down during consumption and a rebalance occurs. Under the Sticky strategy, the redistribution proceeds as follows:

First, sort the partitions left unassigned by the outage: by the number of consumers that subscribe to the partition's topic from low to high, and lexicographically by partition name when that number is the same. Since only C1 went down, the unassigned partitions are the ones it owned, t0-1, t2-0, and t3-1, and the sorted result is:

t0-1, t2-0, t3-1

Then sort the remaining consumers: by the number of partitions each one holds from low to high, and lexicographically by consumer name when the number is the same. The result is:

C2: t1-0, t2-1
C0: t0-0, t1-1, t3-0

Then go through the partitions in order and assign each to a consumer. Keep in mind that the number of partitions held by each consumer changes during the process, and the change is reflected in the consumer ordering:

First, try to assign t0-1 to C2. C2 subscribes to t0, so the assignment succeeds. Note that C2 and C0 now hold the same number of partitions, so the ordering changes, because C0 precedes C2 lexicographically:

C0: t0-0, t1-1, t3-0
C2: t1-0, t2-1, t0-1

Then try to assign t2-0 to C0. C0 subscribes to t2, so the assignment succeeds, and the ordering and assignment become:

C2: t1-0, t2-1, t0-1
C0: t0-0, t1-1, t3-0, t2-0

Finally, try to assign t3-1 to C2. C2 subscribes to t3, so the assignment succeeds, and the ordering and assignment become:

C0: t0-0, t1-1, t3-0, t2-0
C2: t1-0, t2-1, t0-1, t3-1

As this process shows, the number of partitions held by each consumer changes as partitions are assigned, and the ordering changes with it. In the end the partitions are spread evenly over the consumers (four each), and the partitions that the surviving consumers were already consuming have not been moved to another consumer.
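Both the initial assignment and the reassignment after the outage can be reproduced with one helper that always gives the next partition to the least loaded subscribed consumer. This is a hedged sketch of the procedure described in this section: the names `assign_least_loaded` and `rebalance_after_failure` are hypothetical, and because every topic here has the same subscribers, the orphaned partitions are simply sorted by name.

```python
def assign_least_loaded(unassigned, assignment, subscriptions):
    """Give each partition to the subscribed consumer that currently holds
    the fewest partitions (ties broken by consumer name)."""
    for partition in unassigned:
        topic = partition.rsplit("-", 1)[0]
        ranked = sorted(assignment, key=lambda c: (len(assignment[c]), c))
        for consumer in ranked:
            if topic in subscriptions[consumer]:
                assignment[consumer].append(partition)
                break
    return assignment


def rebalance_after_failure(previous, failed, subscriptions):
    """Survivors keep their partitions; only the failed consumer's
    partitions are handed out again."""
    orphaned = sorted(previous[failed])
    survivors = {c: list(parts) for c, parts in previous.items() if c != failed}
    return assign_least_loaded(orphaned, survivors, subscriptions)


topics = {"t0", "t1", "t2", "t3"}
subscriptions = {"C0": topics, "C1": topics, "C2": topics}
all_partitions = [f"t{t}-{p}" for t in range(4) for p in range(2)]

initial = assign_least_loaded(
    all_partitions, {c: [] for c in subscriptions}, subscriptions
)
print(initial)
# {'C0': ['t0-0', 't1-1', 't3-0'], 'C1': ['t0-1', 't2-0', 't3-1'], 'C2': ['t1-0', 't2-1']}

after = rebalance_after_failure(initial, "C1", subscriptions)
print(after)
# {'C0': ['t0-0', 't1-1', 't3-0', 't2-0'], 'C2': ['t1-0', 't2-1', 't0-1', 't3-1']}
```

The first three entries of C0 and the first two of C2 are unchanged after the rebalance, which is the "sticky" part; only the three partitions that belonged to C1 were placed.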
