
How Kafka implements best practices for cross-AZ deployment


Many newcomers are unclear about how to deploy Kafka across availability zones (AZs) following best practices. This article explains the topic in detail, and readers with this need should find something useful to take away.

Best practices for deploying Kafka across AZs

Cross-AZ deployment is an effective and cost-efficient way to achieve high service availability. Beyond eliminating single points of failure, it lets you gradually build capabilities such as service isolation, grayscale (canary) releases, and N+1 redundancy, killing several birds with one stone. A previous article covered ES's cross-AZ deployment practice; this one shows how Kafka can be deployed across AZs.

Implementation

"broker.rack" is a parameter in the server-side Broker configuration file, which is similar to Rack or Zone in ES. It "groups" the Broker in the cluster through Tag, and achieves cross-Rack fault tolerance when allocating partition copies. This parameter accepts a value of type "string", which defaults to null;. In addition, "broker.rack" does not support dynamic updates and is read-only, which means:

Updating broker.rack on a Broker requires restarting that Broker.

For a cluster already deployed across AZs, scaling out or in does not require restarting the other Brokers in the cluster.

If you want to add this configuration to an existing production Kafka cluster to achieve cross-AZ deployment, you need to restart all Brokers in the cluster.

A specific configuration example looks like this:

broker.rack=my-rack-id
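As an illustration, a minimal sketch of the Broker-side settings for a two-AZ cluster; the rack ids (az-a, az-b) and broker ids are placeholders, not values from the original article:

# server.properties on Brokers in availability zone A (placeholder rack id)
broker.id=0
broker.rack=az-a

# server.properties on Brokers in availability zone B (placeholder rack id)
broker.id=3
broker.rack=az-b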

When a Topic is created, the broker.rack parameter constrains the assignment so that partition replicas span as many racks as possible; specifically, each partition's replicas are spread across n racks, where n = min(#racks, replication-factor).

So what does "as many racks as possible" mean in practice?

It follows from Kafka's partition-assignment algorithm; for the concrete implementation, see the source function assignReplicasToBrokers.

In short, when Kafka assigns partitions to a Topic, it takes into account replication-factor, partitions, two random parameters (startIndex and fixedStartIndex), the number of partitions already on each Broker, and broker.rack, and it generates an ordered list of all available Broker IDs by polling the racks in turn. Suppose:

rack1: 0, 1    rack2: 2, 3

The resulting list is [0, 2, 1, 3]. When assigning a partition, a replica will not be placed on a Broker if either of the following conditions holds:

The rack (broker.rack) where the Broker resides already holds a replica of the partition, and there is at least one rack that does not yet hold a replica of the partition.

The Broker itself already holds a replica of the partition, and there is at least one other Broker that does not yet hold a replica of the partition.

Fig. 1 Part of the source code of the broker.rack-aware replica assignment mechanism
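As a rough illustration of the rack-alternating broker list described above, here is a minimal sketch in Python; it is a simplification for intuition only, not Kafka's actual implementation of assignReplicasToBrokers:

def rack_alternating_broker_list(rack_to_brokers):
    """Round-robin across racks: take one broker from each rack per pass."""
    iters = [iter(sorted(ids)) for _, ids in sorted(rack_to_brokers.items())]
    ordered = []
    while iters:
        still_alive = []
        for it in iters:
            try:
                ordered.append(next(it))   # one broker from this rack
                still_alive.append(it)
            except StopIteration:
                pass                       # this rack is exhausted
        iters = still_alive
    return ordered

# {"rack1": [0, 1], "rack2": [2, 3]} -> [0, 2, 1, 3], matching the example above
print(rack_alternating_broker_list({"rack1": [0, 1], "rack2": [2, 3]}))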

The above should give you a basic understanding of Kafka's replica assignment mechanism. In general:

When replication-factor >= #broker.rack (the number of racks), at least one broker.rack holds a complete set of the partition replicas.

In particular, when replication-factor = 2 x #broker.rack (for example, 4 replicas across 2 AZs), the Topic's partition replicas are evenly distributed between the two AZs.

It is also worth noting that:

When Kafka performs Leader election or Leader rebalancing, it does not consider broker.rack; that is, by default Leaders are spread across the AZs.

Unlike ES, Kafka cannot automatically migrate replicas away from, or recover, a failed Broker.

So how do you achieve highly available Broker placement within an AZ? Maintaining a true physical rack distribution is difficult, and in virtualized environments there is a host layer between the rack and the VM or container, which makes it even more costly to maintain; the recommendation is therefore to use the high-availability groups / placement groups offered by cloud vendors.

Scenario verification

Of the factors affecting partition assignment listed above, we exclude the two random parameters (they only affect the starting ID of the broker list) and the number of partitions already on each Broker (this only affects rebalancing within an AZ, and the scenarios verified here do not care about intra-AZ partition placement).

Whether each AZ has the same number of Brokers only affects partition placement within an AZ. To simplify the verification scenarios, we ignore the case of unequal Broker counts per AZ for now, so the "list of all available Broker IDs" can be reduced to a single factor, the number of Brokers per AZ.

The remaining factors and the values to verify are shown in the following table:

Table 1: List of parameters

Parameter | Values to verify | Remarks
Number of Topic partitions | 1, 2, 3 | Covers 1 partition as well as odd and even partition counts
Number of Topic replicas | 1, 2, 3 | Covers 1 replica as well as odd and even replica counts
Number of AZs in the cluster | 1, 2, 3 | Covers 1 AZ (no cross-AZ) as well as odd and even AZ counts
Brokers per AZ | 1, 2, 3 | Covers cases where the partition or replica count is less than, equal to, or greater than the number of Brokers in a single AZ

Exhausting every combination in the table would require verifying 81 scenarios, and each scenario would have to be tested several times to rule out chance, which would be tedious and inefficient.

Instead, we can treat the parameters as "factors" and the values to verify as "levels"; since each factor's value is independent of the others, we can use an orthogonal experiment design (selecting representative points from the full combination space according to orthogonality, an efficient, fast, and economical experimental method). The scenarios in the table fit the most commonly used L9(3^4) orthogonal table (generally used for 4 factors with 3 levels each), so only 9 scenarios need to be verified, each repeated 10 times to guard against chance.

In fact, only seven scenarios need to be verified, because two of them do not satisfy Kafka's constraints for creating a Topic. The distribution of partition replicas can be seen clearly in Kafka Manager's UI, for example:

Fig. 2 Distribution of 3 partitions with 2 replicas across 2 AZs, where broker ids 0, 1, 2 belong to one AZ and 3, 4, 5 to the other

Fig. 3 Distribution of 3 partitions with 3 replicas across 2 AZs, where broker ids 0, 1, 2 belong to one AZ and 3, 4, 5 to the other

Using the L9(3^4) orthogonal table, the scenarios and conclusions are as follows:

Table 2: Orthogonal scenario table

Scenario | Partitions | Replicas | AZs | Brokers per AZ | Partition distribution | Leader distribution | Remarks
1 | 1 | 1 | 1 | 1 | Each AZ holds a full set of partitions | In one AZ | replication-factor = #broker.rack
2 | 1 | 2 | 2 | 2 | Each AZ holds a full set of partitions | Randomly in one of the two AZs | replication-factor = #broker.rack
3 | 1 | 3 | 3 | 3 | Each AZ holds a full set of partitions | Randomly in one of the AZs | replication-factor = #broker.rack
4 | 2 | 1 | 2 | 3 | The two AZs share one set of partitions | Randomly in one of the two AZs | replication-factor < #broker.rack (a Topic whose replication-factor exceeds the total broker count cannot be created)

As the table shows, with 4 replicas a cluster deployed across two AZs gets both intra-AZ and cross-AZ data redundancy. If one AZ is used mainly for disaster recovery, you can concentrate all Leaders in the other AZ with Kafka's reassignment tool (kafka-reassign-partitions.sh), reducing the latency of cross-AZ writes. In general, though, inter-AZ latency is very low and usually acceptable.
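For example, a hedged sketch of concentrating Leaders in one AZ, assuming an older ZooKeeper-based deployment; the topic name, broker ids, and addresses are placeholders. The first broker id in each replica list is the preferred Leader, so listing an AZ-A broker first and then triggering a preferred-replica election moves Leaders into AZ A:

# reassign.json: put an AZ-A broker (placeholder ids 0, 1) first in every replica list
{"version": 1, "partitions": [
  {"topic": "my-topic", "partition": 0, "replicas": [0, 3]},
  {"topic": "my-topic", "partition": 1, "replicas": [1, 4]}
]}

./kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign.json --execute
# for ZooKeeper-era versions; newer releases use kafka-leader-election.sh --election-type preferred
./kafka-preferred-replica-election.sh --zookeeper zk1:2181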

Cost and network latency

The main cost of a multi-AZ architecture is hardware, and cost should be weighed against the importance of the data. If the data is highly important, a 4-replica Topic configuration is required; for Kafka used purely as a message queue, long-term storage of the messages themselves is usually not critical, so the cost impact is limited.

The second concern is network latency. We use stress tests to see what network latency brings to Kafka, creating clusters in Availability Zone A and Availability Zone B:

The benchmark cluster consists of two machines deployed in North China Availability Zone A.

The cross-AZ cluster consists of two machines deployed in North China Availability Zone A and Availability Zone B.

The average ping latency between the two AZs is 1.171 ms, while the ping latency within an AZ is 0.070 ms. We verify the impact of this network latency from three angles.

Impact on message writes

Producers can send messages to a Kafka cluster either synchronously (acks=all) or asynchronously (acks=1). We use Kafka's own stress-testing tool (kafka-producer-perf-test.sh) to test the two clusters.

Create a topic in each of the two clusters (--replication-factor 2 --partitions 1), with the partition Leader located in Availability Zone A (a sketch of the create command follows below).

The load-generating machine is located in North China Availability Zone A. Each message is 300 bytes (small enough to emphasize network effects rather than per-message processing cost), sent at 10,000 messages per second.
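A plausible sketch of the topic creation, assuming a ZooKeeper-based cluster at a placeholder address (the exact command used in the test is an assumption here):

./kafka-topics.sh --create --zookeeper zk1:2181 --topic pressure1 --partitions 1 --replication-factor 2
# check replica placement and Leader location afterwards
./kafka-topics.sh --describe --zookeeper zk1:2181 --topic pressure1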

With asynchronous writes (acks=1), there is almost no difference in average latency between the two clusters: the stress-test results are 0.42 ms and 0.51 ms average latency, respectively.

Stress test parameters:

./kafka-producer-perf-test.sh --topic pressure1 --num-records 500000 --record-size 300 \
--throughput 10000 --producer-props bootstrap.servers=10.160.109.68:9092

Benchmark cluster stress test results:

Cross-AZ cluster stress test results:

With synchronous writes (acks=all), message write latency is clearly affected, and the impact is fairly large: the benchmark cluster averages 1.09 ms, while the cross-AZ cluster averages 8.17 ms.

Stress test parameters:

./kafka-producer-perf-test.sh --topic pressure --num-records 500000 --record-size 300 \
--throughput 10000 --producer-props acks=all bootstrap.servers=10.160.109.68:9092

Benchmark cluster stress test results:

The following table compares the two clusters for message sizes from 10 to 300 bytes. For synchronous writes, the average latency of the cross-AZ cluster is roughly 7 times that of the benchmark cluster.

Table 4: Latency comparison for synchronous writes

Message size (bytes) | 10 | 50 | 100 | 150 | 200 | 300
Benchmark avg latency (ms) | 0.91 | 0.83 | 0.65 | 0.75 | 0.72 | 1.09
Cross-AZ avg latency (ms) | 5.39 | 4.48 | 5.23 | 5.23 | 5.14 | 8.17
Ratio (cross-AZ / benchmark) | 5.92 | 5.40 | 8.05 | 6.97 | 7.14 | 7.50

Impact on the cluster itself

We choose a message size of 1000 bytes to fully expose the effect of network transfer on replica synchronization in the cross-AZ scenario. For the benchmark cluster, we shut down one Broker, write 5,000,000 messages asynchronously with the stress tool, then restart the stopped Broker and measure how long the follower replica takes to catch up with the Leader. For the cross-AZ cluster, we likewise stop the Broker in Availability Zone B, write the same data to the Leader, measure the catch-up time, and compare the two.
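One hedged way to watch for the follower catching up is to poll for under-replicated partitions until the output is empty; the ZooKeeper address and topic name below are placeholders, not values from the original test:

# while any follower lags behind the Leader, its partition is listed here; empty output means sync is complete
./kafka-topics.sh --describe --zookeeper zk1:2181 --topic pressure1 --under-replicated-partitions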

The benchmark cluster completed replica synchronization in 26 s:

The cross-AZ cluster completed replica synchronization in 142 s, and during synchronization the logs showed a WARN about a connection timeout to ZK.

Clearly, at this ping latency, a cross-AZ cluster has potential problems when message bodies are large.

Impact on Topic consumption

Consumption latency likewise comes mainly from crossing AZs, but this is solvable: consumers support the following parameter:

client.rack    # accepts a string value; set it to match the broker.rack of the AZ where the consumer runs
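A minimal sketch of rack-aware consumption, assuming Kafka 2.4 or later where fetch-from-follower (KIP-392) is available; the rack id is a placeholder. Note that the Broker side must also enable a rack-aware replica selector for client.rack to take effect:

# consumer.properties for consumers running in availability zone B (placeholder rack id)
client.rack=az-b

# server.properties on every Broker (required for rack-aware fetching, Kafka 2.4+)
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector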

Best practice: upgrading a production environment to a multi-AZ deployment

If your cluster is already running in production and now needs to be upgraded to a cross-AZ deployment, how should the existing cluster be configured and upgraded?

If any node in the cluster has broker.rack set to null, even just one machine, creating a topic with default parameters will fail with an error like the following:

With the --disable-rack-aware flag, the broker.rack parameter can be ignored during partition assignment. Upgrading a production environment therefore requires changing the way topics are created, but data writes and consumption are unaffected during the upgrade (and Kafka will not automatically rebalance the partitions of existing Topics).
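For instance, a hedged sketch of creating a topic while ignoring rack constraints during the transition period (topic name and ZooKeeper address are placeholders):

./kafka-topics.sh --create --zookeeper zk1:2181 --topic my-topic --partitions 3 --replication-factor 2 --disable-rack-aware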

The general steps for upgrading a production environment to a cross-AZ deployment are as follows:

Build a cross-AZ test cluster and fully verify the impact of crossing AZs on the cluster, paying attention to write throughput (whether Topics need more partitions), the resource consumption of in-cluster data synchronization, and the cross-AZ latency of consumers.

Make a full rollback plan and conduct a rollback drill

Add "broker.rack" parameter to expand the cluster capacity in the new availability zone.

Add the parameter "broker.rack" to the original nodes of the cluster, and scroll all the nodes

At the Topic level, migrate partitions at an appropriate time by manually specifying the replica assignment (see the sketch after this list).

Trigger Leader rebalancing for Topics as required.
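A hedged sketch of the manual replica-assignment step above, assuming broker ids 0-2 sit in AZ A and 3-5 in AZ B (topic name, ids, and addresses are placeholders); each partition lists one broker from each AZ so that its replicas end up in different AZs:

# reassign-topic.json
{"version": 1, "partitions": [
  {"topic": "existing-topic", "partition": 0, "replicas": [0, 3]},
  {"topic": "existing-topic", "partition": 1, "replicas": [1, 4]},
  {"topic": "existing-topic", "partition": 2, "replicas": [2, 5]}
]}

./kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign-topic.json --execute
./kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign-topic.json --verify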

Notes and caveats

The prerequisite for deploying Kafka across AZs is a ZooKeeper ensemble that is itself deployed across AZs; otherwise, if the AZ hosting the ZK cluster fails, the Kafka cluster becomes unavailable.

When migrating an in-use Kafka cluster to a cross-AZ deployment, a large cluster means the rolling restart will span a long time, and every Topic's partitions must be migrated manually; if the cluster has many Topics, the workload of this operation is considerable.

A production cross-AZ upgrade must be fully verified against the production message-size distribution; in particular, pay attention to the latency impact of in-cluster data synchronization for message sizes above the 80th, 90th, and 95th percentiles, to avoid dragging down the cluster under highly concurrent writes.
