
What is the method of Kafka cluster optimization?

2025-01-15 Update | From: SLTechnology News & Howtos (Network Security)


Shulou (Shulou.com) 05/31 Report --

This article introduces a practical approach to Kafka cluster optimization. The content is detailed but easy to follow, and should be a useful reference. Let's take a look.

Background

As a professional data intelligence service provider, Getui serves hundreds of thousands of apps and delivers tens of billions of messages every day, generating a huge volume of log data. To meet various business needs, we have to collect these logs and centralize them for computation. For this we chose Flume, a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive logs, and we continuously iterate on it to meet our specific logging needs.

In the original cross-datacenter log aggregation setup, the process was fairly simple: logs generated by the business in data center A were written, through various channels, to the local Kafka cluster; Flume in data center B then consumed the log data from data center A's Kafka in real time over a dedicated line and wrote it to data center B's Kafka cluster. Data from all data centers was centrally managed in data center B's Kafka cluster in this same way, as shown in figure 1:

Figure 1: original remote log transfer mode

However, as traffic kept growing, the bandwidth demands of log transfer rose with it, and the dedicated line became an increasingly prominent bottleneck. With 1 Gbps of dedicated-line bandwidth costing roughly 20,000–30,000 RMB per month, the annual cost of expanding the line for a single remote data center runs as high as 300,000 RMB. So how could we find a cheaper transfer scheme that still meets current business expectations? Avro offers a fast, compact binary data format that can significantly reduce storage space and network transfer bandwidth, so it became our preferred option.
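The cost figures above can be sanity-checked with simple arithmetic (a sketch; the monthly figure below is the hypothetical midpoint of the quoted 20k–30k RMB range):

```python
# Hypothetical midpoint of the quoted 20,000–30,000 RMB/month for 1 Gbps of dedicated line
monthly_cost_rmb = 25_000
annual_cost_rmb = monthly_cost_rmb * 12
print(annual_cost_rmb)  # 300000, i.e. the ~300k RMB/year quoted in the text
```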

Optimization idea

Introduction to Avro

Avro is a data serialization system. It started as a Hadoop sub-project and is now a standalone Apache project. Its main features are:

● Rich data structures

● A compact, fast, compressible binary data format

● A container file format for storing persistent data

● Remote procedure call (RPC)

● A mechanism that lets dynamic languages process data easily

For details, please refer to the official website: http://avro.apache.org/
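As a concrete illustration (a hypothetical schema, not taken from the article), an Avro schema describing a log record is itself written in JSON, while the records it describes are encoded in Avro's compact binary form:

```json
{
  "type": "record",
  "name": "LogEvent",
  "namespace": "com.example.logs",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "app_id",    "type": "string"},
    {"name": "level",     "type": "string"},
    {"name": "message",   "type": "string"}
  ]
}
```

Because the schema travels separately from the data, each binary record carries no field names or type tags, which is one reason Avro payloads are small on the wire.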

Flume Avro scheme

Flume's RPC Source is the Avro Source. It is designed as a highly scalable RPC server that receives data into a Flume Agent from another Flume Agent's Avro Sink or from a Flume SDK client, as shown in figure 2:

Figure 2: Avro Source process

Following this model, we planned to change our log transfer scheme: deploy an Avro Sink in data center A to consume log data from the local Kafka cluster, compress it, and send it to an Avro Source in data center B, which decompresses it and writes it to data center B's Kafka cluster. The transfer flow is shown in figure 3:
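The compress-then-decompress round trip at the heart of this scheme can be sketched with Python's stdlib zlib (deflate), the same algorithm and level 6 setting used in the Flume configuration later in this article; the sample log lines are hypothetical:

```python
import zlib

# Hypothetical repetitive log lines standing in for data center A's logs
log_lines = "\n".join(
    "2020-05-31 12:00:%02d INFO push delivered msgid=%d" % (i % 60, i)
    for i in range(1000)
).encode("utf-8")

# Avro Sink side: deflate-compress at level 6 before sending over the dedicated line
compressed = zlib.compress(log_lines, 6)

# Avro Source side: decompress and verify nothing was lost
restored = zlib.decompress(compressed)
assert restored == log_lines

print(f"{len(log_lines)} bytes -> {len(compressed)} bytes on the wire")
```

Real Flume uses Avro RPC framing rather than raw zlib calls, but the bandwidth saving comes from exactly this deflate step.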

Figure 3: Flume Avro transfer mode

Possible problems

We estimated three main potential problems:

● Whether data integrity can be guaranteed when the dedicated line fails

● The consumption of hardware resources such as CPU and memory in this mode

● Whether transfer performance suffers

Verify the situation

To answer these questions, we ran several comparative experiments.

Environment preparation:

1. Two servers, 192.168.10.81 and 192.168.10.82, each running a Kafka cluster, simulating data center A and data center B.

2. The two Kafka clusters hold topicA (source side) and topicB (target side) respectively; 11 GB of log data in total is written to topicA to simulate the original logs.

3. A Flume agent is deployed on 192.168.10.82 to simulate the original transfer mode.

4. An Avro Sink is deployed on 192.168.10.81 and an Avro Source on 192.168.10.82 to simulate the Flume Avro transfer mode.
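For reference, the test topics could be created with Kafka's standard CLI (a sketch for Kafka versions that still use the ZooKeeper flag; the partition and replication counts are assumptions, and only the ZooKeeper address and topic names come from the setup above):

```shell
# Create the source-side topic on data center A's cluster
kafka-topics.sh --create --zookeeper 192.168.10.81:2181 \
  --replication-factor 1 --partitions 3 --topic topicA

# Create the target-side topic on data center B's cluster
kafka-topics.sh --create --zookeeper 192.168.10.82:2181 \
  --replication-factor 1 --partitions 3 --topic topicB
```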

Original Flume scheme validation (non-Avro)

Monitor Kafka consumption:

81 Traffic Statistics:

82 Traffic Statistics:

Time to consume all messages: 20 min

Total log entries consumed: 129,748,260

Total traffic: 13.5 GB

Avro scheme validation

Configuration instructions:

Avro Sink configuration:

# "kafkatokafka" is the agent name; multiple source/channel/sink names are separated by spaces
kafkatokafka.sources = kafka_dmc_bullet
kafkatokafka.channels = channel_dmc_bullet
kafkatokafka.sinks = kafkasink_dmc_bullet

# Source: consume topicA from data center A's Kafka cluster
kafkatokafka.sources.kafka_dmc_bullet.type = org.apache.flume.source.kafka.KafkaSource
kafkatokafka.sources.kafka_dmc_bullet.channels = channel_dmc_bullet
kafkatokafka.sources.kafka_dmc_bullet.zookeeperConnect = 192.168.10.81:2181
kafkatokafka.sources.kafka_dmc_bullet.topic = topicA
kafkatokafka.sources.kafka_dmc_bullet.kafka.zookeeper.connection.timeout.ms = 150000
kafkatokafka.sources.kafka_dmc_bullet.kafka.consumer.timeout.ms = 10000
kafkatokafka.sources.kafka_dmc_bullet.kafka.group.id = flumeavro
kafkatokafka.sources.kafka_dmc_bullet.batchSize = 5000

# Sink: Avro Sink; multiple sinks can be configured to improve compressed transfer throughput
kafkatokafka.sinks.kafkasink_dmc_bullet.type = org.apache.flume.sink.AvroSink
kafkatokafka.sinks.kafkasink_dmc_bullet.hostname = 192.168.10.82
# must match the Avro Source's RPC port
kafkatokafka.sinks.kafkasink_dmc_bullet.port = 55555
# compression mode
kafkatokafka.sinks.kafkasink_dmc_bullet.compression-type = deflate
# compression level, 1~9
kafkatokafka.sinks.kafkasink_dmc_bullet.compression-level = 6
kafkatokafka.sinks.kafkasink_dmc_bullet.channel = channel_dmc_bullet
kafkatokafka.sinks.kafkasink_dmc_bullet.requiredAcks = 1
kafkatokafka.sinks.kafkasink_dmc_bullet.batchSize = 5000

# Channel: only one memory channel
kafkatokafka.channels.channel_dmc_bullet.type = memory
kafkatokafka.channels.channel_dmc_bullet.capacity = 100000
#kafkatokafka.channels.channel_dmc_bullet.byteCapacity = 10000
#kafkatokafka.channels.channel_dmc_bullet.byteCapacityBufferPercentage = 10
kafkatokafka.channels.channel_dmc_bullet.transactionCapacity = 5000
kafkatokafka.channels.channel_dmc_bullet.keep-alive = 60

Avro Source configuration:

# "kafkatokafka" is the agent name; multiple source/channel/sink names are separated by spaces
kafkatokafka.sources = kafka_dmc_bullet
kafkatokafka.channels = channel_dmc_bullet
kafkatokafka.sinks = kafkasink_dmc_bullet

# Source: Avro Source bound to the RPC port the remote Avro Sink connects to
kafkatokafka.sources.kafka_dmc_bullet.type = avro
kafkatokafka.sources.kafka_dmc_bullet.channels = channel_dmc_bullet
kafkatokafka.sources.kafka_dmc_bullet.bind = 0.0.0.0
kafkatokafka.sources.kafka_dmc_bullet.port = 55555
# compression mode, must match the sink
kafkatokafka.sources.kafka_dmc_bullet.compression-type = deflate
kafkatokafka.sources.kafka_dmc_bullet.batchSize = 100

# Sink: write the decompressed logs to topicB on data center B's Kafka cluster
kafkatokafka.sinks.kafkasink_dmc_bullet.type = org.apache.flume.sink.kafka.KafkaSink
kafkatokafka.sinks.kafkasink_dmc_bullet.kafka.partitioner.class = com.gexin.rp.base.kafka.SimplePartitioner
kafkatokafka.sinks.kafkasink_dmc_bullet.channel = channel_dmc_bullet
kafkatokafka.sinks.kafkasink_dmc_bullet.topic = topicB
kafkatokafka.sinks.kafkasink_dmc_bullet.brokerList = 192.168.10.82:9091,192.168.10.82:9092,192.168.10.82:9093
kafkatokafka.sinks.kafkasink_dmc_bullet.requiredAcks = 1
kafkatokafka.sinks.kafkasink_dmc_bullet.batchSize = 500

# Channel
kafkatokafka.channels.channel_dmc_bullet.type = memory
kafkatokafka.channels.channel_dmc_bullet.capacity = 100000
kafkatokafka.channels.channel_dmc_bullet.transactionCapacity = 1000

Monitor Kafka consumption

81 Traffic Statistics:

82 Traffic Statistics:

Time to consume all messages: 26 min

Total log entries consumed: 129,748,260

Total traffic: 1.69 GB

Fault simulation

1. Simulate a dedicated-line failure. While data centers A and B are disconnected, the Avro Sink reports connection errors.

2. Monitoring Kafka consumption shows that the consumer has stopped consuming.

3. After the fault is repaired, the remaining logs continue to be consumed; the final total is 129,747,255 log entries.

Conclusion

1. When the dedicated line fails, a small amount of data in flight on the network may be lost. This is caused by the network itself and has nothing to do with the Avro mode; data whose consumption simply stopped at the failure suffers no loss at all. Whether the in-flight data lost to the network needs to be retransmitted depends on an assessment of its importance.

2. The traffic compression ratio exceeds 80%. We also tested compression levels 1 through 9, and level 6 achieves a ratio very close to level 9's. CPU and memory utilization differ little from the original transfer mode, so the bandwidth optimization comes at low cost.
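The observation that a mid-range level comes close to level 9 can be reproduced with stdlib zlib (deflate, the same algorithm Flume uses here); the log sample below is hypothetical, and real, less uniform logs will show lower ratios:

```python
import zlib

# Hypothetical highly repetitive log sample
sample = ("2020-05-31 12:00:00 INFO push delivered to client\n" * 5000).encode("utf-8")

# Compare compressed sizes at levels 1, 6, and 9
sizes = {level: len(zlib.compress(sample, level)) for level in (1, 6, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes, {1 - size / len(sample):.1%} saved")
```

Higher levels spend more CPU searching for matches; on log-like data the size gap between level 6 and level 9 is typically small, which is why the article settles on level 6.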

3. Transfer performance is somewhat reduced by compression: with a single Sink, total transfer time grew from 20 minutes to 26 minutes. The number of Sinks can be increased as needed to raise the transfer rate.

Results in the production environment

The results after rollout are as follows:

1. Because other services also consume bandwidth on the line, overall bandwidth utilization dropped by somewhat more than 50%; at this stage, peak bandwidth stays below 400 Mbps.

2. Each Sink's transfer rate tops out at about 3,000 lines per second. The compressed-transfer rate limit is addressed by adding Sinks, at the cost of a moderate increase in CPU and memory usage.
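Sizing the Sink count from that per-Sink limit is simple arithmetic (a sketch; the peak rate below is hypothetical, only the 3,000 lines/s limit comes from the article):

```python
import math

per_sink_limit = 3_000       # per-Sink rate limit quoted in the article (lines/s)
peak_lines_per_sec = 12_000  # hypothetical peak log rate, not from the article

# Round up so the configured Sinks cover the peak rate
sinks_needed = math.ceil(peak_lines_per_sec / per_sink_limit)
print(sinks_needed)  # 4
```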

That concludes this article on "What is the method of Kafka cluster optimization?". Thank you for reading! You should now have a good grasp of the approach. To learn more, follow the industry information channel.
