How to thoroughly understand the rules of setting parameters related to the size of Kafka messages

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the rules governing the Kafka parameters that control message size. The content is concise and easy to follow, and I hope the detailed walkthrough below gives you something useful.

Some time ago, a user asked me to raise the maximum message size of a topic in our Kafka cluster to 4 MB.

Following the Kafka message-size rules, the producer raises max.request.size to 4 MB, and on the cluster the topic-level parameter max.message.bytes is set to 4 MB.
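In byte terms, "4 MB" means the value below. The dictionaries are only an illustrative way to group the two overrides, not any particular client's API:

```python
# The "4m" in the request is 4 * 1024 * 1024 = 4,194,304 bytes.
FOUR_MB = 4 * 1024 * 1024

# Producer-side override (client configuration):
producer_overrides = {"max.request.size": FOUR_MB}

# Topic-level override (applied on the cluster):
topic_overrides = {"max.message.bytes": FOUR_MB}

print(FOUR_MB)  # 4194304
```

Both sides must agree: raising only one of the two leaves the other cap in effect.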

These settings apply to Kafka 2.2.x. Note that some older versions also require adjusting related parameters, such as replica.fetch.max.bytes.

As the example shows, configuring Kafka message sizes is fairly involved: the rules vary by version, there are many parameters to track, and several of them look alike. You must not only set the producer, broker, and consumer sides, but also distinguish broker-level from topic-level settings, and understand what each configuration actually means.

The sections below analyze the relevant parameters and, together with hands-on tests, help you quickly understand what each one means and how the rules fit together.

Broker

The main broker parameters related to message size are message.max.bytes, replica.fetch.min.bytes, replica.fetch.max.bytes, and replica.fetch.response.max.bytes.

1. message.max.bytes

message.max.bytes is the largest record batch size allowed by Kafka. What is a record batch? Simply put, it is Kafka's batch of collected messages: one batch contains multiple messages. The producer has a corresponding parameter, batch.size, meaning the producer sends messages in batches to improve throughput. The check behind message.max.bytes lives in the following source location:

kafka.log.Log#analyzeAndValidateRecords

As the source above shows, message.max.bytes does not limit the size of an individual message; it limits the total size of a batch. So take care that the producer-side batch.size is set smaller than message.max.bytes.
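That broker-side check can be sketched in Python as follows. This is loosely modeled on analyzeAndValidateRecords; the function name and the 1 MB default are illustrative, not Kafka's actual code:

```python
def validate_batch(batch_size_in_bytes: int, message_max_bytes: int = 1_000_000) -> None:
    """Reject a record batch whose *total* size exceeds message.max.bytes.

    The limit applies to the whole batch, not to any individual record inside it.
    """
    if batch_size_in_bytes > message_max_bytes:
        raise ValueError(
            f"batch of {batch_size_in_bytes} bytes exceeds "
            f"message.max.bytes={message_max_bytes}"
        )

validate_batch(900_000)  # a 900 KB batch is accepted under the 1 MB cap
```

Because the check sees only the batch, a producer whose batch.size exceeds message.max.bytes can be rejected even when every individual message is small.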

The official Kafka explanation:

The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that they can fetch record batches this large.

In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case.

This can be set per topic with the topic level max.message.bytes config.


2. replica.fetch.min.bytes, replica.fetch.max.bytes, replica.fetch.response.max.bytes

If a Kafka partition has multiple replicas, follower replicas continuously pull messages from the leader replica for replication, and there are corresponding parameters limiting message size there as well. replica.fetch.max.bytes limits the size of messages fetched from each partition. In versions prior to 0.8.2, if replica.fetch.max.bytes < message.max.bytes, follower replicas could fail to replicate messages; later versions fixed this problem.

The 2.2.x official explanation of replica.fetch.max.bytes:

The number of bytes of messages to attempt to fetch for each partition. This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config).

replica.fetch.min.bytes and replica.fetch.response.max.bytes work the same way.

Topic

1. max.message.bytes

This parameter has the same effect as message.max.bytes; the difference is that max.message.bytes applies to a single topic, while message.max.bytes applies globally.

Producer

1. max.request.size

This parameter is interesting. Reading the producer send path shows that before a message is appended to the RecordAccumulator, Kafka checks whether it exceeds max.request.size. The logic lives in:

org.apache.kafka.clients.producer.KafkaProducer#ensureValidRecordSize

From this source we can conclude that Kafka first checks whether the message is larger than maxRequestSize; if it is, it throws an exception immediately and does not append the message to a batch.

In addition, before the Sender thread sends data to the broker, max.request.size also limits the size of the send request:

org.apache.kafka.clients.producer.internals.Sender#sendProducerData

In other words, max.request.size has two effects:

1) it limits the size of a single message;

2) it limits the size of a send request.

The 2.2.x official explanation:

The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size which may be different from this.
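The producer-side record check can be sketched in Python as follows. This is a simplified mirror of ensureValidRecordSize; the names, defaults, and messages are illustrative, not Kafka's actual code (the defaults shown are the documented 2.2.x defaults for max.request.size and buffer.memory):

```python
def ensure_valid_record_size(size: int, max_request_size: int = 1_048_576,
                             buffer_memory: int = 33_554_432) -> None:
    """Producer-side guard: a single serialized record must fit within both
    max.request.size and the total buffer memory, or the send fails immediately,
    before the record is ever appended to a batch."""
    if size > max_request_size:
        raise ValueError(
            f"message of {size} bytes exceeds max.request.size={max_request_size}")
    if size > buffer_memory:
        raise ValueError(
            f"message of {size} bytes exceeds buffer.memory={buffer_memory}")

ensure_valid_record_size(500)  # a 500-byte record passes under the defaults
```

The request-size limit applied later in the Sender thread is a separate check; this sketch covers only the per-record guard.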
2. batch.size

batch.size is a very important producer parameter, and its value has a large impact on producer throughput: collecting a batch of messages and sending them to the broker in one request performs markedly better than issuing one request per message, but a very large batch.size puts heavy pressure on memory. batch.size therefore needs to be tuned sensibly per project to get good producer throughput. The append logic lives in:

org.apache.kafka.clients.producer.internals.RecordAccumulator#append

When a message is appended to the message buffer, Kafka first tries to append it to an existing ProducerBatch; if that batch is full, it allocates a buffer of batch.size bytes from the buffer pool and creates a new ProducerBatch to hold the message. Note that if a message is itself larger than batch.size, the resulting ProducerBatch will contain only that one message.

(The original article included a diagram of the resulting RecordAccumulator buffer here.)

The 2.2.x official explanation:

The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size. Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.

Tuning max.request.size and batch.size relative to each other therefore matters. As a rule, max.request.size should be larger than batch.size, so that each request normally carries multiple ProducerBatches.

Consumer

1. fetch.min.bytes, fetch.max.bytes, max.partition.fetch.bytes

1) fetch.max.bytes

The 2.2.x official explanation:

The maximum amount of data the server should return for a fetch request.
Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress. As such, this is not an absolute maximum. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). Note that the consumer performs multiple fetches in parallel.

fetch.min.bytes and max.partition.fetch.bytes work the same way.

Hands-on tests

Beyond reading the configuration descriptions above, three parameters deserve further verification: max.request.size, batch.size, and message.max.bytes (or max.message.bytes).

1. Test whether a message larger than max.request.size is rejected

Settings: max.request.size = 1000, record-size = 2000

Test with the kafka-producer-perf-test.sh script:

$ {kafka_path}/bin/kafka-producer-perf-test.sh --topic test-topic2 --num-records 500000000000 --record-size 20000 --throughput 1 --producer-props bootstrap.servers=localhost:9092,localhost:9093,localhost:9094 acks=-1 max.request.size=1000

Test result:

Conclusion: messages larger than max.request.size were successfully rejected.

2. Test whether max.message.bytes validates the batch size or the message size

Settings: record-size = 500, batch.size = 2000, linger.ms = 1000, max.message.bytes = 1000 (adjust the topic-level configuration from the console)

Test with the kafka-producer-perf-test.sh script:

$ {kafka_path}/bin/kafka-producer-perf-test.sh --topic test-topic1 --num-records 500000000000 --record-size 500 --throughput 5 --producer-props bootstrap.servers=localhost:9092,localhost:9093,localhost:9094 acks=-1 batch.size=2000 linger.ms=1000

Test result:

With max.message.bytes = 2500:

Conclusion: max.message.bytes validates the batch size, not the individual message size.

3. Test whether a message larger than batch.size is still sent, and whether an error occurs when max.message.bytes is smaller than the message size

Settings: record-size = 1000, batch.size = 500, linger.ms = 1000

Test with the kafka-producer-perf-test.sh script:

$ {kafka_path}/bin/kafka-producer-perf-test.sh --topic test-topic1 --num-records 500000000000 --record-size 1000 --throughput 5 --producer-props bootstrap.servers=localhost:9092,localhost:9093,localhost:9094 acks=-1 batch.size=500 linger.ms=1000

Test result:

It can be concluded that even if the message size is larger than batch.size, the message will continue to be sent.

When max.message.bytes = 900:

It can be concluded that even when batch.size < max.message.bytes, the message is still sent, because a message larger than batch.size simply gets a ProducerBatch of its own; that batch then exceeds max.message.bytes, so an error is reported when no max.request.size setting stops the oversized message on the producer side first.
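Both observations follow from how the accumulator sizes a new batch buffer (RecordAccumulator#append allocates the larger of batch.size and the message size). A tiny sketch with illustrative names:

```python
def new_batch_buffer_size(message_size: int, batch_size: int) -> int:
    """A new ProducerBatch buffer is the larger of batch.size and the message
    itself, so a message bigger than batch.size gets a batch of its own."""
    return max(batch_size, message_size)

# The test-3 case: record-size = 1000, batch.size = 500.
print(new_batch_buffer_size(1000, 500))  # 1000: a one-message batch is created
```

The oversized single-message batch is exactly what the broker then measures against max.message.bytes.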

This also explains why, at the beginning of the article, only max.request.size and max.message.bytes needed to be changed, with no adjustment to batch.size.
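Putting the rules together, a hypothetical sanity check for a size change like the 4 MB request could look like this (the function is illustrative, not a Kafka API; 16,384 bytes is the documented producer default for batch.size):

```python
def check_size_config(max_request_size: int, max_message_bytes: int,
                      batch_size: int = 16_384) -> list:
    """Flag configurations that violate the relationships established above."""
    problems = []
    if batch_size > max_message_bytes:
        problems.append("batch.size should not exceed max.message.bytes (the batch cap)")
    if max_request_size < batch_size:
        problems.append("max.request.size is normally set larger than batch.size")
    return problems

# The change from the start of the article: raise both caps to 4 MB,
# leave batch.size at its default.
print(check_size_config(4_194_304, 4_194_304))  # [] -- no problems flagged
```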

That covers the rules for setting the Kafka parameters related to message size. Hopefully you picked up something useful along the way.
