

Some problems of Kafka 0.10


15. How to consume the internal topic __consumer_offsets

The key is the formatter class: GroupMetadataManager.OffsetsMessageFormatter.

In the end I read its source code, extracted the relevant part, and parsed the byte[] myself. The core code is as follows:

// com.sina.mis.app.ConsumerInnerTopic
ConsumerRecords<byte[], byte[]> records = consumer.poll(512);
for (ConsumerRecord<byte[], byte[]> record : records) {
    // The key distinguishes offset-commit messages from group-metadata messages
    Object offsetKey = GroupMetadataManager.readMessageKey(ByteBuffer.wrap(record.key()));
    if (offsetKey instanceof OffsetKey) {
        GroupTopicPartition groupTopicPartition = ((OffsetKey) offsetKey).key();
        OffsetAndMetadata value = GroupMetadataManager.readOffsetMessageValue(ByteBuffer.wrap(record.value()));
        LOG.info(groupTopicPartition.toString() + "---:---" + value);
    } else {
        LOG.info("############:{}", offsetKey);
    }
}

1. For Kafka 0.8.2.x:

# Create consumer config
echo "exclude.internal.topics=false" > /tmp/consumer.config
# Only consume the latest consumer offsets
./kafka-console-consumer.sh --consumer.config /tmp/consumer.config \
    --formatter "kafka.server.OffsetManager\$OffsetsMessageFormatter" \
    --zookeeper localhost:2181 --topic __consumer_offsets

2. For Kafka 0.9.x.x and 0.10.0.0:

# Create consumer config
echo "exclude.internal.topics=false" > /tmp/consumer.config
# Only consume the latest consumer offsets
./kafka-console-consumer.sh --consumer.config /tmp/consumer.config \
    --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" \
    --zookeeper 10.39.40.98:2181/kafka10 --topic __consumer_offsets

Reference: "Committing and fetching consumer offsets in Kafka" (getting consumer offsets via the Java API).

Since version 0.8.2, Kafka has been able to store consumer offsets in an internal compacted topic called __consumer_offsets.
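Besides parsing __consumer_offsets directly, a group's committed offset can also be fetched with the 0.10 Java client. A minimal sketch, assuming a broker at localhost:9092; the group and topic names are placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group"); // the group whose progress we inspect (placeholder)
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
    // committed() asks the group coordinator for the stored offset of one partition
    OffsetAndMetadata committed = consumer.committed(new TopicPartition("my-topic", 0));
    System.out.println("committed offset: " + (committed == null ? "none" : committed.offset()));
}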

14. Kafka metrics

Kafka uses Yammer Metrics to record server and client metrics. The data can be pulled out through pluggable reporters and written, for example, to CSV files.

Kafka ships KafkaCSVMetricsReporter.scala, which writes metrics to CSV files.

Since no reporter implementation for Ganglia is bundled, metrics cannot be written directly from Kafka to Ganglia.
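For reference, a hedged sketch of how the bundled CSV reporter is usually enabled in server.properties (property names as used around the 0.8-0.10 line; verify them against your version):

# Enable the bundled CSV reporter and choose an output directory
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.reporter.enabled=true
kafka.csv.metrics.dir=/tmp/kafka_metrics
kafka.metrics.polling.interval.secs=5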


13. Snappy's compression ratio

Why is a certain topic's data on HDFS about 40% larger than Kafka's own statistics?

Sina's KafkaProxy compresses data with snappy before writing it into Kafka.

The guess is a 30%-40% compression saving.

This needs testing: take a batch of HDFS files, write them to Kafka, consume them back out, write them to files, and compare the sizes. A quick way to estimate the ratio is sketched below.
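A minimal measurement sketch using snappy-java, the library Kafka itself depends on. Note it compresses one whole file as a block, while Kafka compresses MessageSets through a snappy stream, so real ratios will differ somewhat:

import org.xerial.snappy.Snappy;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SnappyRatio {
    public static void main(String[] args) throws Exception {
        // Read one sample file and compress it with snappy
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        byte[] compressed = Snappy.compress(raw);
        System.out.printf("raw=%d compressed=%d ratio=%.1f%%%n",
                raw.length, compressed.length,
                100.0 * compressed.length / raw.length);
    }
}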

12. When Kafka's Consumer reads data, which partitions does it read?

The high-level consumer API uses the Range assignment strategy by default; the alternative is RoundRobin. Switching strategies is sketched below.
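With the 0.10 new consumer the strategy is selected via partition.assignment.strategy (RangeAssignor is the default). A minimal sketch; the broker address and group name are placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "demo-group"); // placeholder
// Default is org.apache.kafka.clients.consumer.RangeAssignor; switch to round-robin:
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.RoundRobinAssignor");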

11. When Kafka's Producer sends data, to which partition is it sent?

This is determined by DefaultPartitioner.

If a partition is specified in the record, use it.

If no partition is specified but a key is present, choose a partition based on a hash of the key.

If no partition or key is present, choose a partition in a round-robin fashion.

In short: hash the key when one is present; round-robin when there is no key. (In version 0.8.0, the no-key case was random.) A simplified sketch of the keyed path follows.
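A hedged, simplified sketch of the keyed path in the 0.10 DefaultPartitioner; the real class also handles the no-key round-robin case and only considers available partitions (Utils is org.apache.kafka.common.utils.Utils):

// murmur2-hash the key bytes, mask to non-negative, then modulo the partition count
static int partitionForKey(byte[] keyBytes, int numPartitions) {
    return (Utils.murmur2(keyBytes) & 0x7fffffff) % numPartitions;
}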

10. LinkedIn cluster GC

90% of broker GC pauses are around 21 ms, with less than one young GC per second.

9. What is zero copy (the sendfile technique)?

In traditional network IO, a single transfer goes through four steps:

Data is read from disk into the kernel's Read Buffer.

Data is copied from the kernel to the user buffer.

The data is then written to the Socket Buffer in kernel space.

Data is copied from the Socket Buffer to the NIC Buffer of the network card.

Kafka skips the two middle steps; that is the sendfile technique.
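In the JVM this corresponds to FileChannel.transferTo, which delegates to sendfile(2) on Linux. A minimal sketch of the mechanism, not Kafka's actual code:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Bytes move disk -> NIC inside the kernel, never entering user space.
static void sendFileZeroCopy(SocketChannel socket, String path) throws IOException {
    try (FileChannel file = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
        long position = 0;
        long size = file.size();
        while (position < size) {
            position += file.transferTo(position, size - position, socket);
        }
    }
}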

8. How does Kafka achieve high throughput and strong message-accumulation capacity?

It relies on the OS file system page cache: when the upper layer writes, the operating system simply puts the data into the PageCache and marks the page dirty; when a read occurs, the PageCache is searched first, and only on a miss is disk I/O scheduled before the data is returned. The PageCache uses as much free memory as possible as a disk cache, and if other processes request memory, reclaiming PageCache pages is cheap.

Summary: relying on the OS page cache greatly reduces IO and uses memory efficiently as a cache.

High memory utilization, without caching data inside the JVM.

Sequential IO, with O(1) constant-time message get/put.

The sendfile technique (zero copy), as sketched above.

7. The most important issue for a queue is message loss; how does Kafka handle it?

Each time the Producer sends data, it considers the message sent once send() returns, but in most cases the message is still sitting in an in-memory MessageSet and has not reached the network. If the Producer crashes at that moment, the data is lost.

Solution: the ack mechanism. It is commonly set to acks=1, so a message only needs to be received and acknowledged by the leader, balancing reliability and efficiency; see the sketch below.
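A minimal 0.10 producer sketch with acks=1 and a completion callback, so failures surface instead of being silently treated as delivered; the broker address and topic are placeholders:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "1");   // leader acknowledgment only
props.put("retries", 3);  // retry transient send failures
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
    if (exception != null) {
        // The send failed even after retries: log, alert, or re-enqueue here
        exception.printStackTrace();
    }
});
producer.close(); // flushes outstanding batches before exiting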

6. How the Kafka 0.10 Producer is optimized

MessageSet: batched, sequential writes.

Compression support for the data.

Asynchronous sending. (The relevant configs are sketched after this list.)
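A hedged sketch of the 0.10 producer settings behind these optimizations; the values are illustrative, not recommendations:

props.put("batch.size", 16384);           // bytes per partition batch (the MessageSet)
props.put("linger.ms", 5);                // wait up to 5 ms to fill a batch before sending
props.put("compression.type", "snappy");  // compress whole batches rather than single messages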

5. Why Kafka uses a pull model

A push model aims to deliver messages as fast as possible, but it easily overwhelms consumers that cannot process them in time, typically causing denial of service or network congestion. A pull model lets the consumer fetch messages at a rate that matches its own capacity.

4. How are the LogSegment and index stored at the lowest level?

log: a byte stream, served via sendfile (zero copy).

index: a sparse index, memory-mapped via mmap (essentially a memory-mapped structure); a binary search locates the offset, as sketched below.
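A hedged sketch of the lookup idea: the index stores (offset, file position) pairs for only a fraction of the messages, so the search returns the greatest indexed offset at or below the target, and the log is then scanned forward from that file position:

// offsets[] is sorted ascending; returns the floor entry's slot, or -1 if
// the target precedes every indexed offset in this segment
static int indexLookup(long[] offsets, long target) {
    int lo = 0, hi = offsets.length - 1, floor = -1;
    while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        if (offsets[mid] <= target) { floor = mid; lo = mid + 1; }
        else { hi = mid - 1; }
    }
    return floor;
}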

3. An important question: when the Leader goes down, how is a new Leader elected among the Followers?

A very common way to elect a leader is the "Majority Vote."

For a freshly created topic, the "preferred replica" is usually the leader. An ISR (in-sync replicas) set is dynamically maintained in ZooKeeper; every replica in the ISR keeps up with the leader, and only ISR members can be elected Leader.

All partition Leader elections are determined by the controller. The controller will notify the Broker that needs to make this change directly through RPC.

So how does the Controller elect the leader?

If at least one replica of the current ISR survives, select one as the new Leader.

If no replica in the ISR survives, select any surviving replica of the partition as the new Leader and ISR (this scenario can lose data); see the config note below.

If all replicas of the partition are down, set the new Leader to -1.
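The second case above is governed by a broker setting. A hedged note, since the default changed across versions (true in 0.10, false from 0.11 onward):

# Allowing a non-ISR replica to become leader trades durability for availability.
# Set to false to refuse such elections and avoid the data loss described above.
unclean.leader.election.enable=false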

2. How is the partition Leader found for reads and writes?

When the Producer publishes a message to a partition, it first finds the partition's Leader through ZooKeeper and then writes the data.

The Consumer (0.8) finds the Leader via ZooKeeper and reads the data.

The Consumer (0.10) finds the Leader through the Coordinator and reads the data.

1. When a topic is reassigned, do the Producer and Consumer lose data?

No. During expansion, the new leader must first copy data from the old broker; only after it has caught up does leadership switch over.

During this period, producers and consumers keep communicating with the old leader.

Internal topic: __consumer_offsets

This topic tracks the progress of all consumers, avoiding storing consumption progress in ZooKeeper, which would hurt scalability. It is managed by the Coordinator.

If the requested topic is __consumer_offsets, the OffsetManager's asynchronous read is started.

This asynchronous reader continuously reads __consumer_offsets and decodes the messages into a consumption-progress cache.

queued.max.requests=16

This bounds the request queue the I/O threads consume from; if the number of queued requests exceeds it, the network threads stop accepting new requests.

From the Kafka Confluence wiki:

Maximum number of partitions: 100 * broker * replica ("if you care about latency, it's probably a good idea to limit the number of partitions per broker to 100 x b x r, where b is the number of brokers in a Kafka cluster and r is the replication factor").
