The way of further study of kafka (1)-Analysis of each principle 01

2025-01-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--


Introduction: Having joined a new company that promotes Kafka heavily, I need to study its components in depth. I had done some research on older versions of Kafka, but nothing thorough. As a message system, Kafka still holds a very high share of the current market; see my earlier Kafka blog posts on its advantages and on why you would use it.

Among the many advantages, I think the two most important advantages are as follows:

1. Peak clipping

The processing capacity of the database is limited. During peak periods, too many requests hit the backend, and once the system's processing capacity is exceeded, the system may fail.

As shown in the figure above, the system's processing capacity is 2k requests/s while MQ can handle 8k/s, and the peak load is 5k requests/s; MQ's capacity is much larger than the database's. During the peak, requests can pile up in MQ first, and the system consumes them at its own pace of 2k/s.

Once the peak is over, incoming requests may drop to only 100/s, and the system can quickly work through the backlog in MQ.

Note that the requests above are write requests; query requests are generally handled through caching.
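The peak-clipping arithmetic above can be sketched in a few lines. The 2k/5k/100 figures are the hypothetical rates from the text, not measurements; `simulate` is an illustrative helper, not any Kafka API.

```python
# Hypothetical rates from the text: the system consumes 2k req/s, the peak
# pushes 5k req/s, and off-peak traffic falls to 100 req/s. The surplus
# accumulates in MQ during the peak and drains afterwards.
def simulate(peak_secs, quiet_secs, peak_rate=5000, quiet_rate=100, consume_rate=2000):
    backlog = 0
    for _ in range(peak_secs):       # peak: arrivals outpace consumption
        backlog += peak_rate - consume_rate
    for _ in range(quiet_secs):      # off-peak: drain the backlog
        backlog = max(0, backlog + quiet_rate - consume_rate)
    return backlog

print(simulate(peak_secs=10, quiet_secs=20))   # 0 -> backlog fully drained
```

A 10-second peak leaves a 30,000-message backlog, which the off-peak drain rate of 1,900/s clears in about 16 seconds.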

2. Decoupling

In the following scenario, system S is tightly coupled with systems A, B, and C. When requirements change and system A modifies its code, system S also has to adjust its A-related code.

A few days later, system C needs to be removed, so S immediately deletes its C-related code; a few days after that, system D needs to be added, so S adds D-related code; within a few more days, the programmers go crazy.

Coupling every system this tightly hurts maintenance and extension. Once MQ is introduced: when system A changes, only A modifies its own code; when system C is removed, it simply unsubscribes; when system D is added, it just subscribes to the relevant messages.

In this way, by introducing message middleware, each system interacts only with MQ, avoiding the tangled call relationships between them.
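The decoupling idea can be sketched with a minimal in-memory publish/subscribe stand-in. `MiniBroker` and the handler names are illustrative, not Kafka APIs; the point is that removing C touches only C's subscription, never S.

```python
class MiniBroker:
    """A dict-based stand-in for MQ: systems talk only to the broker."""
    def __init__(self):
        self.subscribers = {}                # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def unsubscribe(self, topic, callback):
        self.subscribers.get(topic, []).remove(callback)

    def publish(self, topic, message):       # S only ever calls this
        for cb in self.subscribers.get(topic, []):
            cb(message)

broker = MiniBroker()
received = []
handler_a = lambda m: received.append(("A", m))
handler_c = lambda m: received.append(("C", m))
broker.subscribe("orders", handler_a)
broker.subscribe("orders", handler_c)
broker.publish("orders", "msg1")             # A and C both receive it
broker.unsubscribe("orders", handler_c)      # system C removed: S is untouched
broker.publish("orders", "msg2")             # only A receives it
print(received)   # [('A', 'msg1'), ('C', 'msg1'), ('A', 'msg2')]
```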

Principles of the Kafka architecture:

The most classic diagram is the official one.

I found some diagrams by other bloggers; I'm too lazy to draw my own here.

A detailed and complex view of the Kafka architecture

In plain terms: producer -> kafka cluster (brokers) -> consumer

Producers publish messages to Kafka queues, and consumers consume them from there.

For related component concepts, see:

Topic and logs

Without further ado, look at the picture first.

The text is explained as follows:

Messages are organized by Topic, and each Topic can be divided into multiple Partitions (configured via server.properties/num.partitions). My habit is to set num.partitions equal to the number of brokers, so that partitions can be deliberately spread one per node.

Each record (Message) in a Partition contains three attributes: Offset, messageSize, and Data.

Offset is the message offset, messageSize is the size of the message, and Data is the message body itself.

A Partition is stored in the file system as a directory of files; the location is specified by server.properties/log.dirs, and the directory naming rule is topicName-partitionId (as the listing below shows).

The production profile is: log.dirs=/data/kafka/kafka-logs

[hadoop@kafka03-55-13 kafka-logs]$ pwd
/data/kafka/kafka-logs
[hadoop@kafka03-55-13 kafka-logs]$ ls | grep mjh
topic-by-mjh-0
topic-by-mjh-1
topic-by-mjh-10
topic-by-mjh-11
topic-by-mjh-12
...

A Topic's Partitions may live on different Brokers. Each Partition is split into segments, and each segment is a Segment file.

The Partition directory contains data files and index files:

[hadoop@kafka03-55-13 kafka-logs]$ cd topic-by-mjh-0
[hadoop@kafka03-55-13 topic-by-mjh-0]$ ll
total 4
-rw-r--r-- 1 hadoop hadoop 10485760 Aug 24 20:13 00000000000000000334.index
-rw-r--r-- 1 hadoop hadoop        0 Aug 13 17:42 00000000000000000334.log
-rw-r--r-- 1 hadoop hadoop 10485756 Aug 24 20:13 00000000000000000334.timeindex
-rw-r--r-- 1 hadoop hadoop        4 Aug 16 14:16 leader-epoch-checkpoint

The index uses sparse storage: rather than building an index entry for every Message, it writes one entry every certain number of bytes, to keep the index file from taking up too much space.

The disadvantage is that an Offset without an index entry cannot be located in a single step; a sequential scan is needed, but the scan range is very small.

Each index entry contains two parts (both 4-byte numbers): the relative Offset and the Position.

The relative Offset is the message's offset within this Segment file; the Position is the byte location of the Message in the data file.
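The sparse-index lookup just described can be sketched with toy message sizes. The helper names (`make_log`, `sparse_index`, `locate`) and the sizes are illustrative assumptions; the base offset 334 echoes the segment file listed above.

```python
import bisect

def make_log(sizes, base_offset=334):
    """Toy log: a list of (offset, byte_position, size) records."""
    log, pos = [], 0
    for i, size in enumerate(sizes):
        log.append((base_offset + i, pos, size))
        pos += size
    return log

def sparse_index(log, every=3):
    """One (offset, position) entry per `every` messages, like the .index file."""
    return [(off, pos) for i, (off, pos, _) in enumerate(log) if i % every == 0]

def locate(index, log, target):
    """Binary-search the sparse index, then scan the log sequentially."""
    offsets = [off for off, _ in index]
    i = bisect.bisect_right(offsets, target) - 1   # last indexed offset <= target
    start_off, _ = index[i]
    for off, pos, _ in log:                         # short sequential scan
        if off >= start_off and off == target:
            return pos
    return None

log = make_log([100, 80, 120, 60, 90, 110])   # six messages from offset 334
idx = sparse_index(log, every=3)
print(idx)                      # [(334, 0), (337, 300)]
print(locate(idx, log, 338))    # 360: found via entry (337, 300) plus a scan
```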

The log file under Segment is where messages are stored.

Each message contains the message body, offset, timestamp, key, size, compression encoder, checksum, message version number, and so on.

The format of the data on disk is exactly the same as what the producer sends to the broker and what the consumer receives. Because the disk format is identical to the producer's and consumer's data format, Kafka can improve transmission efficiency through zero-copy technology. // I'll write a separate blog post about zero-copy later.

Summary:

1. A Partition is a sequentially appended log, i.e. sequential disk writes (sequential disk writes are more efficient than random memory writes, which underpins Kafka's throughput).

2. Kafka's Message storage achieves high efficiency through Partitions, sequential disk reads and writes, LogSegments, and sparse indexes.

3. In Kafka's file storage, a Topic has multiple Partitions; each Partition is a directory, and each directory is evenly divided into multiple equal-size Segment Files (in production we set the Segment size to 1 GB or 500 MB). A Segment File consists of an index file and a data file, which always appear in pairs; the suffixes ".index" and ".log" denote the Segment index file and data file respectively.
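As an aside on the file names in the listing above: each segment pair is named after the segment's base offset, zero-padded to 20 digits. A tiny helper (`segment_files` is my illustrative name, not a Kafka API) reproduces the naming:

```python
def segment_files(base_offset):
    """Segment files are named by base offset, zero-padded to 20 digits."""
    name = f"{base_offset:020d}"
    return f"{name}.index", f"{name}.log"

print(segment_files(334))
# ('00000000000000000334.index', '00000000000000000334.log')
```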

Partition and Replica

A Topic is physically divided into multiple Partitions located on different Brokers. Without Replicas, once a Broker goes down, all Partitions on it become unavailable.

Each Partition can have multiple Replicas (configured via server.properties/default.replication.factor), assigned to different Brokers. My default habit is default.replication.factor=2, i.e. 2 copies by default, which is a reasonable setting.

One of the Replicas acts as Leader, handling read and write requests from Producers and Consumers; the others act as Followers, pulling messages from the Leader to stay in sync with it.

How are Partitions and Replicas allocated to Brokers? The steps are as follows:

1. Sort all Brokers (assume n in total) and the Partitions to be assigned.

2. Assign the i-th Partition to the (i mod n)-th Broker.

3. Assign the j-th Replica of the i-th Partition to the ((i + j) mod n)-th Broker.

By these rules, if the number of Replicas were greater than the number of Brokers, two identical Replicas would inevitably land on the same Broker, which is pointless. Therefore, the number of Replicas must be less than or equal to the number of Brokers.

// Kafka enforces this rule rigidly: the number of replicas created cannot exceed the number of brokers; it must be less than or equal to it.
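The allocation rules above can be sketched as follows. Note this is the simplified (i + j) mod n version from the text; real Kafka additionally randomizes the starting broker and shifts replica placement. `assign_replicas` is an illustrative name.

```python
def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Partition i's j-th replica goes to broker (i + j) mod n (simplified)."""
    if replication_factor > num_brokers:
        raise ValueError(f"Replication factor: {replication_factor} "
                         f"larger than available brokers: {num_brokers}")
    return {i: [(i + j) % num_brokers for j in range(replication_factor)]
            for i in range(num_partitions)}

print(assign_replicas(4, 3, 2))
# {0: [0, 1], 1: [1, 2], 2: [2, 0], 3: [0, 1]}
```

Asking for more replicas than brokers raises the same complaint Kafka itself makes in the error shown below.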

A note on the function used above: mod is the remainder function. (Some versions of this write-up say "mode" in step 3, but that is just a typo for mod; the statistical "mode" function, which returns the most frequent value in a data set, is unrelated here.)

I have only 3 brokers, so creating 4 replicas reports an error. See below:

[root@kafka02-55-12 ~]# kafka-topics.sh --zookeeper 10.211.55.11:2181,10.211.55.12:2181,10.211.55.13:2181/kafka --replication-factor 4 --partitions 9 --create --topic topic-zhuhai
Error while executing topic command : Replication factor: 4 larger than available brokers: 3.
[2019-08-24 20:41:40,611] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 4 larger than available brokers: 3.

How is a Partition's Leader elected? (Broker failover)

// In plain terms: when a broker goes down, how is high availability ensured?

The text description is as follows:

Kafka dynamically maintains an ISR (in-sync replicas) set in Zookeeper (/brokers/topics/[topic]/partitions/[partition]/state).

All Replicas in the ISR are keeping up with the Leader; when a new Leader is needed, the Controller chooses one from the ISR.

The specific process is described as follows:

1. The Controller registers a Watcher on the /brokers/ids/[brokerId] node in Zookeeper.

2. When a Broker goes down, Zookeeper fires the Watch.

3. The Controller reads the available Brokers from the /brokers/ids node.

4. The Controller determines set_p, the set of all Partitions on the downed Broker.

5. For each Partition in set_p, it reads the ISR from the /brokers/topics/[topic]/partitions/[partition]/state node, decides the new Leader, and writes the new Leader, ISR, controller_epoch, and leader_epoch back to the state node.

6. The Controller sends LeaderAndIsrRequest commands to the relevant Brokers via RPC.
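Steps 4 and 5 above can be reduced to a toy in-memory simulation. The `state` dict stands in for the ZooKeeper state nodes, and electing the first surviving ISR member is a simplified rule for illustration only.

```python
def elect_leaders(partitions, dead_broker):
    """partitions: {id: {"leader": broker, "isr": [brokers]}} (toy state)."""
    for state in partitions.values():
        state["isr"] = [b for b in state["isr"] if b != dead_broker]
        if state["leader"] == dead_broker:
            # simplified rule: first surviving ISR member becomes Leader
            state["leader"] = state["isr"][0] if state["isr"] else None
    return partitions

state = {
    0: {"leader": 1, "isr": [1, 2, 3]},   # led by the broker about to die
    1: {"leader": 2, "isr": [2, 3]},      # unaffected leader
}
elect_leaders(state, dead_broker=1)
print(state[0])   # {'leader': 2, 'isr': [2, 3]} -> leadership moved to broker 2
print(state[1])   # {'leader': 2, 'isr': [2, 3]} -> unchanged
```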

In extreme cases, two situations need to be considered:

1. When the ISR is empty, a surviving Replica (not necessarily an ISR member) is chosen as Leader.

2. When all Replicas are down, wait for any Replica to come back to life and use it as Leader.

This requires a tradeoff between availability and consistency. If we must wait for a Replica in the ISR to come back alive, the Partition may be unavailable for a relatively long time; and if every Replica in the ISR has lost its data, the Partition will never be available. If instead we pick the first Replica to come back alive as Leader, even though it is not in the ISR and may not contain all committed messages, it still becomes the Leader and serves as the data source for consumers (as mentioned earlier, all reads and writes go through the Leader). Kafka 0.8.* uses the second approach. According to the Kafka documentation, future releases will let users choose between the two via configuration, trading off high availability against strong consistency per use case.

A Follower in the ISR (the in-sync list) is "keeping up" with the Leader; "keeping up" does not mean being exactly identical. This is configured via server.properties/replica.lag.time.max.ms, the maximum time the Leader waits for a Follower to synchronize messages. If a Follower times out, the Leader removes it from the ISR. (The older configuration item replica.lag.max.messages has been removed.)

How Replicas synchronize messages (the replication strategy)

1. When Producer posts a message to a Partition, it first finds the Leader of the Partition through ZooKeeper.

2. Regardless of the Replication Factor of the Topic, Producer only sends the message to the Leader of the Partition.

3. Leader writes the message to its local Log.

4. Each Follower pulls data from the Leader, so the order of data stored by a Follower is the same as the Leader's.

5. After receiving a message and writing it to its Log, the Follower sends an ACK to the Leader.

6. Once the Leader receives ACKs from all Replicas in the ISR, the message is considered committed; the Leader advances the HW (high watermark) and sends an ACK to the Producer.

To improve performance, each Follower sends an ACK to the Leader as soon as it receives the data, rather than waiting until the data is written to its Log.

Therefore, for committed messages, Kafka can only guarantee that they exist in the memory of multiple Replicas, not that they have been persisted to disk, so it cannot fully guarantee that a committed message will still be consumable after an exception.

Consumers also read messages from the Leader, and only committed messages are exposed to Consumers.
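The commit/HW mechanics above can be modeled in a few lines. The class and method names are illustrative, and the HW is taken as the minimum offset acknowledged across the ISR, per step 6.

```python
class Partition:
    """Toy model: 'acked' tracks the next offset each ISR member has written."""
    def __init__(self, isr):
        self.log = []
        self.acked = {r: 0 for r in isr}

    def append(self, msg):                  # step 3: leader writes its local Log
        self.log.append(msg)
        self.acked["leader"] = len(self.log)

    def follower_pull(self, replica, n=1):  # steps 4-5: follower pulls and ACKs
        self.acked[replica] = min(len(self.log), self.acked[replica] + n)

    @property
    def hw(self):                           # step 6: commit point = min over ISR
        return min(self.acked.values())

p = Partition(isr=["leader", "f1", "f2"])
p.append("m0"); p.append("m1")
print(p.hw)                   # 0 -> nothing replicated yet, nothing committed
p.follower_pull("f1", 2)
p.follower_pull("f2", 1)
print(p.hw)                   # 1 -> only m0 is committed and consumer-visible
```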

The actual reliability level is decided by the producer (via the configuration item producer.properties/acks).

It is said that in the latest 2.2.x documentation the old request.required.acks item no longer exists; I still need to confirm this.

In plain terms, the three acks values mean the following.

The Kafka producer's ack mechanism has three settings, configured when the producer is initialized:

0: the producer does not wait for the broker to confirm synchronization; it just keeps sending the next (batch of) messages.

This provides the lowest latency but the weakest durability; data loss is likely when a server fails. For example, if the leader dies and the producer does not know it, it keeps sending messages, and whatever the broker never received is lost.

1: the producer waits for the leader to confirm that it has received the data before sending the next message. This option provides better durability with fairly low latency. But if the Partition's Leader dies before the followers have replicated the data, the data is lost.

-1: the producer waits until the followers have also confirmed the data before sending the next message.

This gives the best durability but the worst latency.
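The three acks levels can be summarized in a small model. This is a sketch, not the client API; `required_acks` and `survives_leader_crash` are illustrative helpers, and `isr_size` stands for the number of replicas that must confirm under acks=-1.

```python
def required_acks(acks, isr_size):
    """How many replicas must confirm before the producer counts the send."""
    if acks == 0:
        return 0            # fire-and-forget: wait for nobody
    if acks == 1:
        return 1            # the leader's own write is enough
    if acks == -1:
        return isr_size     # wait for the whole ISR (a.k.a. acks=all)
    raise ValueError(f"unknown acks setting: {acks}")

def survives_leader_crash(acks, isr_size):
    # the write survives the leader dying iff some other replica confirmed it
    return required_acks(acks, isr_size) > 1

for acks in (0, 1, -1):
    print(acks, required_acks(acks, 3), survives_leader_crash(acks, 3))
# acks=0 -> lowest latency, weakest durability; acks=-1 -> best durability
```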

The point to emphasize here is that replication between followers and the leader in a Kafka partition is neither fully synchronous replication nor purely asynchronous replication.

Synchronous replication: committing only after all followers have replicated greatly hurts throughput.

Asynchronous replication: followers replicate data from the leader asynchronously, and data is considered committed as soon as the leader writes it to its log. In this case followers lag behind the leader, and if the leader suddenly goes down, data is lost.

The ISR approach Kafka adopts balances the two well, ensuring both no data loss and good throughput. Followers can copy data from the Leader in batches, which greatly improves replication performance (batched disk writes) and keeps the gap between Follower and Leader small.

How producer sends messages

Producer first encapsulates the message in a ProducerRecord instance.

Routing rules for writing messages

1. If a partition is specified, it is used directly.

2. If no partition is specified but a key is, a partition is chosen by hashing the key. This hash (i.e. the partitioning mechanism) is implemented by the class specified in producer.properties/partitioner.class, which must implement the Partitioner interface.

3. If neither partition nor key is specified, a partition is chosen by round-robin.

Note: the message is not sent immediately. It is first serialized and handed to the Partitioner, i.e. the hash function above. Once the Partitioner determines the target partition, the message goes into a memory buffer (the send queue). A separate Producer worker thread, the Sender thread, pulls prepared messages from the buffer in real time, packs them into batches, and sends them to the corresponding Broker.
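The three routing rules can be sketched like this. Note the real default partitioner hashes the serialized key bytes with murmur2; Python's built-in hash() here is only an illustrative stand-in, and `choose_partition` is not a Kafka API.

```python
import itertools

_round_robin = itertools.count()   # shared counter for rule 3

def choose_partition(num_partitions, partition=None, key=None):
    if partition is not None:                      # 1. explicit partition wins
        return partition
    if key is not None:                            # 2. hash the key
        return hash(key) % num_partitions
    return next(_round_robin) % num_partitions     # 3. round-robin otherwise

print(choose_partition(9, partition=4))   # 4
print(choose_partition(9, key=7))         # 7 (hash(7) % 9)
print(choose_partition(9))                # 0, then 1, 2, ... on later calls
```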

The specific data flow is as follows:

The specific process is as follows:

1. The producer first finds the partition's leader from the "/brokers/.../state" node in zookeeper.

2. The producer sends the message to the leader.

3. The leader writes the message to its local log.

4. Followers pull the message from the leader, write it to their local log, and send an ACK to the leader.

5. After the leader receives ACKs from all replicas in the ISR, it advances the HW (high watermark, the offset of the last commit) and sends an ACK to the producer.

Consulting materials online, some textual explanations read like the following; I find this version easier to understand.

The process is as follows:

1. First, create a ProducerRecord, which must contain the topic and value of the message; a key and/or a partition may optionally be specified.

2. When sending, the producer serializes the key and value into byte arrays and passes them to the partitioner.

3. If we specified a partition, the partitioner just returns it; otherwise, it selects a partition based on the key and returns it.

4. Once the partition is chosen, the producer knows the message's topic and partition. It adds the record to the batch for that topic and partition, and another thread is responsible for sending these batches to the corresponding Kafka broker.

5. When the broker receives the message, it returns a RecordMetadata object containing the topic, partition, and offset if the write succeeds; otherwise, it returns an error.

6. When the producer receives the result, it may retry on error.
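Steps 5 and 6 can be sketched as follows. RecordMetadata's fields follow the text; `send_with_retries` and `flaky_send` are hypothetical helpers for illustration, not client APIs.

```python
from collections import namedtuple

# Illustrative stand-in mirroring the fields named in step 5.
RecordMetadata = namedtuple("RecordMetadata", "topic partition offset")

def send_with_retries(broker_send, record, retries=3):
    """Step 6: retry the send on a (retriable) error, then give up."""
    last_err = None
    for _ in range(retries + 1):
        try:
            return broker_send(record)      # success: broker returns metadata
        except IOError as err:
            last_err = err                  # retriable failure: try again
    raise last_err

attempts = []
def flaky_send(record):
    """Fails twice, then succeeds -- simulating a transient broker error."""
    attempts.append(record)
    if len(attempts) < 3:
        raise IOError("leader not available")
    return RecordMetadata("topic-by-mjh", 0, 334)

meta = send_with_retries(flaky_send, "m0")
print(meta, len(attempts))
# RecordMetadata(topic='topic-by-mjh', partition=0, offset=334) 3
```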

Reference link:

The road to architecture growth: Kafka design principles: https://www.toutiao.com/i6714606866355192328/

The three ACK mechanisms of Kafka: https://blog.csdn.net/Sun1181342029/article/details/87806207

Kafka principles series: the ACK mechanism (data reliability and durability guarantees): https://blog.csdn.net/bluehawksk/article/details/96120803

Getting started with Kafka: https://www.orchome.com/5

The Road to Kafka Learning (3): High availability of Kafka: https://www.cnblogs.com/qingyunzong/p/9004703.html
