

A Brief Discussion of Distributed Messaging Technology: Kafka


This article explains the distributed messaging technology Kafka in detail. It is shared here as a reference; I hope you will have a good understanding of the relevant topics after reading it.

A basic introduction to Kafka

Kafka, originally developed at LinkedIn, is a distributed, partitioned, multi-replica, multi-subscriber log system coordinated by ZooKeeper (it can also be used as an MQ system). It is commonly used for web/nginx logs, access logs, and message services. LinkedIn contributed it to the Apache Software Foundation in 2011, and it became a top-level open source project in 2012.

Its main application scenarios are log collection systems and message systems.

The main design objectives of Kafka are as follows:

Message persistence with O(1) time complexity, guaranteeing constant-time access performance even for terabyte-scale data.

High throughput: even on very cheap commodity machines, a single machine can support the transmission of 100K messages per second.

Support for message partitioning across Kafka servers and for distributed consumption, while guaranteeing that messages are delivered in order within each partition.

Both offline data processing and real-time data processing are supported.

Analysis of Kafka's design principles

A typical Kafka cluster contains several producers, several brokers, several consumers, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage the cluster configuration, elect leaders, and rebalance when consumer group membership changes. Producers publish messages to brokers using a push model, and consumers subscribe to and consume messages from brokers using a pull model.

Kafka terminology:

Broker: a message middleware processing node. A single Kafka node is a broker, and multiple brokers form a Kafka cluster.

Topic: a category of messages. A Kafka cluster can handle the distribution of multiple topics at the same time.

Partition: the physical grouping of a topic. A topic can be divided into multiple partitions, each of which is an ordered queue.

Segment: each partition is physically composed of multiple segments.

Offset: each partition consists of a series of ordered, immutable messages that are continually appended to it. Each message in a partition has a sequential id number called the offset, which uniquely identifies that message within the partition.

Producer: responsible for publishing messages to the Kafka broker.

Consumer: the message consumer; the client that reads messages from the Kafka broker.

Consumer Group: each consumer belongs to a specific consumer group.
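
To make these terms concrete, here is a minimal, hedged sketch using the kafka-clients Java API: one producer publishing to a topic, and one consumer in a group pulling from it. The broker address, topic name, and group id are illustrative assumptions, not values from the article.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickstartSketch {
    public static void main(String[] args) {
        // Producer: publishes a message to a topic on the broker.
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            producer.send(new ProducerRecord<>("access-log", "key-1", "hello kafka"));
        }

        // Consumer: subscribes as part of a consumer group and pulls records.
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "log-readers"); // assumed consumer group
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(Collections.singletonList("access-log"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}
```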

Transactional characteristics of Kafka data transmission

At most once: the message is delivered at most once; this is similar to a "non-persistent" message in JMS. Once sent, whether it succeeds or fails, it is never retransmitted. The consumer fetches the message, saves the offset first, and then processes the message; if the client has saved the offset but an exception occurs during processing, some messages can never be processed again. Those "outstanding" messages will never be fetched again: this is "at most once".

At least once: the message is delivered at least once; if it is not received successfully, it may be resent until it is received. The consumer fetches the message, processes it first, and then saves the offset. If the message is processed successfully but a ZooKeeper exception during the offset-saving phase prevents the save from completing, then the next fetch may return messages that were already processed: this is "at least once". The cause is that the offset was not committed to ZooKeeper in time, so after ZooKeeper returns to normal it still holds the previous offset state.

Exactly once: the message is delivered exactly once. Kafka has no strict implementation of this (it would require something like a two-phase commit), and this strategy is not considered necessary in Kafka.

Usually "at-least-once" is our first choice.

Kafka message storage format

Topic & Partition

A topic can be thought of as a category of messages. Each topic is divided into multiple partitions, and at the storage level each partition is an append-only log file.

In Kafka's file storage, a single topic has several different partitions, and each partition is a directory. The partition naming rule is the topic name plus an ordinal number: the first partition's ordinal is 0, and the largest ordinal is the number of partitions minus 1.

Each partition (directory) is equivalent to one giant file whose contents are evenly distributed across multiple equal-sized segment data files. However, the number of messages in each segment file is not necessarily equal; this makes it easy to delete old segment files quickly.

Each partition only needs to support sequential reads and writes, and the lifecycle of the segment files is determined by server configuration parameters.

The advantage of this is that useless files can be deleted quickly and disk utilization can be effectively improved.

A segment file consists of two parts, an index file and a data file, which correspond to each other and appear in pairs. The suffixes ".index" and ".log" denote the segment index file and the segment data file respectively.

Segment file naming convention: the first segment of a partition starts at 0, and each subsequent segment file is named after the offset value of the last message in the previous segment file. The maximum value is the size of a 64-bit long (up to 19 decimal digits), and the name is left-padded with zeros.
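
For illustration, a partition directory might therefore contain segment pairs like the following (the 368769 base offset is taken from the worked example below; the zero padding reflects Kafka's fixed-width naming):

```
00000000000000000000.index
00000000000000000000.log
00000000000000368769.index
00000000000000368769.log
```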

The index file and the data file in a segment correspond as follows: the index file stores metadata entries, and the data file stores the messages themselves. Each metadata entry in the index file points to the physical offset address of the corresponding message in the data file.

Taking the metadata entry (3, 497) in the index file as an example: it refers to the third message in the data file (message 368772 of the partition globally, given the segment's base offset of 368769), and the physical offset address of that message within the data file is 497.

Given that a segment data file consists of many messages, the physical structure of a single message is described in detail below:

Field descriptions:

8 byte offset: every message within a partition has an ordered id number, called the offset, which uniquely determines the position of the message within the partition.

4 byte message size: the size of the message.

4 byte CRC32: a CRC32 checksum of the message.

1 byte "magic": the protocol version number of this release of the Kafka service program.

1 byte "attributes": a flags field identifying the compression type or encoding type.

4 byte key length: the length of the key; when it is -1, the K byte key field is omitted.

K byte key: the optional key.

value bytes payload: the actual message data.
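
As a hedged sketch of this layout, the fields above could be read back with a ByteBuffer as follows. This is a simplified reading of the legacy format for illustration only; real Kafka code also handles compression, different magic versions, and partial reads.

```java
import java.nio.ByteBuffer;

public class LegacyMessageSketch {
    // Parses one message in the legacy on-disk format described above.
    static void parse(ByteBuffer buf) {
        long offset = buf.getLong();  // 8 bytes: ordered id within the partition
        int size    = buf.getInt();   // 4 bytes: message size
        int crc32   = buf.getInt();   // 4 bytes: CRC32 checksum of the message
        byte magic  = buf.get();      // 1 byte: protocol version of the Kafka server
        byte attrs  = buf.get();      // 1 byte: compression / encoding type flags
        int keyLen  = buf.getInt();   // 4 bytes: key length; -1 means no key
        byte[] key = new byte[Math.max(keyLen, 0)];
        if (keyLen > 0) buf.get(key); // K bytes: the optional key
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);             // remaining bytes: the actual message data
        System.out.printf("offset=%d size=%d keyLen=%d%n", offset, size, keyLen);
    }
}
```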

Replica (replication) policy

The guarantee of Kafka's high reliability comes from its robust replica (replication) strategy.

1) data synchronization

Kafka provided no replication mechanism for partitions before version 0.8: once a broker went down, none of the partitions on it could provide service, and since partitions had no backup data, the availability of the data was greatly reduced. The replication mechanism was therefore introduced in 0.8 to support broker failover.

After the introduction of replication, the same partition may have multiple replicas, and one of them must be chosen as the Leader. Producers and consumers interact only with this Leader, while the other replicas act as Followers and replicate data from the Leader.

2) copy placement strategy

In order to do better load balancing, Kafka tries to distribute all Partition evenly to the whole cluster.

The algorithm for allocating Replica by Kafka is as follows:

Sort all the N surviving brokers and the partitions to be allocated.

Assign the i-th partition to the (i mod n)-th broker. The first replica of this partition lives on the broker it is assigned to and serves as the preferred replica of the partition.

Assign the j-th replica of the i-th partition to the ((i + j) mod n)-th broker.

Suppose the cluster has four brokers in total, one topic has four partitions, and each partition has three replicas. The sketch below shows the resulting replica allocation on each broker.
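
Here is a hedged sketch of the round-robin assignment just described (simplified relative to real Kafka, which also staggers follower placement with a random shift):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaAssignmentSketch {
    /** Assigns `replicas` copies of each of `partitions` partitions across `n` brokers. */
    static List<List<Integer>> assign(int n, int partitions, int replicas) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            List<Integer> brokersForPartition = new ArrayList<>();
            for (int j = 0; j < replicas; j++) {
                // The first replica (j == 0) lands on broker (i mod n): the preferred replica.
                // The j-th replica of partition i lands on broker ((i + j) mod n).
                brokersForPartition.add((i + j) % n);
            }
            assignment.add(brokersForPartition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 brokers, 4 partitions, 3 replicas each, as in the example above.
        System.out.println(assign(4, 4, 3));
        // => [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
    }
}
```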

3) synchronization strategy

When a producer publishes a message to a partition, it first finds the Leader of that partition through ZooKeeper. Then, no matter what the topic's replication factor is, the producer sends the message only to the partition's Leader. The Leader writes the message to its local log, and each Follower pulls data from the Leader, so the order of data stored by the Followers is the same as the Leader's. After a Follower receives the message and writes it to its log, it sends an ACK to the Leader. Once the Leader has received ACKs from all the replicas in the ISR, the message is considered committed; the Leader then advances the HW (high watermark) and sends an ACK to the producer.

To improve performance, each Follower sends an ACK to the Leader as soon as it receives the data, rather than waiting for the data to be written to its log. Therefore, for messages that have been committed, Kafka can only guarantee that they are stored in the memory of multiple replicas, not that they have been persisted to disk, so there is no complete guarantee that a message will still be consumable by consumers after a failure.

Consumers also read messages from the Leader, and only messages that have been committed are exposed to consumers.

[Figure: the data flow of Kafka replication.]

For Kafka, whether a broker is "alive" is defined by two conditions:

One is that it must maintain a session with ZooKeeper (implemented through ZooKeeper's heartbeat mechanism).

The other is that a Follower must be able to copy the Leader's messages in time and not fall "too far behind".

The Leader tracks the list of replicas that are in sync with it, which is called the ISR (in-sync replicas). If a Follower goes down or lags too far behind, the Leader removes it from the ISR. "Too far behind" here means that the number of messages by which the Follower trails the Leader exceeds a predetermined threshold, or that the Follower has failed to send a fetch request to the Leader within a certain period.

Kafka only handles the fail/recover case, and a message is considered committed only after all Followers in the ISR have copied it from the Leader. This prevents the situation where some data has been written to the Leader, which then goes down before any Follower has replicated it, causing data loss (consumers could never consume that data). Producers, for their part, can choose whether to wait for a message to be committed. This mechanism ensures that as long as the ISR contains at least one Follower, a committed message is not lost.

4) leader election

Leader election is essentially a distributed lock, and there are two ways to implement a ZooKeeper-based distributed lock:

Node name uniqueness: multiple clients try to create the same node, and only the client that creates it successfully acquires the lock.

Ephemeral sequential nodes: all clients create their own ephemeral sequential node under a directory, and only the one with the lowest sequence number acquires the lock.

The Majority Vote election strategy is similar to Zab in ZooKeeper; in fact, majority-based election is already implemented inside ZooKeeper itself. For electing the leader replica of a partition, Kafka adopts the first method: the partition's replicas race to create a designated ephemeral ZNode, the first replica to create it successfully becomes the Leader, and the other replicas register Watcher listeners on that ZNode. Once the Leader goes down, the corresponding ephemeral node is automatically deleted, all Followers registered on the node receive the watch event, and they each try to create the node again. Only the Follower that creates it successfully becomes the new Leader (ZooKeeper guarantees that only one client can successfully create a given node), and the other Followers re-register their watches.
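
A hedged sketch of this "node name uniqueness" approach using the standard ZooKeeper client API; the znode path and replica id are illustrative assumptions:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    private static final String LEADER_PATH = "/topic-a/partition-0/leader"; // assumed path

    /** Tries to become leader by creating an ephemeral znode; watches it otherwise. */
    static boolean tryBecomeLeader(ZooKeeper zk, byte[] replicaId) throws Exception {
        try {
            // Only one client can create the node: the winner becomes the Leader.
            zk.create(LEADER_PATH, replicaId, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Lost the race: watch the node. When the Leader's session dies, the
            // ephemeral node is deleted and the watch fires, triggering a new election.
            zk.exists(LEADER_PATH, event -> {
                try {
                    tryBecomeLeader(zk, replicaId);
                } catch (Exception ignored) { }
            });
            return false;
        }
    }
}
```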

Kafka message grouping and message consumption principles

A message on a topic can be consumed by only one consumer within the same consumer group, but multiple consumer groups can consume the message at the same time.

This is the mechanism Kafka uses to broadcast (to all consumers) and unicast (to a single consumer) a topic's messages. A topic can correspond to multiple consumer groups. To implement broadcasting, simply give each consumer its own group; to implement unicast, put all consumers in the same group (see the configuration sketch below). Consumer groups also let you group consumers freely without having to send messages to multiple different topics.
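
A hedged configuration sketch of that point: only group.id changes between the broadcast and unicast setups (broker address and group names are illustrative):

```java
import java.util.Properties;

public class GroupConfigSketch {
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Broadcast: each consumer uses its own group, so each receives every message.
        Properties broadcastA = consumerProps("group-A");
        Properties broadcastB = consumerProps("group-B");
        // Unicast: consumers share one group, so each message goes to only one of them.
        Properties unicast1 = consumerProps("shared-group");
        Properties unicast2 = consumerProps("shared-group");
    }
}
```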

Push vs. Pull

As a messaging system, Kafka follows the traditional design: producers push messages to the broker, and consumers pull messages from the broker.

A push model has difficulty adapting to consumers with different consumption rates, because the delivery rate is determined by the broker. The goal of the push model is to deliver messages as fast as possible, but this easily leaves consumers unable to keep up, typically manifesting as denial of service and network congestion. A pull model, by contrast, lets consumers consume messages at a rate appropriate to their own capacity.

For Kafka, the pull model is more appropriate. It simplifies the design of the broker, and the consumer can independently control the rate at which it consumes messages. At the same time, the consumer controls its own consumption mode: it can consume in batches or one message at a time, and it can choose different commit strategies to achieve different delivery semantics.

Kafka sequential writes and data reads

Producers are responsible for submitting data to Kafka. Kafka writes all received messages to disk and never loses data. To optimize write speed, Kafka uses two techniques: sequential writes and memory-mapped files (mmap).

Sequential write

Because a hard disk is a mechanical device, every read or write involves seeking before the transfer, and seeking is a "mechanical action" that is the most time-consuming part. Hard disks therefore "hate" random I/O and "love" sequential I/O. To improve read and write speed, Kafka uses sequential I/O.

Each message is appended to its partition, which is a sequential write to disk and is therefore very efficient.

In a traditional message queue, messages that have been consumed are generally deleted, but Kafka does not delete data: it retains everything, and each consumer keeps an offset for each topic partition to indicate how far it has read.

Even with sequential writes, the access speed of a hard disk cannot catch up with memory. Therefore Kafka does not write data to disk in real time; it takes full advantage of the modern operating system's page cache, using memory to improve I/O efficiency.

Since Linux kernel 2.2 there has been a system call mechanism called "zero copy" that skips the "user buffer" copy and establishes a direct mapping between disk space and memory space, so data is no longer copied to a "user-mode buffer". This saves two system context switches, which can double performance.

With mmap, the process reads and writes the mapped file as if it were reading and writing memory (virtual memory, of course). This yields a big I/O improvement by saving the overhead of copying between user space and kernel space (a read call on a file first puts the data into kernel-space memory and then copies it into user-space memory).
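
A hedged Java sketch of the mmap idea: map a region of a file into memory and write through the page cache. The file name and mapping size are illustrative.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MmapSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("segment.log", "rw");
             FileChannel channel = file.getChannel()) {
            // Map 4 KiB of the file into memory: writes go to the page cache,
            // avoiding a user-to-kernel copy on every write.
            MappedByteBuffer mmap = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mmap.put("hello kafka".getBytes(StandardCharsets.UTF_8));
            // force() asks the OS to flush the mapped region to disk (like msync).
            mmap.force();
        }
    }
}
```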

Consumer (reading data)

Imagine a web server transferring a static file: how can this be optimized? The answer is zero copy. In the traditional model, reading a file from disk and sending it over the network works like this:

First the data is copied into kernel space (read is a system call; the data is placed by DMA into a kernel buffer), then copied to user space (steps 1 and 2); then it is copied from user space back into kernel space (the socket write is a system call, so it has its own kernel buffer), and finally handed to the network card (steps 3 and 4).

Zero copy goes directly from kernel space (the DMA buffer) to kernel space (the socket buffer), and then to the network card. This technique is very common; Nginx uses it as well.

In fact, Kafka stores all messages in files, and when consumers need data, Kafka sends the "files" to them directly. When the entire file does not need to be sent, Kafka calls zero copy's sendfile function, whose parameters are:

out_fd: the output file descriptor (generally a socket handle)

in_fd: the input file descriptor (the file handle)

off_t: the offset within in_fd at which to start reading

size_t: how many bytes to transfer
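
In Java, which Kafka is written in, this sendfile path is exposed as FileChannel.transferTo. A hedged sketch (file name and destination address are illustrative):

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel in = new FileInputStream("segment.log").getChannel();           // in_fd
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("localhost", 9092))) {                        // out_fd (assumed address)
            long offset = 0;        // where to start reading (off_t)
            long count = in.size(); // how many bytes to send (size_t)
            // transferTo maps to sendfile on Linux: the data flows kernel-to-kernel,
            // never entering a user-space buffer.
            long sent = in.transferTo(offset, count, socket);
            System.out.println("sent " + sent + " bytes");
        }
    }
}
```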

That covers the distributed messaging technology Kafka. I hope the content above has been of some help and that you have learned something from it. If you think the article is good, feel free to share it so more people can see it.
