
How to analyze the working principle of kafka

Updated 2025-02-24. From: SLTechnology News&Howtos (Shulou.com), 06/01 report.

This article explains how Kafka works: how messages are stored, produced, and consumed. The content is fairly detailed and is intended as a reference.

1. About kafka

Kafka is an open-source message queue maintained by the Apache Software Foundation (it was originally developed at LinkedIn) and written in Scala and Java.

For related articles, refer to:

MQ: simple comparison of common application scenarios of message queuing and mainstream message queues ActiveMQ, RabbitMQ, RocketMQ and Kafka

MQ: kafka Java access and getting started example (topic addition, deletion, modification and query, Producer multi-parameter transmission, Consumer multi-partition acceptance)

2. Working principle

First, let's take a look at the overall data flow architecture of kafka:

2.1. Related terms

In the figure above, the following terms are involved:

Producer: the message producer; it produces messages and pushes them to the message queue.

Broker: the message middleman; the storage container for messages.

Consumer: the message consumer; it pulls data from the message queue for consumption.

Topic: the message topic; a division of messages from a business perspective, used to distinguish different types of messages.

Partition: the message partition; a logical division of messages in storage (for example, of 10 messages, 5 are stored in partition 1 and 5 in partition 2), used to speed up consumption and improve message throughput.

Consumer Group: the message consumer group; it distinguishes different types of consumers and is used to implement the point-to-point and publish-subscribe patterns.

Other terms Key, Leader, Replicas, and ISR are introduced step by step in subsequent chapters.

2.2. Message storage related

If you want to understand how kafka works, you should first master the storage structure of messages.

Messages are logically distinguished by Topic, and Topics are stored on Brokers, as shown in the following figure:

To make it easier to understand, let's look directly at the picture and say:

Broker and Topic

The Broker cluster in the figure consists of three Brokers; each Broker is a Kafka service node.

There are three Topic in the picture: Topic-0 in orange, Topic-2 in Tiffany blue and Topic-1 in dark blue.

Topic and Partition

Just look at Topic-0-Partition-0: the smallest unit of message container in a Broker is the Partition, and a Partition stores individual messages.

Just look at Broker-1:

A Topic can have multiple Partitions. For example, Topic-1 has 2 partitions; with 9 messages in total, the two partitions might store 4 and 5 messages respectively.

A Topic can also have only one Partition, for example Topic-2; with 9 messages in total, this single partition stores all 9.

Which partition a message is stored in depends on the choice the Producer makes when sending it, as described in the following chapters.

Multi-copy redundancy mechanism

Just look at Topic-2:

The bold-framed Partition is the Leader (primary) partition, which handles message reads and writes.

The thin-framed Partition is a Replica (follower) partition, which passively replicates the Leader to provide redundancy for disaster recovery.

If Broker-1 goes down, the Leader of Topic-2-Partition-0 goes down with it, and a new Leader is elected from the remaining two Replicas to continue serving.

The number of Replicas cannot exceed the number of Brokers, because placing several Replicas of a partition on one Broker gives no more protection than placing one there.

The number of Replicas can be less than the number of Brokers.
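The failover described for Topic-2 can be sketched as a toy model. This is only an illustration: `PartitionReplicas` and `on_broker_failure` are hypothetical names, and promoting the first surviving follower is an assumption, not Kafka's actual election procedure (which involves the ISR introduced later).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionReplicas:
    """Toy model of one partition: the broker id of the Leader and of each follower."""
    leader: int
    followers: List[int] = field(default_factory=list)

    def on_broker_failure(self, broker_id: int) -> None:
        """Drop the failed broker; promote a follower if the Leader was lost."""
        if broker_id in self.followers:
            self.followers.remove(broker_id)
        elif broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no replica left to take over")
            # Assumption: the first surviving follower becomes the new Leader.
            self.leader = self.followers.pop(0)


# Topic-2-Partition-0: Leader on Broker-1, followers on Broker-2 and Broker-3.
p = PartitionReplicas(leader=1, followers=[2, 3])
p.on_broker_failure(1)
print(p.leader, p.followers)
```

After Broker-1 fails, the partition keeps serving from Broker-2 with one follower left on Broker-3.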

Take a look at Topic-0:

Every Partition will have Leader and Replicas.

Kafka tries to spread the partition Leaders of the same Topic across Brokers. As shown in the figure, the 3 Leaders are distributed over 3 Brokers.

This spreading is not absolute. For example, if there is only one Broker, all three partition Leaders sit on that one Broker.

The number of partitions can be greater than the number of Broker, because partitions exist to speed up message consumption and have nothing to do with redundant disaster recovery.
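One simple way to picture the spreading of Leaders is a round-robin placement. The sketch below is a simplification for illustration; `assign_leaders` is a hypothetical helper and does not reproduce Kafka's real placement logic.

```python
def assign_leaders(num_partitions, broker_ids):
    """Map each partition index to the broker hosting its Leader, round-robin."""
    return {p: broker_ids[p % len(broker_ids)] for p in range(num_partitions)}


# Three partitions over three Brokers: one Leader per Broker.
print(assign_leaders(3, [1, 2, 3]))
# Three partitions but a single Broker: every Leader lands on it.
print(assign_leaders(3, [1]))
# More partitions than Brokers is also allowed.
print(assign_leaders(5, [1, 2, 3]))
```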

Relative order

The messages of the three partitions of Topic-0 are: 1, 2, 3, 4, 5, 6, 7, 8, 9, respectively.

The messages for the two partitions of Topic-1 are: 1, 2, 8, 9, 4, 5, 6, 7.

The messages for 1 partition of Topic-2 are: 1, 2, 3, 4, 5, 6, 7, 8, 9.

Relative ordering: messages within a single partition are ordered, while messages across multiple partitions are not.

Kafka records the order of messages by marking offset in Partition.

If the business scenario pursues global ordering, only one Partition can be configured for each Topic.
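Relative ordering can be illustrated with a toy append-only log per partition, where the offset is simply the position in that log. The `Partition` class and the alternating distribution of messages are assumptions made for this sketch.

```python
class Partition:
    """Toy partition: an append-only list; the offset is the index in the list."""

    def __init__(self):
        self.log = []

    def append(self, message):
        """Store a message and return the offset it was written at."""
        self.log.append(message)
        return len(self.log) - 1

    def read(self, offset):
        """Return all messages from the given offset onward, in order."""
        return self.log[offset:]


# Assumption: messages 1..9 alternate between two partitions.
partitions = [Partition(), Partition()]
for i in range(1, 10):
    partitions[i % 2].append(i)

print(partitions[0].log)  # ordered within partition 0
print(partitions[1].log)  # ordered within partition 1
```

Each partition preserves the order of its own messages, but reading the partitions one after another does not reproduce 1 through 9 in order, which is why global ordering needs a single partition.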

Producer message production semantics

At most once: either (1) send the message asynchronously, or (2) send it synchronously with 0 retries.

At least once: send the message synchronously; on failure or timeout, retry until the message is sent successfully.
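The difference between the two semantics shows up when an acknowledgement is lost. The sketch below uses a hypothetical `FlakyBroker` stub that stores every message but drops the first acknowledgement; the `max_attempts` cap is an added safety assumption, and none of this is the Kafka client API.

```python
class FlakyBroker:
    """Hypothetical stub: stores each message, but loses the first `lost_acks` acks."""

    def __init__(self, lost_acks):
        self.lost_acks = lost_acks
        self.stored = []

    def send(self, message):
        self.stored.append(message)
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise ConnectionError("ack lost")


def send_at_most_once(broker, message):
    """Synchronous send with 0 retries: never duplicated, outcome may be unknown."""
    try:
        broker.send(message)
        return True
    except ConnectionError:
        return False


def send_at_least_once(broker, message, max_attempts=10):
    """Retry on failure until acknowledged; returns the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            broker.send(message)
            return attempt
        except ConnectionError:
            continue
    raise RuntimeError("gave up after %d attempts" % max_attempts)


a = FlakyBroker(lost_acks=1)
print(send_at_most_once(a, "m"), a.stored)
b = FlakyBroker(lost_acks=1)
print(send_at_least_once(b, "m"), b.stored)
```

With retries the message is stored twice, which is the price of at-least-once delivery: consumers must tolerate duplicates.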

2.3. Related to message production

After figuring out the storage of the message, let's look at the production of the message:

To make it easier to understand, let's look directly at the picture and say:

①: a Producer can send messages to multiple Topic and multiple Partition.

②: multiple Producer can send messages to the same Topic and the same Partition.

③: message sending parameters: (topic, [partition], [key], message)

Topic is required; message is the message itself, required.

Partition is optional. If it is left empty, the partition is chosen based on whether a key is present.

Key is optional. If it is provided, the partition is the hash of the key modulo the number of partitions; if not, a partition is selected at random.

Random selection: the cached random partition is used first; if the cache is empty, a partition is picked at random and stored in the cache for subsequent sends.
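The selection rules above can be sketched as follows. `PartitionChooser` is a hypothetical class, and CRC32 is used only as a stand-in hash for the illustration; Kafka's own hash function is not reproduced here.

```python
import random
import zlib


class PartitionChooser:
    """Toy version of the rules: explicit partition, then key hash, then cached random."""

    def __init__(self, num_partitions, rng=None):
        self.num_partitions = num_partitions
        self.rng = rng or random.Random()
        self.cached = None  # random partition remembered for later sends

    def choose(self, partition=None, key=None):
        if partition is not None:
            return partition  # the caller named the partition explicitly
        if key is not None:
            # Assumption: CRC32 stands in for the real hash function.
            return zlib.crc32(key) % self.num_partitions
        if self.cached is None:
            self.cached = self.rng.randrange(self.num_partitions)
        return self.cached


chooser = PartitionChooser(num_partitions=3)
print(chooser.choose(partition=2, key=b"ignored"))
print(chooser.choose(key=b"user-1") == chooser.choose(key=b"user-1"))
print(chooser.choose() == chooser.choose())
```

Because the key hash is deterministic, all messages with the same key land in the same partition, which is what makes per-key ordering possible.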

2.4. Related to message consumption

Let's continue to learn about the consumption of messages:

To make it easier to understand, let's look directly at the picture and say:

Message consumption mode

Note the direction of the consumption arrows: Kafka Consumers obtain messages only by pull; the Broker does not push.

Push delivers messages with low latency, but a Producer that produces too quickly can easily overwhelm the Consumer.

Pull lets the Consumer control its consumption rate, but it is prone to empty polling when no data is available.

Kafka optimizes pull: it can be configured so that a pull returns only once data exists and has reached a certain amount.
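The pull optimization can be pictured as a fetch that stays empty until enough data has accumulated. This is a minimal sketch: `fetch` and `min_messages` are hypothetical, and the threshold is counted in messages for simplicity (Kafka's consumer settings express such a threshold in bytes).

```python
def fetch(log, offset, min_messages):
    """Return the messages from `offset` onward only once at least
    `min_messages` of them are available; otherwise return nothing."""
    available = log[offset:]
    if len(available) >= min_messages:
        return available
    return []


log = [1, 2, 3]
print(fetch(log, 0, min_messages=5))  # not enough data yet
print(fetch(log, 0, min_messages=2))  # threshold reached
```

Batching pulls this way avoids many round trips that would each return little or no data.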

Consumer Group and Consumer

⑥⑦: a Consumer-Group can have more than one Consumer or only one Consumer.

⑤: a Topic-Partition message can be consumed by multiple Consumer-Group. Note: it's Consumer-Group, not Consumer.

⑦: if the Consumer-Group has only one Consumer, then all messages in that Partition are consumed by this Consumer.

⑥: if Consumer-Group has more than one Consumer, and during a normal connection:

Messages in a single Partition can only be consumed by one of the Consumer, not by multiple Consumer in the Consumer-Group.

Messages from multiple Partition can be consumed by one Consumer.

If a single Topic has fewer partitions than the Consumer-Group has Consumers, some Consumers will receive no messages from that Topic.
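The rules for a Consumer-Group can be sketched with a round-robin assignment of partitions to consumers. `assign_partitions` is a hypothetical helper; the actual assignment strategy used by Kafka is not claimed here.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# One consumer receives every partition.
print(assign_partitions([0, 1, 2], ["c1"]))
# Several partitions may share a consumer, but no partition is shared.
print(assign_partitions([0, 1, 2], ["c1", "c2"]))
# Fewer partitions than consumers: c2 stays idle.
print(assign_partitions([0], ["c1", "c2"]))
```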

Consumer message consumption semantics

At most once: 1. read the message; 2. confirm the offset; 3. process the message.

At least once: 1. read the message; 2. process the message; 3. confirm the offset.
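The effect of the step order becomes visible when the Consumer crashes between step 2 and step 3. The simulation below is a toy model under that assumption; `run_consumer` and `crash_at` are hypothetical, and the restart simply resumes from the last confirmed offset.

```python
def run_consumer(log, commit_first, crash_at=None):
    """Consume `log` from offset 0, crashing once between steps 2 and 3 at
    offset `crash_at`; return the messages that were actually processed."""
    committed = 0
    processed = []
    crashed = False
    while committed < len(log):
        offset = committed               # 1. read from the last confirmed offset
        message = log[offset]
        if commit_first:
            committed = offset + 1       # 2. confirm the offset
        else:
            processed.append(message)    # 2. process the message
        if offset == crash_at and not crashed:
            crashed = True
            continue                     # simulated crash and restart
        if commit_first:
            processed.append(message)    # 3. process the message
        else:
            committed = offset + 1       # 3. confirm the offset
    return processed


log = ["a", "b", "c"]
print(run_consumer(log, commit_first=True, crash_at=1))   # "b" is lost
print(run_consumer(log, commit_first=False, crash_at=1))  # "b" is processed twice
```

Confirming first can lose a message, while processing first can repeat one, so at-least-once consumers usually need idempotent processing.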

That concludes this analysis of how Kafka works. I hope the content above is of some help.
