
Understanding Kafka

2025-01-18 Update From: SLTechnology News&Howtos


Shulou (Shulou.com), 05/31 report

This article explains how to understand Kafka. The content is concise and easy to follow; I hope the detailed introduction below gives you something useful.

Today we are going to talk about Kafka, mainly to introduce you to Kafka and to talk about some important concepts and issues in Kafka.

Nowadays, when we mention Kafka, we take it for granted that it is an excellent message queue, and we often compare it with RocketMQ and RabbitMQ. In my view, Kafka's main advantages over other message queues are as follows:

Extreme performance: developed in Scala and Java, its design makes heavy use of batching and asynchronous processing, reaching up to tens of millions of messages per second.

Unmatched ecosystem compatibility: Kafka's compatibility with the surrounding ecosystem is among the best, especially in big data and stream computing.

In fact, early Kafka was not a qualified message queue. Early on it was like a ragged child in the message queue field: incomplete features and minor problems such as losing messages and not guaranteeing message reliability. Of course, this has a lot to do with the fact that LinkedIn originally developed Kafka to handle massive logs. Kafka was never meant to be a message queue in the first place, yet it stumbled into a solid place in the message queue field.

With subsequent development, these shortcomings were gradually fixed and improved. So **the idea that Kafka is unreliable as a message queue is out of date!**

Getting to know Kafka

First, let's look at the introduction on the official website, which should be the most authoritative and up to date. It doesn't matter if it's in English; I have extracted all the important information for you.

We can get the following information from the official introduction:

Kafka is a distributed streaming platform. What on earth does this mean?

The streaming platform has three key functions:

Message queue: publish and subscribe to streams of records, similar to a message queue, which is why Kafka is also classified as a message queue.

Fault-tolerant, persistent storage of record streams: Kafka persists messages to disk, effectively avoiding the risk of message loss.

Stream processing: process streams of records as they occur; Kafka provides a complete stream-processing library.

There are two main application scenarios for Kafka:

Message queue: build real-time streaming data pipelines that reliably move data between systems or applications.

Stream processing: build real-time streaming applications that transform or process streams of data.

There are several very important concepts about Kafka:

Kafka stores streams of records (stream data) in Topics.

Each record consists of a key, a value, and a timestamp.
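To make the record structure concrete, here is a minimal sketch that models a Kafka record as a plain Python dataclass. This is purely illustrative (the field names mirror the description above, not Kafka's actual wire format):

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    """Simplified model of a Kafka record: a key, a value, and a timestamp."""
    key: Optional[str]   # used for partition selection; may be None
    value: bytes         # the message payload
    timestamp: float     # assigned when the record is created or appended

r = Record(key="user-42", value=b"level-upgraded", timestamp=time.time())
print(r.key)  # user-42
```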

Kafka message model

An aside: the early JMS and AMQP standards were set by authoritative bodies in the message-service field (I covered them in the JavaGuide article "Message queues are actually very simple"). However, these standards could not keep up with the evolution of message queues and are effectively obsolete, so different message queues may well have their own message models.

Queue model: the early message model

The queue model uses a queue (Queue) as the message communication carrier to satisfy the producer-consumer pattern. A message can be consumed by only one consumer; unconsumed messages are retained in the queue until they are consumed or time out. For example, if a producer sends 100 messages and two consumers consume them, in general the two consumers will each consume about half, in the order the messages were sent.
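The behavior above can be sketched in a few lines: each message is delivered to exactly one consumer, and two consumers draining the same queue split the messages between them. The consumer names are made up for illustration:

```python
from collections import deque

# Queue model sketch: each message goes to exactly one consumer.
queue = deque(f"msg-{i}" for i in range(100))
consumed = {"consumer-A": [], "consumer-B": []}

while queue:
    for name in consumed:          # the two consumers take turns pulling
        if queue:
            consumed[name].append(queue.popleft())

# Each consumer ends up with half of the 100 messages, in send order.
print(len(consumed["consumer-A"]), len(consumed["consumer-B"]))  # 50 50
```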

Problems in the queue model

Suppose we have a situation where we need to distribute a message generated by the producer to multiple consumers, and each consumer must receive the complete message content.

The queue model does not solve this easily. Some might argue: we can create a separate queue for each consumer and have the producer send multiple copies. But that wastes resources and runs counter to the purpose of using a message queue in the first place.

Publish-subscribe model: Kafka message model

The publish-subscribe model is mainly to solve the problems existing in the queue model.

The publish-subscribe model (Pub-Sub) uses a topic (Topic) as the message communication carrier, similar to a broadcast: a message published by a publisher is delivered through the topic to all subscribers, but users who subscribe only after a message has been broadcast will not receive it.

In the publish-subscribe model, if there is only one subscriber, it is basically the same as the queue model. Therefore, the publish-subscribe model is compatible with the queue model at the functional level.
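The key contrast with the queue model can be sketched as follows: every current subscriber of a topic receives every published message. The subscriber names here are invented for illustration:

```python
# Publish-subscribe sketch: each message is delivered to ALL subscribers.
subscribers = {"billing": [], "audit": [], "email": []}

def publish(topic_subscribers, message):
    """Deliver one message to every subscriber's inbox."""
    for inbox in topic_subscribers.values():
        inbox.append(message)

for i in range(3):
    publish(subscribers, f"event-{i}")

for name, inbox in subscribers.items():
    print(name, inbox)   # every subscriber holds all three events
```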

Kafka uses the publish-subscribe model. As shown in the following figure:

Aside: RocketMQ's message model is basically the same as Kafka's. The only difference is that Kafka has no concept of a queue; its counterpart is the Partition.

Important Kafka concepts explained

Kafka sends messages published by producers to Topics (topics); consumers who need these messages can subscribe to those Topics, as shown in the figure below:

Kafka Topic Partition

The above picture also leads to some of the more important concepts of Kafka:

Producer (producer): the party that generates the message.

Consumer (consumer): the party that consumes the message.

Broker (broker): can be thought of as a single Kafka instance. Multiple Kafka Brokers make up a Kafka Cluster.

At the same time, you must have noticed that each Broker contains two important concepts: Topic and Partition.

Topic (topic): the Producer sends messages to a specific topic, and the Consumer consumes messages by subscribing to specific Topics.

Partition (partition): a Partition is part of a Topic. A Topic can have multiple Partitions, and the Partitions of the same Topic can be distributed across different Brokers, which means a Topic can span multiple Brokers. This is just like the picture I drew above.
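A minimal sketch of how a topic spans brokers: spreading a topic's partitions across the available brokers round-robin, so even a topic with more partitions than brokers touches every broker. The broker names and the round-robin rule are illustrative simplifications, not Kafka's actual assignment algorithm:

```python
# Sketch: distribute a topic's partitions across brokers round-robin.
brokers = ["broker-0", "broker-1", "broker-2"]

def assign_partitions(num_partitions, brokers):
    """Map partition number -> broker, cycling through the broker list."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}

print(assign_partitions(5, brokers))
# {0: 'broker-0', 1: 'broker-1', 2: 'broker-2', 3: 'broker-0', 4: 'broker-1'}
```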

Highlight: a Partition in Kafka corresponds to the queue in other message queues. Isn't that a little easier to understand?

In addition, and I think more importantly, Kafka introduces a multi-replica (Replica) mechanism for Partitions. Among the multiple replicas of a Partition, one is called the leader and the others are called followers. The messages we send go to the leader replica, and the follower replicas then pull messages from the leader to stay in sync.

Producers and consumers interact only with the leader replica. You can think of the other replicas as copies of the leader that exist only to keep the stored messages safe. When the leader replica fails, a new leader is elected from the followers; however, a follower that has fallen too far behind the leader is not eligible to take part in the election.
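The failover rule above can be sketched as follows. This is a toy model, not Kafka's actual controller logic: the broker names, offsets, and the `MAX_LAG` threshold are all made up to illustrate that only sufficiently in-sync followers are eligible:

```python
# Sketch of leader failover for one partition: followers that lag too far
# behind the leader are excluded from the election.
replicas = {"broker-1": 100, "broker-2": 100, "broker-3": 73}  # last synced offset
MAX_LAG = 10  # hypothetical "in-sync" threshold

def elect_new_leader(replicas, failed_leader, leader_offset, max_lag):
    """Pick the most caught-up replica among those still in sync."""
    in_sync = [b for b, off in replicas.items()
               if b != failed_leader and leader_offset - off <= max_lag]
    if not in_sync:
        raise RuntimeError("no in-sync replica available")
    return max(in_sync, key=replicas.get)

# broker-3 is 27 offsets behind (> MAX_LAG), so broker-2 wins.
print(elect_new_leader(replicas, "broker-1", 100, MAX_LAG))  # broker-2
```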

What are the benefits of Kafka's multi-partition (Partition) and multi-replica (Replica) mechanisms?

Kafka provides better concurrency (load balancing) by assigning multiple Partitions to a given Topic, and each Partition can be located on a different Broker.

Each Partition can be given a configurable number of Replicas, which greatly improves message-storage safety and disaster recovery, at the cost of correspondingly more storage space.

The role of Zookeeper in Kafka

If you want to understand ZooKeeper's role in Kafka, you should set up your own Kafka environment and then go into ZooKeeper to see which folders relate to Kafka and what information each node stores. Don't just read without practicing, or what you learn will eventually be forgotten!

A later article will describe how to set up a Kafka environment. Don't worry: after reading it, you can have a Kafka environment running in 3 minutes.

This part references and draws on this article: https://www.jianshu.com/p/a036405f989c.

The following figure shows my local ZooKeeper, successfully associated with my local Kafka (the folder structure below is displayed with the IDEA plug-in's ZooKeeper tool).

ZooKeeper mainly provides Kafka with the function of managing metadata.

As the figure shows, ZooKeeper mainly does the following things for Kafka:

Broker registration: ZooKeeper has a node dedicated to recording the list of Broker servers. When each Broker starts, it registers with ZooKeeper by creating its own node under /brokers/ids, recording its IP address, port, and other information.

Topic registration: in Kafka, messages for the same Topic are divided into multiple partitions distributed across multiple Brokers, and the partition information and its mapping to Brokers are also maintained by ZooKeeper. For example, if I create a topic named my-topic with two partitions, ZooKeeper will create these nodes: /brokers/topics/my-topic/partitions/0 and /brokers/topics/my-topic/partitions/1.

Load balancing: as mentioned above, Kafka provides better concurrency by assigning multiple Partitions to a given Topic, with each Partition able to live on a different Broker. For different Partitions of the same Topic, Kafka tries to distribute them across different Broker servers. When producers publish messages, they likewise try to spread them across Partitions on different Brokers. On the consumer side, ZooKeeper enables dynamic load balancing based on the current number of Partitions and Consumers.


How does Kafka guarantee the consumption order of messages?

In the process of using message queues, there are often business scenarios that need to strictly ensure the consumption order of messages. For example, we send two messages at the same time. The corresponding database operations for these two messages are: change the user membership level and calculate the order price according to the membership level. If the consumption order of the two messages is different, the final result will be completely different.

We know that the Partition (partition) in Kafka is where messages are actually stored; all the messages we send end up there. Partitions belong to a Topic, and we can assign multiple Partitions to a given Topic.

Kafka Topic Partions Layout

Every message added to a Partition (partition) is appended at the tail, as shown in the figure above. Kafka can only guarantee the order of messages within a Partition, not across the Partitions of a Topic (topic).

Each message is assigned a specific offset when it is appended to a Partition (partition). Kafka uses the offset to guarantee message order within the partition.
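The append-only log with sequential offsets can be sketched in a few lines; this toy `Partition` class is illustrative, not Kafka's actual storage layer:

```python
# Sketch: a partition is an append-only log; each appended message gets
# the next sequential offset, guaranteeing order within the partition.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, message):
        offset = len(self.log)        # offsets are assigned sequentially
        self.log.append((offset, message))
        return offset

p = Partition()
for m in ["a", "b", "c"]:
    p.append(m)
print(p.log)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```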

Therefore, there is a very simple way to guarantee message consumption order: give a Topic only one Partition. That certainly solves the problem, but it defeats Kafka's design intent.

When sending a message in Kafka, you can specify four parameters: topic, partition, key, and data. If you specify a partition when sending, all those messages go to that partition. Moreover, messages with the same key are always sent to the same partition, so we can use, for example, the id of a table row or object as the key.
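The "same key always lands in the same partition" rule boils down to hashing the key modulo the partition count. A minimal sketch (real Kafka's default partitioner uses murmur2, not MD5; the principle is the same):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition number."""
    digest = hashlib.md5(key.encode()).digest()   # stable across runs
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key (e.g. an order id) always maps to the same partition,
# so all messages for that order stay in order relative to each other.
order_id = "order-1024"
p1 = partition_for(order_id, 3)
p2 = partition_for(order_id, 3)
print(p1 == p2)  # True
```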

To sum up, there are two ways to ensure the order of message consumption in Kafka:

Give a Topic only one Partition.

(Recommended) Specify the key/partition when sending messages.

Of course, these are not the only two methods; they are just the two I find easiest to understand.

The above is how to understand Kafka. Did you pick up any knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.
