
How to understand the basic concept of Kafka

2025-07-04 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

Many newcomers are unsure how to approach Kafka's basic concepts. This article summarizes them, and I hope it helps you understand them.

Introduction

Kafka is a distributed, partitioned, replicated messaging system. It provides the functionality of an ordinary messaging system, but with a unique design of its own.

What does this unique design look like?

First, let's look at a few basic messaging system terms:

Kafka organizes messages into categories called topics.

Programs that publish messages to a Kafka topic are called producers.

Programs that subscribe to topics and consume the published messages are called consumers.

Kafka runs as a cluster of one or more servers, each of which is called a broker.

Producers send messages over the network to the Kafka cluster, which in turn serves them to consumers, as shown in the following figure:

Clients and servers communicate over TCP. Kafka provides a Java client, and clients are also available for many other languages.

Topics and Logs

Let's first look at the core abstraction Kafka provides: the topic.

A topic is a category, or feed name, to which messages are published. For each topic, Kafka maintains a partitioned log, as shown in the following figure:

Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential number called the offset, which uniquely identifies the message within that partition.
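The following minimal Python sketch makes the offset idea concrete by modeling a single partition as an append-only list. The `PartitionLog` class is hypothetical, and it ignores storage, retention and replication. It only shows two things: appends receive consecutive offsets, and a stored message is looked up by its offset.

```python
class PartitionLog:
    """Toy append-only partition (hypothetical model, not Kafka's storage format)."""

    def __init__(self):
        self._messages = []  # list index == offset in this simplified model

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset):
        """Return the message at `offset`; stored messages are never modified."""
        return self._messages[offset]

    def end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self._messages)


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c"]]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # b
```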

The Kafka cluster retains all published messages for a configurable period of time, whether or not they have been consumed. For example, if the retention period is set to two days, a message can be consumed at any time within two days of being published; after that it is discarded to free up space. Kafka's performance is effectively constant with respect to the amount of data stored, so retaining a lot of data is not a problem.
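The retention rule can be sketched as a filter on message age. This is a simplified, hypothetical model: real Kafka deletes whole log segments rather than individual messages. A topic's retention period is set with the `retention.ms` configuration, which is measured in milliseconds. The two-day example works out to 2 * 24 * 60 * 60 * 1000 = 172,800,000 ms.

```python
from dataclasses import dataclass

# Two days in milliseconds, the unit used by Kafka's topic-level `retention.ms`.
TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000  # 172_800_000


@dataclass
class StoredMessage:
    offset: int
    timestamp_ms: int
    value: str


def apply_retention(messages, now_ms, retention_ms=TWO_DAYS_MS):
    """Keep only messages younger than retention_ms; consumption is never checked.

    Simplification: real Kafka removes whole log segments, not single messages.
    """
    return [m for m in messages if now_ms - m.timestamp_ms < retention_ms]


day_ms = 24 * 60 * 60 * 1000
log = [
    StoredMessage(0, 0 * day_ms, "old"),
    StoredMessage(1, 1 * day_ms, "recent"),
    StoredMessage(2, 2 * day_ms, "new"),
]
kept = apply_retention(log, now_ms=2 * day_ms + 1)
print([m.offset for m in kept])  # [1, 2]
```

The surviving messages keep their original offsets. Expiring old data never renumbers the log.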

In fact, the only metadata each consumer needs to keep is its position in the log, that is, the offset. The offset is controlled by the consumer. Normally it advances linearly as the consumer reads messages, but the consumer can read messages in any order it likes. For example, it can reset the offset to an older value to reprocess earlier messages.
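The sketch below models a consumer whose only state is its offset. `ToyConsumer` and its methods are hypothetical names chosen for illustration. The real Java client offers comparable operations such as `KafkaConsumer.poll` and `KafkaConsumer.seek`, with different signatures.

```python
class ToyConsumer:
    """Hypothetical consumer: its only state is the next offset to read."""

    def __init__(self, log):
        self.log = log     # shared list of messages; never modified here
        self.position = 0  # the offset, owned by the consumer

    def poll(self, max_records=10):
        """Return up to max_records messages and advance the offset."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the offset anywhere, e.g. backwards to reprocess messages."""
        self.position = offset


log = ["m0", "m1", "m2", "m3"]
c = ToyConsumer(log)
print(c.poll(3))  # ['m0', 'm1', 'm2']
c.seek(1)         # rewind to an older offset
print(c.poll())   # ['m1', 'm2', 'm3']
```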

Together, these features make Kafka consumers very lightweight: they can come and go without affecting the cluster or other consumers. For example, you can use the command-line tools to "tail" a topic without changing what existing consumers read.
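Reading never modifies the log, so any number of readers can share it. In this hypothetical sketch, an existing consumer and a "tail" reader that starts at the end of the log each keep a private offset. Neither reader affects what the other sees.

```python
class Reader:
    """Hypothetical reader holding a private offset into a shared log."""

    def __init__(self, log, start_offset):
        self.log = log
        self.offset = start_offset

    def read_available(self):
        """Return every message from the current offset to the end of the log."""
        batch = self.log[self.offset:]
        self.offset = len(self.log)
        return batch


log = ["e0", "e1", "e2"]
app = Reader(log, start_offset=0)          # an existing consumer
tail = Reader(log, start_offset=len(log))  # a "tail" starts at the end

print(app.read_available())   # ['e0', 'e1', 'e2']
log.append("e3")              # a producer appends a new message
print(tail.read_available())  # ['e3']
print(app.read_available())   # ['e3'], unaffected by the tail reader
```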

Partitioning the log serves two purposes. First, it keeps each log to a manageable size. Every partition must fit on the server that hosts it, but a topic can have many partitions, so the topic can hold more data than a single server could. Second, each partition can be written to and read from independently, which lets a topic be produced to and consumed in parallel.
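The parallelism point can be illustrated by treating each partition as an independent unit of work. This hypothetical sketch sums the values in each partition on its own thread. No partition depends on another, so the work needs no coordination. The sketch does not model the storage benefit of spreading partitions across servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical topic with three partitions; each is an independent log.
topic = {
    0: [1, 2, 3],
    1: [10, 20],
    2: [100],
}


def consume_partition(partition_id):
    """Process one partition start to finish; touches no other partition."""
    return partition_id, sum(topic[partition_id])


# One worker per partition, so partitions are consumed concurrently.
with ThreadPoolExecutor(max_workers=len(topic)) as pool:
    results = dict(pool.map(consume_partition, topic))

print(results)  # {0: 6, 1: 30, 2: 100}
```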

Distributed system

The partitions of a topic are distributed over the servers in the Kafka cluster, and each server handles data and requests for its share of the partitions. Each partition is replicated across a configurable number of servers. This replication is what makes Kafka fault tolerant.

Each partition has one server that acts as the "leader" and zero or more servers that act as "followers". The leader handles all reads and writes for the partition, while the followers passively replicate the leader. If the leader fails, one of the followers automatically becomes the new leader. Each server acts as the leader for some of its partitions and as a follower for others, so load is well balanced across the cluster.
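The leader and follower roles can be sketched as a list of replicas per partition, where the first entry is the leader. Both the round-robin placement and the failover rule below are simplified assumptions. In real Kafka, the controller chooses a new leader from the in-sync replicas, and replica placement differs in detail.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place replicas round-robin; the first replica listed is the leader.

    Hypothetical placement scheme, not Kafka's exact algorithm.
    """
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def fail_broker(assignment, dead):
    """Drop a dead broker; the next surviving replica becomes leader if needed."""
    return {
        p: [b for b in replicas if b != dead]
        for p, replicas in assignment.items()
    }


def leaders(assignment):
    """Map each partition to its current leader (first replica in the list)."""
    return {p: replicas[0] for p, replicas in assignment.items() if replicas}


cluster = assign_replicas(4, ["b1", "b2", "b3"], replication_factor=2)
print(leaders(cluster))  # {0: 'b1', 1: 'b2', 2: 'b3', 3: 'b1'}
after = fail_broker(cluster, "b1")
print(leaders(after))    # {0: 'b2', 1: 'b2', 2: 'b3', 3: 'b2'}
```

With three brokers, each broker leads some partitions and follows others. For example, `b1` leads partitions 0 and 3 and follows partition 2. This spreads the load across the cluster.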

Producers

A producer publishes messages to the topics of its choice and is responsible for choosing which partition within the topic each message goes to. It can spread messages round-robin (or randomly) simply to balance load. It can also use a semantic partition function, for example one based on a key in the message. The latter is used more often.
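Both partitioning strategies fit in a few lines. In this sketch, `choose_partition` is a hypothetical helper. Keyed messages use a stable hash of the key modulo the partition count, and unkeyed messages use round-robin. CRC32 stands in for the murmur2 hash that Kafka's Java client uses for keys, so the partition numbers will not match a real cluster.

```python
import itertools
import zlib

NUM_PARTITIONS = 4
_round_robin = itertools.count()


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition for a message (hypothetical helper).

    Keyed messages: a stable hash of the key, so equal keys always map to the
    same partition. CRC32 is a stand-in for the murmur2 hash Kafka uses.
    Unkeyed messages: simple round-robin to spread load.
    """
    if key is None:
        return next(_round_robin) % num_partitions
    return zlib.crc32(key.encode("utf-8")) % num_partitions


print([choose_partition(None) for _ in range(5)])                  # [0, 1, 2, 3, 0]
print(choose_partition("user-42") == choose_partition("user-42"))  # True
```

The key-based choice is more common because all messages with the same key land in the same partition and therefore keep their relative order. Recent versions of the Java client spread unkeyed messages with a "sticky" strategy rather than strict round-robin.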

Consumers

Messaging traditionally has two models: queuing and publish-subscribe. In the queuing model, a pool of consumers reads from the server and each message goes to exactly one of them. In the publish-subscribe model, each message is broadcast to all consumers. Kafka unifies the two with the consumer group: consumers join a group, and each message published to a topic is delivered to one consumer within each subscribing group. Consumers in the same group can run in separate processes or on separate machines.

If all consumers belong to the same group, this is the traditional queue model, with load balanced across the consumers. If every consumer is in its own group, this is the publish-subscribe model, and every message is broadcast to all consumers. More commonly, a topic has a few consumer groups. Each group acts as one logical "subscriber" and is made up of several consumers for scalability and fault tolerance. This is still publish-subscribe, except that the subscriber is a group of consumers rather than a single process.
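Here is a minimal sketch of the group semantics: every group receives every message, and within a group each message goes to exactly one consumer. The `deliver` function is hypothetical. For brevity it spreads individual messages round-robin within a group, whereas Kafka assigns whole partitions to consumers. Consumer names are assumed to be unique across groups.

```python
def deliver(messages, groups):
    """Deliver each message to exactly one consumer in every group.

    `groups` maps group name -> list of consumer names (unique across groups).
    Within a group, messages are spread round-robin here for simplicity;
    Kafka actually assigns whole partitions to consumers.
    """
    received = {c: [] for members in groups.values() for c in members}
    for i, msg in enumerate(messages):
        for members in groups.values():
            received[members[i % len(members)]].append(msg)
    return received


msgs = ["m0", "m1", "m2", "m3"]

# Queue: everyone in one group, so each message goes to one consumer.
queue = deliver(msgs, {"g": ["c1", "c2"]})
print(queue)   # {'c1': ['m0', 'm2'], 'c2': ['m1', 'm3']}

# Publish-subscribe: one consumer per group, so every consumer sees everything.
pubsub = deliver(msgs, {"g1": ["c1"], "g2": ["c2"]})
print(pubsub)  # {'c1': ['m0', 'm1', 'm2', 'm3'], 'c2': ['m0', 'm1', 'm2', 'm3']}
```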

A two-server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumers, and consumer group B has four.
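The assignment in the figure can be reproduced with a simple round-robin over one group's consumers. This simplified sketch follows the spirit of Kafka's `RoundRobinAssignor` but is not its actual implementation. The consumer names C1 to C6 are illustrative.

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions over the consumers of ONE group.

    Each partition goes to exactly one consumer in the group (simplified sketch).
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_a = assign_partitions(partitions, ["C1", "C2"])
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])
print(group_a)  # {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
print(group_b)  # {'C3': ['P0'], 'C4': ['P1'], 'C5': ['P2'], 'C6': ['P3']}
```

Both groups receive all four partitions. The larger group simply spreads them over more consumers.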

Kafka also provides stronger ordering guarantees than a traditional messaging system.

A traditional queue stores messages in order on the server. If multiple consumers consume from the queue at the same time, the server hands out messages in the order they are stored. However, even though the server hands out messages in order, they are delivered to consumers asynchronously, so they may arrive out of order at different consumers. In effect, parallel consumption loses the ordering. Such messaging systems often work around this with the notion of an "exclusive consumer", which allows only one consumer to read from the queue. This, of course, means giving up parallelism.

Kafka does better here. Through the concept of partitions, it provides both ordering guarantees and load balancing across a pool of consumers. Each partition of a topic is assigned to exactly one consumer within a consumer group, so that consumer is the only one in the group reading the partition, and it consumes the partition's messages in order. Because there are many partitions, the load is still balanced across the consumers in the group. Note, however, that a group cannot have more active consumers than there are partitions. Parallelism within a group is limited to the number of partitions, and any extra consumers sit idle.
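The sketch below assumes a hypothetical group of three consumers reading a two-partition topic. Each partition goes to exactly one consumer, which reads it front to back, so each partition's messages are processed in order. The third consumer receives no partition and stays idle.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer of the group (round-robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


topic = {"P0": ["a0", "a1"], "P1": ["b0", "b1"]}
assignment = assign(list(topic), ["c1", "c2", "c3"])
print(assignment)  # {'c1': ['P0'], 'c2': ['P1'], 'c3': []}

# Each consumer reads its partitions front to back, so per-partition order holds.
for consumer, owned in assignment.items():
    seen = [msg for p in owned for msg in topic[p]]
    print(consumer, seen)
# c1 ['a0', 'a1']
# c2 ['b0', 'b1']
# c3 []
```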

Kafka guarantees ordering only within a partition, not across the different partitions of a topic. This is sufficient for most applications. If you need a total order over all messages in a topic, the topic must have only one partition, which also means that only one consumer per consumer group can consume it.
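The difference between per-partition and total ordering appears as soon as a consumer reads partitions at different rates. In this hypothetical sketch, four events are spread round-robin over two partitions, and the consumer happens to drain partition 1 before partition 0. The order inside each partition survives, but the global order does not. With a single partition, the original order is preserved.

```python
# Events produced in this global order.
produced = ["e0", "e1", "e2", "e3"]


def spread(events, num_partitions):
    """Round-robin events over partitions; order is kept inside each one."""
    parts = [[] for _ in range(num_partitions)]
    for i, e in enumerate(events):
        parts[i % num_partitions].append(e)
    return parts


def consume(parts, order):
    """Drain whole partitions in the given order (e.g. one partition lagging)."""
    return [e for p in order for e in parts[p]]


two = spread(produced, 2)          # [['e0', 'e2'], ['e1', 'e3']]
print(consume(two, order=[1, 0]))  # ['e1', 'e3', 'e0', 'e2']: per-partition order only
one = spread(produced, 1)
print(consume(one, order=[0]))     # ['e0', 'e1', 'e2', 'e3']: total order
```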
