
Kafka Architecture Principles Explained in Detail


This article explains the architecture principles of Kafka in detail. Many people have doubts about how Kafka is put together, so I have consulted a variety of materials and organized them into a simple, easy-to-follow walkthrough. I hope it helps resolve those doubts — let's study it together!

From this article you will learn:

The philosophy and principles of Kafka's architecture design

The role of ZooKeeper in Kafka

How the Kafka Controller is implemented

Kafka's network model

Let's start by sending a message

An aside before we begin: make things, and show your work — it is a window through which others get to know you. If you can, open an official account or a blog and record what you see, hear, and think every day. At first the records will be messy and unstructured, but sticking with it is worth a great deal.

Architecture

Understanding the Kafka architecture means understanding Kafka's components and the relationships between them. Let's first take a brief look at each component with a short description.

Don't try to remember them.

Producer: the party that sends messages. The producer creates messages and then sends them to Kafka.

Consumer: the party that receives messages. The consumer connects to Kafka, receives messages, and processes them with the corresponding business logic.

Consumer Group: a consumer group contains one or more consumers. Combining multiple partitions with multiple consumers greatly improves downstream processing speed. Consumers in the same consumer group will not consume a message more than once, and consumers in different consumer groups do not affect one another. It is through consumer groups that Kafka implements both the point-to-point (P2P) and broadcast messaging modes.

Broker: a service proxy node. A Broker is a Kafka service node — that is, a Kafka server.

Topic: messages in Kafka are organized by Topic. Producers send messages to a specific Topic, and consumers subscribe to a Topic to receive and consume its messages.

Partition: a Topic is a logical concept that can be subdivided into multiple partitions, each belonging to a single topic. Different partitions under the same topic contain different messages. At the storage level, a partition can be regarded as an append-only log file; a message is assigned a specific offset when it is appended to the partition's log.

Offset: the offset is the unique identifier of a message within its partition, and Kafka uses it to guarantee message order inside the partition. Offsets do not span partitions: Kafka guarantees ordering within a partition, not across a topic.

Replication: replicas are how Kafka ensures high availability of data. The data of a single Partition can have multiple replicas on multiple Brokers. Normally only the leader replica serves reads and writes; when the broker holding the leader crashes or becomes unreachable over the network, Kafka, under the management of the Controller, elects a new leader replica to continue serving reads and writes.

Record: a message record, the unit actually written to and read from Kafka. Each record contains a key, a value, and a timestamp.
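
To make these terms concrete, here is a minimal consumer sketch in Java (the broker address localhost:9092 and the topic demo-topic are hypothetical) that polls records and prints the Topic, Partition, Offset, and Record fields described above:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GlossaryDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical Broker address
        props.put("group.id", "demo-group");              // the Consumer Group this consumer joins
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("demo-topic")); // subscribe to a Topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                // Each Record carries the coordinates described above.
                System.out.printf("topic=%s partition=%d offset=%d key=%s value=%s ts=%d%n",
                        r.topic(), r.partition(), r.offset(), r.key(), r.value(), r.timestamp());
            }
        }
    }
}
```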

Once we understand them, we will remember them naturally — so let's build that understanding.

Producer-consumer

Producer-consumer is a design pattern that decouples producers from consumers by inserting an intermediate component between them: the producer writes data to the intermediate component, and the consumer consumes data from it.

It is just like Brother 65 writing a love letter to Xiaofang back in school. Here Brother 65 is the producer, the love letter is the message, and Xiaofang is the consumer. But sometimes Xiaofang is away or busy, and Brother 65 is too shy to put the love letter directly into her hands, so he slips it into her drawer. The drawer, then, is the intermediate component.

In programs we usually use a Queue as this intermediate component: multiple threads can write data to the queue, while consumer threads read data from the queue in turn and process it. The model is shown in the following figure:

By adding an intermediate layer, the producer-consumer model not only decouples producers from consumers and makes the system easy to extend, but also enables asynchronous calls, message buffering, and more.
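
As a minimal, illustrative sketch of this pattern in plain Java (class and message names are made up), a producer thread writes to a BlockingQueue while a consumer thread takes from it:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16); // the intermediate component

        // Producer thread: writes data into the queue.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) queue.put("message-" + i);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Consumer thread: takes data from the queue in turn and processes it.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) System.out.println("consumed " + queue.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        consumer.start();
    }
}
```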

Distributed queue

Later, Brother 65 and Xiaofang moved to different cities: Brother 65 ground away at work while Xiaofang went shopping in Mudu, so they could only exchange letters through the post office. With that, Brother 65, the post office, and Xiaofang became distributed: Brother 65 sends a letter to the post office, and Xiaofang picks it up from the post office and takes it home to read slowly.

Kafka's message producer is the Producer. An upstream producer process includes the Kafka client library, creates a KafkaProducer, and sends messages to the Broker. Brokers are Kafka server processes deployed as a cluster on remote machines, and downstream consumer processes use the Kafka Consumer API to continuously consume messages from the queue.

Because the Kafka Consumer works in Poll mode, the Consumer must actively pull messages. So Xiaofang can only go to the post office periodically to pick up her letters (well, Xiaofang clearly holds the initiative).
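
Here is a hedged sketch of the producer side using the Kafka Java client (the broker address and topic name are placeholders); the consumer side is the poll loop shown earlier:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LetterProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical Broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a "love letter" to the Broker; the callback reports where it landed.
            producer.send(new ProducerRecord<>("love-letters", "xiaofang", "Dear Xiaofang..."),
                    (metadata, e) -> {
                        if (e == null)
                            System.out.printf("delivered to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                    });
        } // close() flushes any messages still in flight
    }
}
```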

Topic

The post office cannot serve only Brother 65. Even though he writes several letters a day, that alone cannot cover the post office's costs, so the post office delivers letters for everyone: as long as the sender writes down the address (the topic), the post office keeps a channel open between the two places for sending and receiving letters.

Kafka's Topic is the equivalent of a queue, and Brokers are the machines on which the queues are deployed. Each business can create its own Topic; its Producers send messages to that Topic, and the corresponding Consumers consume and process them.
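
Topics can be created programmatically with the AdminClient. A minimal sketch — the topic name, partition count, and replication factor below are illustrative values, not recommendations:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical Broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // A Topic for an "order" business: 3 partitions, 2 replicas each (illustrative values).
            NewTopic topic = new NewTopic("order-events", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```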

Partition

Because Brother 65 wrote too many letters, one post office could no longer keep up, so the postal company built several more. Brother 65 now classifies his letters by privacy level (a partitioning strategy) and sends them from different post offices.

The same Topic can have multiple partitions, and in theory, the more partitions, the higher the concurrency. Following its partition allocation policy, Kafka distributes partitions across the Broker nodes as evenly as possible, to avoid message skew and heavily uneven Broker load. That said, more partitions is not always better — after all, the postal company cannot manage too many post offices. For the specific reasons, see the earlier article "Kafka performance: why Kafka is so fast?".
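
One common partitioning strategy is keying: with the default partitioner, records that share a key are hashed to the same partition, which preserves per-key ordering. A sketch reusing the producer from the earlier example (topic and key names are illustrative):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSendDemo {
    // Assumes a configured KafkaProducer<String, String> as in the earlier sketch.
    static void sendClassified(KafkaProducer<String, String> producer) {
        // The default partitioner hashes the key, so all letters with the same key
        // always land on the same partition (a post office chosen by privacy level,
        // per the analogy) and stay ordered relative to each other.
        producer.send(new ProducerRecord<>("love-letters", "private", "for Xiaofang only"));
        producer.send(new ProducerRecord<>("love-letters", "public", "hello everyone"));
    }
}
```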

Replica

But what if something goes wrong at a post office — the road is cut off, or the mail truck runs out of gas? Brother 65's letters would never reach Xiaofang, leaving him kneeling on the keyboard at night. So the post office decided to copy each of Brother 65's letters and send the copies to several other working post offices; as long as one post office survived, Xiaofang could still receive his letters.

Kafka uses partition replicas to ensure high availability of data. Each partition has a configured number of replicas, and Kafka places the replicas of the same partition on different Broker nodes as far as possible, to prevent all replicas from becoming unavailable when a single Broker goes down. Among a partition's replicas, Kafka selects one as the leader replica (Leader), which serves reads and writes, while the follower replicas (Follower) synchronize data from the leader in real time.
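
You can observe each partition's leader, assigned replicas, and ISR with the AdminClient. A minimal sketch (topic name and broker address are illustrative):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class DescribeTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical Broker address
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("order-events"))
                    .all().get().get("order-events");
            for (TopicPartitionInfo p : desc.partitions()) {
                // leader() is the broker serving reads/writes; replicas() and isr()
                // are the assigned and in-sync replica sets.
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```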

Multiple consumers

Oh no — Brother 65's letters are flying in from everywhere, and Xiaofang has to go to the post office every day and open them one by one. His letters are long-winded, and they keep Xiaofang so busy she is drenched in sweat. So Xiaofang snapped her fingers and, in a flash, split into several avatars who went to different post offices to collect the letters, and she could finally squeeze out some time to go shopping.

Broadcast message

The post office recently launched a customized-postcard service: anyone can design a postcard, but each identity can receive only one copy of it. Brother 65 designed a whole batch and broadcast them: every pretty little sister could come and collect one, but multiple avatars of the same identity could only take one copy of each postcard between them.

Kafka implements broadcast-mode message subscription through Consumer Groups: consumers in different groups can consume the same messages independently without affecting one another, while the consumers within a single group act as one unit.
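
In client code this is controlled by a single setting, group.id. A minimal, illustrative sketch (group names and broker address are hypothetical): two services with different group ids each receive every message (broadcast), while multiple instances sharing one group id split the partitions (P2P):

```java
import java.util.Properties;

public class GroupConfigDemo {
    // Build consumer properties for a given group (other settings as in the earlier sketch).
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical Broker address
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Different groups: each service sees every message of the topic (broadcast mode).
        Properties billing = consumerProps("billing-service");
        Properties audit   = consumerProps("audit-service");
        // Same group across several instances: partitions are divided among them (P2P mode).
        // ... create a KafkaConsumer with each Properties object and subscribe to the same topic.
    }
}
```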

With that, we have pieced together Kafka's overall architecture, as shown below:

ZooKeeper

ZooKeeper is a mature distributed coordination service that provides distributed configuration, synchronization, and naming/registration services for distributed systems. Any distributed system needs some way to coordinate its tasks. Kafka is a distributed system built on ZooKeeper for this purpose, while other technologies, such as Elasticsearch and MongoDB, ship their own built-in coordination mechanisms.

Kafka stores metadata about Brokers, Topics, and Partitions in ZooKeeper. By creating the corresponding data nodes (znodes) in ZooKeeper and listening for changes on those nodes, Kafka uses ZooKeeper to perform the following functions:

Leader Election of Kafka Controller

Kafka cluster member management

Topic configuration management

Partition replica management

A glance at the nodes Kafka creates under ZooKeeper makes these functions easy to see.
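
For example, with the ZooKeeper Java client you can peek at the standard znodes Kafka maintains, such as /brokers/ids and /controller (the connection string is illustrative):

```java
import org.apache.zookeeper.ZooKeeper;

public class ZkPeek {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble backing the Kafka cluster (address illustrative).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // Registered brokers: one ephemeral child znode per live broker id.
        System.out.println("brokers: " + zk.getChildren("/brokers/ids", false));

        // The current controller: this znode records the elected broker.
        System.out.println("controller: " + new String(zk.getData("/controller", false, null)));

        zk.close();
    }
}
```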

Controller

The Controller is elected from among the Brokers and is responsible for managing partition Leaders and Followers. When the leader replica of a partition fails, the Controller elects a new leader for that partition. When it detects a change in a partition's ISR (In-Sync Replica) set, the Controller notifies all brokers to update their metadata. And when you use the kafka-topics.sh script to increase the number of partitions of a topic, the Controller is likewise responsible for reassigning the partitions.

The Controller election in Kafka relies on ZooKeeper: the broker that wins the election creates the ephemeral (EPHEMERAL) node /controller in ZooKeeper.

Election process

When a Broker starts, it tries to read the brokerid value from the /controller node. If that value is not -1, some other Broker has already become the Controller, and the current Broker abandons the election. If the /controller node does not exist, or its brokerid value is abnormal, the current Broker attempts to create the /controller node. Other brokers may try to create it at the same time, but only the one that succeeds becomes the Controller; the rest have lost the election. Every broker keeps the current controller's brokerid in memory, identified as activeControllerId.
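
Kafka performs this election internally, but the mechanism can be sketched with the ZooKeeper client directly: whoever creates the ephemeral znode first wins. This is an illustrative simplification — the real /controller node stores JSON, not a bare id:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElection {
    // Try to become controller; returns the winning broker's id either way.
    static String elect(ZooKeeper zk, String myBrokerId) throws Exception {
        try {
            // EPHEMERAL: the znode disappears if this broker's session dies,
            // which is what triggers a re-election.
            zk.create("/controller", myBrokerId.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return myBrokerId; // we won the election
        } catch (KeeperException.NodeExistsException e) {
            // Someone else won; remember the active controller id in memory.
            return new String(zk.getData("/controller", false, null));
        }
    }
}
```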

Implementation

The Controller reads node data from ZooKeeper, initializes its context (ControllerContext), and manages node changes: it updates the context and also synchronizes the changes to the other, ordinary broker nodes. The Controller obtains ZooKeeper information through scheduled tasks and listeners, and event listeners update the context. As shown in the figure, the Controller itself uses the producer-consumer pattern: ZooKeeper changes are turned into events and placed on an event queue (a LinkedBlockingQueue), and an event-consumer thread group consumes the events and synchronizes the corresponding changes to each Broker node. The FIFO nature of the queue guarantees that events stay ordered.
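
A stripped-down sketch of that event loop (class and method names are illustrative, not Kafka's actual ControllerEventManager):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class MiniEventManager {
    interface ControllerEvent { void process(); } // e.g. a BrokerChange or TopicChange event

    private final LinkedBlockingQueue<ControllerEvent> queue = new LinkedBlockingQueue<>();

    // ZooKeeper listeners call this to enqueue changes (the "producer" side).
    void put(ControllerEvent event) { queue.add(event); }

    // A single consumer thread drains the FIFO queue, which preserves event order.
    void start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) queue.take().process();
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "controller-event-thread");
        worker.setDaemon(true);
        worker.start();
    }
}
```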

Responsibilities

The Controller, once elected, is the manager of the entire Broker cluster and maintains all cluster information and metadata. Its responsibilities include the following:

Handling Broker nodes coming online and going offline. Whether the cluster change is caused by planned shutdown, a crash, or network unreachability, the Controller must update the cluster metadata in time and notify all Broker nodes of the change.

Handling Topic creation and partition expansion. The Controller is responsible for assigning partition replicas and drives the leader election for the Topic's partition replicas.

Managing the state machines for all replicas and partitions in the cluster, listening for state-machine change events, and handling them accordingly. Kafka manages partition and replica data through state machines: changes to a partition or replica change the state of its state machine, which in turn triggers the corresponding change events.

Brother 65: a state machine? That sounds complicated.

The Controller manages the state machines for all replicas and partitions in the cluster. Don't be put off by the term "state machine" — it is easy to understand. First identify the model: what is this a model of? Then ask what states the model can be in, how it transitions between states, and what change events are emitted during those transitions.

Kafka's partition and replica state machines are simple. They manage the partitions and replicas of Kafka Topics, respectively, and their states are just as simple — essentially a create/online/offline/delete lifecycle, as follows:

Partition state machine

PartitionStateChange manages the partitions of a Topic and has the following four states:


NonExistentPartition: the partition has never been created, or was created and then deleted.

NewPartition: the partition enters this state right after creation. Replicas have been assigned to it, but no leader has been elected and there is no ISR list yet.

OnlinePartition: once the leader for this partition is elected, it will be in this state.

OfflinePartition: when the partition's leader goes down, the partition transitions to this state.
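
An illustrative way to picture these states and their legal transitions in code — a sketch based on the descriptions above, not Kafka's actual implementation:

```java
import java.util.EnumSet;
import java.util.Set;

public enum PartitionState {
    NON_EXISTENT, NEW, ONLINE, OFFLINE;

    // Legal previous states for each target state, per the descriptions above.
    Set<PartitionState> validPreviousStates() {
        switch (this) {
            case NEW:     return EnumSet.of(NON_EXISTENT);            // just created
            case ONLINE:  return EnumSet.of(NEW, ONLINE, OFFLINE);    // leader elected
            case OFFLINE: return EnumSet.of(NEW, ONLINE, OFFLINE);    // leader went down
            default:      return EnumSet.of(OFFLINE);                 // deleted after offline
        }
    }
}
```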

Let's use a diagram to visually see how these states change and what Controller does when they change:

Replica state machine

ReplicaStateChange manages partition replica information and also has four states:


NewReplica: replicas enter this state when they are created during topic creation or partition reassignment. In this state, a replica can only receive a request to become a follower.

OnlineReplica: when a replica becomes part of its partition's assigned replicas, its state changes to OnlineReplica — a valid, online replica.

OfflineReplica: a replica enters this state when it goes offline, which usually happens when its broker goes down.

NonExistentReplica: after a replica is successfully deleted, it enters this state.

The transitions between replica states are shown in the figure below; the Controller acts accordingly on each change:

Network

Kafka's network communication model is a Reactor multithreaded model based on Java NIO. It contains one Acceptor thread that handles new connections, N Processor threads that select on and read socket requests, and a pool of Handler threads that process the requests and send responses — that is, run the business logic. The following is a model diagram of KafkaServer:
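
To complement the diagram, here is a minimal, illustrative Reactor sketch in the same spirit (this is not Kafka's actual SocketServer; the port and thread count are arbitrary): one Acceptor thread accepts connections and hands them round-robin to N Processor threads, each running its own Selector loop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MiniReactor {
    static class Processor extends Thread {
        final Selector selector;
        final ConcurrentLinkedQueue<SocketChannel> newConnections = new ConcurrentLinkedQueue<>();
        Processor() throws IOException { selector = Selector.open(); }
        void handOff(SocketChannel ch) { newConnections.add(ch); selector.wakeup(); }
        public void run() {
            try {
                while (true) {
                    SocketChannel ch;
                    while ((ch = newConnections.poll()) != null) { // register handed-off sockets
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                    selector.select(500);
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            ByteBuffer buf = ByteBuffer.allocate(1024);
                            SocketChannel c = (SocketChannel) key.channel();
                            if (c.read(buf) < 0) c.close();
                            // In Kafka, the parsed request would be queued for Handler threads here.
                        }
                    }
                    selector.selectedKeys().clear();
                }
            } catch (IOException e) { e.printStackTrace(); }
        }
    }

    public static void main(String[] args) throws IOException {
        int n = 3; // number of Processor threads (arbitrary for the sketch)
        Processor[] processors = new Processor[n];
        for (int i = 0; i < n; i++) { processors[i] = new Processor(); processors[i].start(); }

        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9092));
        int next = 0;
        while (true) {                          // the Acceptor loop
            SocketChannel ch = server.accept(); // blocking accept, for simplicity
            processors[next].handOff(ch);       // round-robin to a Processor
            next = (next + 1) % n;
        }
    }
}
```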

This concludes our study of Kafka's architecture principles. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go give it a try!
