
How Kafka works in a distributed environment

2025-04-06 Update From: SLTechnology News&Howtos

This article explains how Kafka works in a distributed environment: what a messaging system is, how Kafka's producers, brokers, and consumers fit together, how replication and consumer groups work across a cluster, and why Kafka is so fast.

What is a messaging system?

Before learning Kafka, you need to understand what a message queue is. If you already know, you can skip to the next section.

> Modern Distributed System

As shown in the figure above, a message queue is middleware that transmits and stores messages between two systems. It offers the following advantages:

Decoupling: as long as both sides follow the same interface contract, you can extend or modify the processing on either side independently.

Redundancy: a message queue retains data until it has been fully processed, which avoids the risk of data loss. In the insert-get-delete paradigm used by many message queues, the processing system must explicitly confirm that a message has been processed before the message is deleted from the queue, so your data stays safely stored until you have finished using it.

Scalability: because a message queue decouples your processing, it is easy to increase the rate at which messages are enqueued and processed simply by adding more processing capacity.

Flexibility and peak capacity: applications must keep working when traffic surges, but such bursts are not the norm. Provisioning resources permanently for peak load would be a huge waste. A message queue lets critical components withstand sudden load instead of collapsing under unexpected overload.

Recoverability: when part of the system fails, the whole system is not affected. A message queue reduces coupling between processes, so even if the process handling messages crashes, messages already in the queue can be processed once the system recovers.

Order assurance: in most use cases, the order of data processing matters. Most message queues are inherently ordered and can guarantee that data is processed in a specific order. (Kafka guarantees message order within a partition.)

Buffering: helps to control and optimize the speed of data flow through the system, and smooths out the difference in speed between producing and consuming messages.

Asynchronous communication: in many cases, users do not want or need to process messages immediately. A message queue provides an asynchronous mechanism that lets users put a message on the queue without processing it right away. Enqueue as many messages as you need, then process them when required.
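
The insert-get-delete paradigm mentioned under redundancy can be sketched as follows. This is a minimal in-memory illustration, not the API of any real queue: the `AckQueue` class and its method names are hypothetical.

```python
from collections import deque


class AckQueue:
    """Toy insert-get-delete queue: a message is deleted only after an explicit ack."""

    def __init__(self):
        self._pending = deque()   # (id, message) pairs not yet handed to a consumer
        self._in_flight = {}      # id -> message, delivered but not yet acknowledged
        self._next_id = 0

    def put(self, message):
        """Insert: store the message at the tail of the queue."""
        self._pending.append((self._next_id, message))
        self._next_id += 1

    def get(self):
        """Get: hand out the oldest message but keep a copy until it is acked."""
        msg_id, message = self._pending.popleft()
        self._in_flight[msg_id] = message
        return msg_id, message

    def ack(self, msg_id):
        """Delete: the consumer confirms processing, so the copy can be dropped."""
        del self._in_flight[msg_id]

    def requeue_unacked(self):
        """Simulate a consumer crash: unacked messages return to the head of the queue."""
        for msg_id in sorted(self._in_flight, reverse=True):
            self._pending.appendleft((msg_id, self._in_flight.pop(msg_id)))

    def size(self):
        """Number of messages the queue is still responsible for."""
        return len(self._pending) + len(self._in_flight)
```

A message handed out by `get()` still counts toward `size()` until `ack()` is called, which is what protects it against a consumer failure.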

That said, I think the biggest disadvantage is the added complexity, which is minor compared with these advantages.

How does Kafka work?

Viewed on its own, Kafka consists of producers, consumers, and brokers.

The producer is responsible for sending messages to a given topic on the broker.

The broker maintains a set of topics and manages the partitions in each topic.

The consumer is responsible for pulling messages from the corresponding topic on the broker.

> Kafka components

As shown in the figure, different producers can send messages to multiple partitions of multiple topics, and consumers can consume from various topics.

Producers and consumers are completely isolated.

This design fully embodies decoupling, flexibility and peak capacity, order assurance, and asynchronous communication.
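
The relationship between the three roles can be sketched with a toy in-memory model. Everything here is an illustrative assumption (the `MiniBroker` class, its method names, and the fixed partition count); it is not Kafka's client API.

```python
from collections import defaultdict


class MiniBroker:
    """Toy broker: each topic is a list of partitions, each partition an append-only list."""

    def __init__(self, partitions_per_topic=2):
        self.partitions_per_topic = partitions_per_topic
        # topic name -> list of partition logs, created on first use
        self.topics = defaultdict(
            lambda: [[] for _ in range(self.partitions_per_topic)]
        )

    def produce(self, topic, partition, value):
        """Producer side: append to one partition and return the assigned offset."""
        log = self.topics[topic][partition]
        log.append(value)
        return len(log) - 1

    def fetch(self, topic, partition, offset):
        """Consumer side: pull every message at or after the given offset."""
        return self.topics[topic][partition][offset:]


broker = MiniBroker()
broker.produce("orders", 0, "order-1")
broker.produce("orders", 0, "order-2")
broker.produce("orders", 1, "order-3")

# The consumer never talks to the producer; it only asks the broker for data
# and keeps track of its own position (offset) per partition.
next_offset = 0
batch = broker.fetch("orders", 0, next_offset)
next_offset += len(batch)
```

Because the consumer pulls by offset, producers and consumers can run at different speeds without knowing about each other.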

How does Kafka work in a distributed environment?

1. Cluster

Multiple brokers and replicas.

Replica: a copy of a partition, kept to ensure the partition's high availability.

Leader: the role among the replicas that producers and consumers interact with; they talk only to the leader.

Follower: the role that replicates the data held by the leader.

How does Kafka ensure redundancy, recoverability and high availability?

Replication provides high availability. Even if some nodes fail:

Producers can continue to publish messages.

Consumers can continue to receive messages.

There are two approaches that ensure strongly consistent data replication: primary-backup replication and quorum-based replication. Both require electing a leader, with the other replicas acting as followers. All writes are sent to the leader, which then propagates the message to the followers.

Quorum-based replication uses consensus algorithms such as Raft and Paxos; systems in this family include ZooKeeper and Google Spanner. With 2n + 1 nodes, at most n node failures can be tolerated.

In primary-backup replication, a write is considered successful only after the primary and all of its backups have received the message. With n nodes, at most n - 1 node failures can be tolerated; an example is Microsoft's PacificA.

These two methods have their own advantages and disadvantages.

Quorum-based replication may have lower latency than primary-backup replication, because it needs only a majority of the nodes to write successfully before returning.

Primary-backup replication tolerates more node failures for the same number of nodes: as long as one node is alive, it keeps working.

With two nodes, primary-backup replication can already provide fault tolerance, whereas the quorum-based approach needs at least three nodes.
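
The node-count arithmetic behind this comparison can be written down directly. The function names below are hypothetical; the formulas follow from the statements that a quorum of 2n + 1 nodes tolerates n failures and that a primary-backup group keeps working as long as one node is alive.

```python
def quorum_tolerated_failures(nodes):
    """A majority must survive, so 2n + 1 nodes tolerate n failures."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return (nodes - 1) // 2


def primary_backup_tolerated_failures(nodes):
    """One surviving replica is enough, so all but one node may fail."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes - 1


for n in (2, 3, 5):
    print(n, quorum_tolerated_failures(n), primary_backup_tolerated_failures(n))
```

With two nodes the quorum approach tolerates no failure at all, while primary-backup tolerates one, which is the two-node case described above.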

Kafka uses the primary-backup (leader-follower) approach, mainly for its fault tolerance: it provides high availability even with only two nodes.

What if a node is slow?

First of all, this rarely happens. If it does, a timeout parameter handles the situation; for example, a follower that lags longer than `replica.lag.time.max.ms` is removed from the in-sync replica set.

Kafka replicates at the partition level.

For example, in the figure above there are four brokers, one topic, and two partitions, with a replication factor of three. When a producer sends a message, it picks a partition, say topic1-part1, and sends the message to that partition's leader. The followers on broker2 and broker3 then pull the message. Once they have pulled it, the followers send an ack to the leader, and only then does the leader commit the message to the log.

In this process, the producer has two options:

One is to wait until all replicas have pulled the message successfully; only then does the producer receive a success response.

The other is to receive a success response as soon as the leader has written the message.

With the first option, messages are not lost in exceptional circumstances, but latency is higher. The second option has much lower latency, but if the leader goes down before the followers have pulled the latest messages, those messages may be lost.
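
The trade-off between the two options can be seen in a small simulation. This is a sketch under simplifying assumptions: the `Replica` and `Partition` classes and the `wait_for_all` parameter are hypothetical, followers are synced by a direct method call rather than by pulling over the network, and a leader failure simply promotes the first follower. In a real producer this choice corresponds to the `acks` setting (`acks=all` versus `acks=1`).

```python
class Replica:
    """One copy of a partition's log."""

    def __init__(self):
        self.log = []


class Partition:
    """Toy partition with one leader and several followers."""

    def __init__(self, follower_count=2):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(follower_count)]

    def replicate(self):
        """Stand-in for followers pulling: copy whatever each follower is missing."""
        for follower in self.followers:
            follower.log.extend(self.leader.log[len(follower.log):])

    def send(self, message, wait_for_all):
        """Write to the leader; optionally wait until every follower has the message."""
        self.leader.log.append(message)
        if wait_for_all:
            self.replicate()
        return "ok"

    def fail_leader(self):
        """The leader crashes and the first follower takes over with its own log."""
        self.leader = self.followers.pop(0)


partition = Partition()
partition.send("m1", wait_for_all=True)    # acknowledged after full replication
partition.send("m2", wait_for_all=False)   # acknowledged after the leader write only
partition.fail_leader()                    # crash before the followers caught up
surviving = partition.leader.log           # "m2" was acknowledged but is gone
```

The second message was reported as successful to the producer, yet it does not survive the failover, which is exactly the loss scenario described above.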

2. Consumer group

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can run in separate processes or on separate machines.

If all consumer instances belong to the same consumer group, records are effectively load-balanced across those instances.

If all consumer instances belong to different consumer groups, each record is broadcast to all consumer processes.

In short, the consumer group is the real consumer in the Kafka ecosystem.
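
The two delivery patterns can be illustrated with a short simulation. It is a deliberately simplified sketch: the `deliver` function is hypothetical, consumer names are assumed to be unique across groups, and records are handed out one by one in rotation, whereas Kafka itself balances load by assigning whole partitions to the members of a group.

```python
def deliver(records, groups):
    """Deliver each record to exactly one consumer in every group.

    groups maps a group name to the list of consumer names in that group.
    Returns a dict mapping each consumer name to the records it received.
    """
    received = {name: [] for members in groups.values() for name in members}
    for index, record in enumerate(records):
        for members in groups.values():
            # Within a group, rotate through the members to spread the load.
            received[members[index % len(members)]].append(record)
    return received


records = ["r1", "r2", "r3", "r4"]

# Same group: the records are split between the two consumers (queue behaviour).
balanced = deliver(records, {"g1": ["c1", "c2"]})

# Different groups: every consumer sees every record (broadcast behaviour).
broadcast = deliver(records, {"g1": ["c1"], "g2": ["c2"]})
```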

3. Controller

The picture above shows the design of the Kafka Controller as of 2015. The Controller and ZooKeeper (ZK) work together to form Kafka's high-level architecture, which accomplishes the following tasks:

Manage brokers and consumers dynamically joining and leaving.

Trigger load balancing. When a broker or consumer joins or leaves, the load-balancing algorithm is triggered to rebalance subscriptions among the consumers in a consumer group.

Maintain the consumption relationships and consumption state of each partition.
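
What a rebalance does when membership changes can be sketched as a pure function from members to assignments. The `assign_partitions` function and the round-robin strategy are illustrative assumptions, not the algorithm Kafka necessarily applies.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the current group members in round-robin order."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
before = assign_partitions(partitions, ["c1", "c2"])
after_join = assign_partitions(partitions, ["c1", "c2", "c3"])   # c3 joins
after_leave = assign_partitions(partitions, ["c2"])              # c1 leaves
```

Recomputing the whole assignment on every join or leave keeps each partition owned by exactly one consumer in the group at any time.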

Why is Kafka so fast?

Kafka has a data path in which large amounts of network data are persisted to disk (producer to broker) and disk files are sent over the network (broker to consumer).

The performance of this process directly affects the overall throughput of Kafka.

1. Zero copy

The left side of the image above shows the traditional approach, with four copies and four context switches.

First, the file data is read into a kernel-space buffer through a system call (DMA copy).

The application then reads the data from the kernel-space buffer into a user-space buffer (CPU copy).

Next, when sending the data through a socket, the user program copies the data from the user-space buffer into the kernel-space socket buffer (CPU copy).

Finally, the data is copied to the NIC buffer via DMA. All of this is accompanied by four context switches.

On the right side of the figure above, Kafka uses the sendfile system call (Linux kernel 2.4+) to achieve zero copy.

Data is copied to the kernel-space buffer via DMA.

It is then copied directly to the NIC buffer via DMA, without any CPU copy.

Because the sendfile call handles the whole transfer, from reading the file to sending it over the network, there are only two context switches in the whole process, so performance improves greatly.

To be exact, Kafka's data transfer goes through TransportLayer, whose implementation PlaintextTransportLayer achieves zero copy through the transferTo and transferFrom methods of Java NIO's FileChannel.
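
The same idea can be tried outside the JVM. The sketch below is a stand-in for illustration, not Kafka's code: it uses Python's `socket.sendfile`, which delegates to the operating system's sendfile call where the platform supports it and falls back to ordinary `send` calls otherwise. The `send_file` and `demo` helpers and the payload are hypothetical.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send a whole file over a connected stream socket and return the bytes sent."""
    with open(path, "rb") as source:
        # No read() into a user-space buffer here: where os.sendfile is
        # available, the kernel moves the data from the file to the socket.
        return sock.sendfile(source)


def demo():
    payload = b"kafka-log-segment-bytes" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        with sender, receiver:
            sent = send_file(sender, path)
            sender.shutdown(socket.SHUT_WR)   # signal end of stream to the reader
            chunks = []
            while True:
                chunk = receiver.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    finally:
        os.remove(path)
    return sent, len(payload), b"".join(chunks) == payload
```

The application code never touches the file contents, which is the property that removes the two CPU copies in the traditional path.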

2. Sequential access

> Compare

The figure above shows the huge advantage of sequential access: sequential disk reads can even outperform random access to memory.

Every message in Kafka is appended; messages are never inserted into or deleted from the middle, which guarantees sequential access to the disk.

Even with sequential reads and writes, too many small I/O operations can cause a disk bottleneck, at which point the access pattern degrades into random reads and writes.

Kafka's strategy is to group messages and send them in batches, minimizing the number of disk accesses. For the same reason, the number of topics and partitions in Kafka should not be excessive.

Typically, beyond 64 topics/partitions, Kafka's performance degrades sharply.
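
The effect of batching on the number of I/O operations can be shown with a small append-only writer. This is an illustrative sketch: the `BatchingAppender` class, the newline framing, and the batch size of three are assumptions, not Kafka's on-disk format.

```python
import io


class BatchingAppender:
    """Collects messages and appends them to the sink in one write per batch."""

    def __init__(self, sink, batch_size=3):
        self.sink = sink              # any binary file-like object opened for appending
        self.batch_size = batch_size
        self.buffer = []
        self.write_calls = 0          # counts actual write operations on the sink

    def append(self, message):
        """Buffer a message; write only when a full batch has accumulated."""
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered messages with a single append."""
        if not self.buffer:
            return
        self.sink.write(b"".join(m + b"\n" for m in self.buffer))
        self.write_calls += 1
        self.buffer.clear()


def demo():
    sink = io.BytesIO()
    appender = BatchingAppender(sink, batch_size=3)
    for i in range(7):
        appender.append(b"m%d" % i)
    appender.flush()                  # write the final partial batch
    return appender.write_calls, sink.getvalue()
```

Seven messages reach the sink in three writes instead of seven, and every write lands at the end of the data.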

3. Segmented log

Kafka uses topics to manage messages. Each topic contains multiple partitions, each partition corresponds to a logical log, and each log consists of multiple segments.

Each segment stores multiple messages. A message's ID is determined by its logical position, so the ID locates the message's storage position directly, avoiding an extra mapping from ID to position.

Each partition has an in-memory index that records the offset of the first message in each segment.

Messages sent by a publisher to a particular topic are distributed evenly across its partitions (randomly or according to a user-specified callback function). The broker receives the published message and appends it to the last segment of the corresponding partition. When the number of messages on the segment reaches the configured value, or the time since publishing exceeds a threshold, the messages on the segment are flushed to disk; only messages that have been flushed to disk are visible to subscribers. When the segment reaches a certain size, no more data is written to it, and the broker creates a new segment.

This partitioning and indexing design not only improves the efficiency of data reading, but also improves the parallelism of data operations.
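
The in-memory index of first offsets makes locating a message a binary search followed by a subtraction. The sketch below is a simplified model: the `SegmentedLog` class is hypothetical, segments live in memory, and a segment rolls over after a fixed number of messages rather than at a byte size.

```python
import bisect


class SegmentedLog:
    """Toy partition log split into segments, indexed by each segment's first offset."""

    def __init__(self, max_segment_messages=3):
        self.max_segment_messages = max_segment_messages
        self.base_offsets = [0]   # in-memory index: first offset of every segment
        self.segments = [[]]      # segment contents, parallel to base_offsets
        self.next_offset = 0

    def append(self, message):
        """Append to the last segment, rolling to a new one when it is full."""
        if len(self.segments[-1]) >= self.max_segment_messages:
            self.base_offsets.append(self.next_offset)
            self.segments.append([])
        self.segments[-1].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset):
        """Find the segment by binary search, then index into it directly."""
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        index = bisect.bisect_right(self.base_offsets, offset) - 1
        return self.segments[index][offset - self.base_offsets[index]]


log = SegmentedLog(max_segment_messages=3)
for i in range(7):
    log.append(f"m{i}")
```

Because the offset itself encodes the position, no separate table from message ID to location is needed.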

4. High-performance broker

The design of the broker is also one of the reasons why Kafka is so fast.

First, all requests sent by clients go to the Acceptor. The broker also has three threads by default, called Processors.

The Acceptor does no processing of the client's request; it simply wraps the connection as a SocketChannel and hands it to the Processors, each of which keeps its own queue.

Dispatch is round-robin: the first connection goes to the first Processor, the next ones to the second and third Processors, and then back to the first. When the Processor threads consume these SocketChannels, they read the requests, and the data arrives along with those requests.

By default, there are eight threads in the handler thread pool. These threads process the requests: a write request is written to disk, and a read request returns the result. The Processor then takes the response data from the response queue and returns it to the client.

This is the three-tier network architecture of Kafka.

Therefore, to tune Kafka, increasing the number of Processors and the number of handler threads in the pool (the `num.network.threads` and `num.io.threads` broker settings) can help. The request and response queues act as buffers for the case where the Processors produce requests faster than the handler threads can process them.
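
The three tiers can be mimicked in a few lines. This is a loose sketch, not Kafka's network code: the `Acceptor` class, the `handle` and `run` functions, and the tuple-shaped requests are hypothetical, each connection is assumed to carry exactly one request, and the queues are drained in sequence rather than by dedicated Processor threads.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


class Acceptor:
    """Tier 1: accepts connections and hands them to Processors in rotation."""

    def __init__(self, num_processors=3):
        self.processor_queues = [queue.Queue() for _ in range(num_processors)]
        self._rotation = cycle(range(num_processors))

    def accept(self, connection):
        """Do no work on the connection itself; just enqueue it round-robin."""
        index = next(self._rotation)
        self.processor_queues[index].put(connection)
        return index


def handle(request):
    """Tier 3: a handler thread performs the actual read or write."""
    kind, payload = request
    return ("written", payload) if kind == "write" else ("read", payload)


def run(requests, num_processors=3, num_handlers=8):
    acceptor = Acceptor(num_processors)
    placement = [acceptor.accept(request) for request in requests]
    responses = []
    with ThreadPoolExecutor(max_workers=num_handlers) as pool:
        # Tier 2: each Processor drains its queue and passes requests to the pool.
        for processor_queue in acceptor.processor_queues:
            pending = []
            while not processor_queue.empty():
                pending.append(processor_queue.get())
            responses.extend(pool.map(handle, pending))
    return placement, responses
```

Raising `num_processors` or `num_handlers` in this model corresponds to the tuning advice above: more threads at the tier that is falling behind.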

That concludes this look at how Kafka works in a distributed environment. I hope it has answered your questions. Theory is best learned together with practice, so go and try it!
