2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
As a streaming data platform, Kafka provides three kinds of clients for developers: the producer/consumer, the connector, and stream processing. This article analyzes the threading models of these three clients; the last one usually holds a surprise.
Threading Model of the Consumer
Prior to version 0.8, the consumer client created a ZK-based consumer connector. A consumer client is a Java process; a consumer can subscribe to multiple topics, and each topic can be consumed by multiple threads. To distribute message consumption across multiple nodes and improve processing throughput, Kafka allows multiple consumers to subscribe to the same topic, subject to the restriction that "a partition can be processed by only one thread of one consumer". Typically, we deploy the same application, with the same business logic, on different machines and assign it a consumer group ID. When the consumer processes start on those machines, they together form one logical consumer group.
The number of consumers in a consumer group is dynamic. When a new consumer joins the group, or an existing consumer leaves it, a ZK-based "rebalance" of the group is triggered. During a rebalance, every consumer runs the partition assignment algorithm on the client side and then picks its own partitions out of the global result. The drawback is that consumers interact with ZK frequently, putting excessive pressure on the ZK cluster and making problems such as the herd effect and split-brain likely.
After version 0.8, Kafka redesigned the client and introduced the coordinator and the consumer group management protocol. The new consumer separates the group management protocol from the partition assignment strategy: the coordinator is responsible for managing the consumer group, while partition assignment is performed by one leader consumer within the group. To that end, each consumer sends the following two requests to the coordinator.
JoinGroup request: the coordinator collects all consumers in the group and elects a leader consumer to perform the partition assignment.
SyncGroup request: the leader consumer completes the partition assignment, and the coordinator propagates the assignment result to every consumer.
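The leader-side assignment step described above can be illustrated with a minimal Python sketch of a range-style assignor. This is an illustrative model of the algorithm, not Kafka's actual implementation; the function and variable names (`range_assign`, `members`) are hypothetical.

```python
def range_assign(members, partitions):
    """Assign partitions to group members contiguously (range-style),
    as the elected leader consumer would after the JoinGroup round."""
    members = sorted(members)            # deterministic order across members
    n, m = len(partitions), len(members)
    per, extra = divmod(n, m)            # even share, plus remainder up front
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

# Two consumers, three partitions of topic t0:
print(range_assign(["c1", "c2"], ["t0-p0", "t0-p1", "t0-p2"]))
# → {'c1': ['t0-p0', 't0-p1'], 'c2': ['t0-p2']}
```

The coordinator would then propagate this result to each member in the SyncGroup response, so that every consumer learns only its own slice.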
The new consumer client introduces an abstract client-side coordinator class; its implementations include not only the consumer coordinator but also a connector coordinator.
Threading Model of the Connector
The advent of Kafka connectors standardized data synchronization between Kafka and various external storage systems, making connectors easy to develop and use. By defining a connector in a configuration file, you can import data from an external system into Kafka or export Kafka data to an external system. As shown in figure 1, the middle part shows the internal components of the Kafka connector, including the source connector (Source Connector) and the sink connector (Sink Connector).
Figure 1 Source and sink connectors of the Kafka connector
In standalone mode, the Kafka connector launches one Worker, and all connectors and tasks run within that single process. In distributed mode, each process has one Worker, and connectors and tasks run across the nodes. Figure 2 lists four ways in which connectors and tasks can be distributed over different Workers:
One Worker, one source task, one sink task
One Worker, two source tasks, two sink tasks
Two Workers, two source tasks, two sink tasks
Three Workers, two source tasks, two sink tasks
Figure 2 Kafka connector cluster in distributed mode
In distributed mode, coordination among the Worker processes is similar to that among consumers: just as consumers obtain their assigned partitions through the coordinator, Workers obtain their assigned connectors and tasks through the coordinator. As shown in figure 3, to participate in group management, the consumer client and the Worker client each communicate with the server-side group coordinator (GroupCoordinator) through the client-side coordinator object.
Figure 3 Work is assigned to consumers and Workers through the coordinator
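The distribution of tasks over Workers sketched in figure 2 can be modeled with a few lines of Python. This is a simplified round-robin model of how work units spread over Workers, not the actual Kafka Connect assignment code; all names here are illustrative.

```python
def assign_round_robin(workers, units):
    """Spread connector tasks (work units) over Worker processes round-robin."""
    assignment = {w: [] for w in workers}
    for i, unit in enumerate(units):
        assignment[workers[i % len(workers)]].append(unit)
    return assignment

# The "two Workers, two source tasks, two sink tasks" case from figure 2:
units = ["source-task-1", "source-task-2", "sink-task-1", "sink-task-2"]
print(assign_round_robin(["worker-1", "worker-2"], units))
# → {'worker-1': ['source-task-1', 'sink-task-1'],
#    'worker-2': ['source-task-2', 'sink-task-2']}
```

With a single Worker, the same function reproduces the standalone-style cases where all tasks land in one process; with three Workers and four tasks, one Worker simply carries an extra unit.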
Threading Model of Stream Processing
In simple terms, the workflow of Kafka stream processing has three steps: a consumer reads data from the input partitions, each record is processed as part of the stream, and a producer writes the results to the output partitions; step 1 again makes full use of the consumer group management protocol. The input data of Kafka stream processing is based on Kafka topics with their distributed partitioning model, and its threading model consists mainly of the following three classes:
Stream instance (KafkaStreams): usually a node (a machine) runs only one stream instance.
Stream thread (StreamThread): a stream instance can be configured with multiple stream threads.
Stream task (StreamTask): a stream thread can run multiple stream tasks; the number of tasks is determined by the number of partitions of the input topic.
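The consume → process → produce workflow described above can be reduced to a minimal sketch. This models the per-record data flow only, not the Kafka Streams API; `process_stream` and its parameters are hypothetical names.

```python
def process_stream(records, transform):
    """Model of one stream task's loop:
    step 1: read records from the input partition (here, a list),
    step 2: apply the stream transformation to each record,
    step 3: collect results for the producer to write to the output partition."""
    output = []
    for key, value in records:
        output.append(transform(key, value))
    return output

# Multiply each value by 10, keeping the key:
print(process_stream([("k1", 1), ("k2", 2)], lambda k, v: (k, v * 10)))
# → [('k1', 10), ('k2', 20)]
```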
As shown in figure 4, the input topic has six partitions, so Kafka stream processing creates six stream tasks in total. Stream instances can be scaled dynamically, and the number of stream threads can be configured. With three stream threads, as in the figure, each stream thread runs two stream tasks, and each stream task corresponds to one partition of the input topic.
Figure 4 Thread model for Kafka stream processing
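The task-to-thread arithmetic of figure 4 can be sketched as follows. This is a simplified model (one task per input partition, tasks spread round-robin over threads), with illustrative names, not the actual Kafka Streams scheduler.

```python
def assign_tasks_to_threads(num_partitions, num_threads):
    """One stream task per input partition; tasks spread over the
    configured number of stream threads."""
    tasks = [f"task-{p}" for p in range(num_partitions)]
    threads = {f"thread-{t}": [] for t in range(num_threads)}
    for i, task in enumerate(tasks):
        threads[f"thread-{i % num_threads}"].append(task)
    return threads

# Six partitions, three stream threads, as in figure 4:
print(assign_tasks_to_threads(6, 3))
# → each of the 3 threads runs 2 of the 6 tasks
```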
Kafka's stream processing framework uses a parallel threading model to process the input topics' data, which is very similar to the consumer threading model: consumers are assigned different partitions of the subscribed topics, and the framework's stream tasks are likewise assigned different partitions of the input topics. As shown in figure 5, partition P1 of input topic 1 and partition P1 of input topic 2 are assigned to the stream task of stream thread 1, while partition P2 of input topic 1 and partition P2 of input topic 2 are assigned to the stream task of stream thread 2. Unlike a plain consumer, stream processing also writes the computation results of the topology to the output topics.
Figure 5 Consumer model and threading model for stream processing
The fault-tolerance mechanisms of consumers and stream processing are also similar. As shown in figure 6, if the Consumer 2 process dies, the partitions it held are reassigned to Consumer 1 in the same consumer group, so Consumer 1 ends up with all partitions of the subscribed topic. For stream processing, if stream thread 2 dies, its stream task is reassigned to stream thread 1; stream thread 1 then runs two stream tasks, and the partitions assigned to each stream task remain unchanged.
Figure 6 Fault-tolerance mechanism of consumers and stream processing
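The failover in figure 6 can be sketched as a reassignment of the dead member's partitions to the survivors. Again, this is an illustrative model of the rebalance outcome, not Kafka's implementation; the names are hypothetical.

```python
def rebalance_after_failure(assignment, failed, survivors):
    """Reassign a failed member's partitions to the surviving group
    members (round-robin), keeping the survivors' own partitions."""
    orphaned = assignment.pop(failed)
    for i, partition in enumerate(orphaned):
        assignment[survivors[i % len(survivors)]].append(partition)
    return assignment

# Consumer 2 dies; Consumer 1 inherits its partitions, as in figure 6:
before = {"consumer-1": ["p0", "p1"], "consumer-2": ["p2", "p3"]}
print(rebalance_after_failure(before, "consumer-2", ["consumer-1"]))
# → {'consumer-1': ['p0', 'p1', 'p2', 'p3']}
```

The same function models the stream-thread case: the tasks of a dead stream thread move to a surviving thread, while each task keeps its own partitions.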
Summary
The "group management protocol" abstracted by the Kafka client is used in all three scenarios: consumers, connectors, and stream processing. The consumer in the client, the Worker in the connector, and the stream thread in stream processing can each be regarded as a member of a group. When group members join or leave, under the constraints of this protocol each member obtains its latest tasks, achieving seamless task migration. Understanding the group management protocol is a great help in understanding Kafka's architectural design.