Analyzing the Core Principles of the High-Performance Message Queue CKafka

2025-02-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

Many readers new to the topic are unsure how to approach the core principles of the high-performance message queue CKafka. This article walks through those principles; we hope that by the end you will be able to answer the question yourself.

1. Background

CKafka is a high-performance, highly available messaging middleware developed by the Infrastructure Department. It is used mainly in high-performance scenarios such as message transmission, website activity tracking, operational monitoring, log aggregation, stream processing, event tracking, and log collection, and is available on Tencent Cloud. CKafka is fully compatible with the existing Kafka protocol, so existing Kafka users can migrate to CKafka at no cost. Because CKafka is developed and optimized on top of Kafka, this article also explains Kafka's implementation principles in detail to help readers understand CKafka.

2. Kafka principles

2.1 Background of Kafka's birth

Kafka is a high-throughput distributed publish/subscribe messaging system. It was originally developed at LinkedIn in Scala and served as the foundation of LinkedIn's activity-stream tracking and operational-data processing pipelines. It is now an Apache open-source project. Its main design goals are as follows:

Message persistence with O(1) time complexity, guaranteeing constant-time access performance even for terabytes of data. Note: in practice, the O(1) constant-time guarantee holds for writes; reads involve a segment-level lookup that is logarithmic, O(log n), in the number of segments.
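To make the note above concrete, here is a minimal sketch (with illustrative segment boundaries) of how a broker can locate the segment containing a given offset: each segment file is named after the first offset it contains, so a binary search over the sorted base offsets yields the O(log n) read path.

```python
import bisect

# Hypothetical base offsets of one partition's log segments
# (each segment file is named after the first offset it holds).
segment_base_offsets = [0, 1500, 3200, 4800, 7000]

def find_segment(target_offset: int) -> int:
    """Binary-search the sorted segment list for the segment that
    contains target_offset -- O(log n) in the number of segments."""
    i = bisect.bisect_right(segment_base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the earliest retained segment")
    return segment_base_offsets[i]

print(find_segment(3199))  # offset 3199 lives in the segment starting at 1500
```

Within the chosen segment, Kafka then uses a sparse index to narrow the scan further; the segment-level search above is where the logarithmic cost comes from.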

High throughput. Kafka aims to support a message throughput of 100K qps on a single machine, even on inexpensive commodity hardware.

Support for partitioning messages across Kafka servers (partitions) and for distributed consumption, while preserving message order within each partition. Note: in fact, Kafka's broker logic does not enforce this by itself; the guarantee comes mainly from the partition-assignment algorithm on the consumer side, described in detail below.

Support for both offline and real-time data processing.

Support for online horizontal scaling; Kafka's horizontal scalability comes mainly from its partition-based design.

2.2 Comparison with mainstream message queues

The mainstream message queues RabbitMQ, RocketMQ, CMQ, and Kafka can be compared along the following dimensions:

- Mode: publish/subscribe, versus a traditional queue model combined with publish/subscribe
- Replica synchronization algorithm: GM (RabbitMQ), synchronous double-write (RocketMQ), Raft (CMQ), and ISR replica synchronization (Kafka)
- Distributed scaling: supported
- Message backlog ("stacking") capacity: disk-based, with horizontal expansion
- Performance: high to very high
- Reliability: from average to very high
- Persistence: memory and/or hard disk

2.3 Architecture

2.3.1 Overall architecture diagram

2.3.2 Related concepts

2.3.2.1 ZooKeeper cluster

A component on which the Kafka system strongly depends. It stores Kafka's core metadata (topic configuration, broker information, consumer groups, and so on), effectively acting as a database and as Kafka's configuration management center. Kafka's leader elections (coordinator election, controller election, partition leader election, etc.) also rely on ZooKeeper.

2.3.2.2 coordinator

The coordinator module manages consumer groups and consumption offsets. It acts as an intermediary for consumers: it elects one consumer in a group as the leader, then sends the leader the membership of the whole group; the leader is responsible for assigning partitions. The module was introduced in Kafka version 0.9. A Kafka cluster can run multiple coordinators, each managing different consumer groups, which improves the scalability of the whole system. It was introduced mainly because, previously, consumers (using the high-level consumer API) held ZooKeeper connections to run their own elections, which put pressure on ZooKeeper and caused herd-effect and split-brain problems.

2.3.2.3 controller

The controller module is mainly responsible for partition leader election and for monitoring topic creation and deletion events, which it then dispatches to the designated brokers for processing. There is only one controller in the whole Kafka cluster; Kafka uses ZooKeeper's ephemeral-node feature to elect it.

2.3.2.4 Broker

Message caching agent. A Kafka cluster contains one or more servers, called brokers, which are responsible for storing and forwarding messages and which provide the produce and consume services.

2.3.2.5 Topic

Message topic (category), a logical concept: the classification of the message streams Kafka handles. Users can store messages of different business categories in different topics according to their business needs. When producing or consuming, users only need to specify the topic of interest, without caring where within the topic the data is physically stored.

2.3.2.6 Partition

The physical grouping of a topic; the number of partitions can be specified when the topic is created. Each partition is an ordered queue: messages are stored in production order, and each message is assigned a monotonically increasing 64-bit offset (effectively a message id). Partitioning is the key factor behind Kafka's ability to scale horizontally.

2.3.2.7 Replication

Replicas, a topic-level configuration that can be understood as the number of copies kept of a topic's messages. The concept was added in Kafka version 0.8, mainly to improve availability: it prevents a broker crash from making some partitions unserviceable.

2.3.2.8 ISR

In-Sync Replicas: the list of brokers, maintained by Kafka, whose replicas are keeping up with the leader's data. When the leader crashes, the new leader is elected from this list first.
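As a rough illustration of the idea (not Kafka's actual implementation, which also uses a time-based criterion and broker configuration), the ISR can be modeled as the set of replicas whose log end offset is within some lag bound of the leader's:

```python
def in_sync_replicas(leader_end_offset: int, replica_offsets: dict, max_lag: int = 4) -> list:
    """Replicas whose log end offset is within max_lag of the leader's
    are considered in sync; the rest have fallen behind."""
    return [broker for broker, offset in replica_offsets.items()
            if leader_end_offset - offset <= max_lag]

# broker-3 lags by 20 messages, so it drops out of the ISR.
isr = in_sync_replicas(100, {"broker-1": 100, "broker-2": 97, "broker-3": 80})
print(isr)  # ['broker-1', 'broker-2']
```

On leader failure, electing the new leader from this set bounds how much acknowledged data can be lost.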

2.3.2.9 Producer

Producer: publishes messages using a push model. A producer could obtain broker information, topic information, and other metadata by connecting to ZooKeeper and then interact with brokers to publish messages; in that design ZooKeeper acts as the configuration management center (similar to the routing information provided by a Name Server). Exposing ZooKeeper information directly to producers has two major drawbacks:

ZooKeeper is part of the core structure of the whole Kafka system, and its performance directly limits cluster size. Exposing it to too many producers degrades ZooKeeper's performance and ultimately affects the size and stability of the whole Kafka cluster.

ZooKeeper stores Kafka's core data; if publicly exposed, it is vulnerable to attack by malicious users, which could render the Kafka cluster unserviceable. Kafka service providers are therefore advised not to expose ZooKeeper information to clients.

Because of these problems, Kafka provides a Metadata RPC. Through this RPC a producer obtains broker information, topic information, and the leader of each partition under a topic, and then produces messages directly to the appropriate broker. This hides ZooKeeper from producers and makes the whole system more secure, stable, and efficient.
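Conceptually, the metadata a producer caches is just a topic-to-partition-to-leader map. The hypothetical sketch below (topic and broker names are illustrative) shows how a client routes a produce request from that cached map instead of consulting ZooKeeper:

```python
# Illustrative in-memory stand-in for a Metadata RPC response:
# for each topic, the leader broker of each partition.
metadata = {
    "orders": {0: "broker-1", 1: "broker-2", 2: "broker-1"},
}

def leader_for(topic: str, partition: int) -> str:
    """Look up the broker a producer must send to, as a real client does
    with its cached metadata rather than by querying ZooKeeper."""
    return metadata[topic][partition]

print(leader_for("orders", 1))  # 'broker-2'
```

When a leader changes (e.g. after a broker failure), real clients refresh this cache by reissuing the Metadata RPC.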

2.3.2.10 Consumer

Consumer: pulls messages from brokers and processes them. When subscribing to a topic of interest (usually via the high-level consumer API or the new consumer), a consumer must belong to a consumer group. Kafka ensures that each message of a topic is consumed by only one consumer within a given group, while multiple groups can each consume the message independently.

In fact, Kafka itself does not enforce this guarantee (that a message of a topic is consumed by only one consumer in a group). Especially before version 0.9, the Kafka broker had no concept of consumer groups or consumption offsets: it only provided a FetchMessage RPC for pulling messages, and did not care at all who fetched what or how many times. The guarantee is implemented by the algorithm inside the consumer API.

Before version 0.9, the consumer group was purely a client-side concept. All consumers in a group registered with ZooKeeper through their own connections and elected a leader (one per group); that leader then assigned partitions (the default assignment algorithm is range, and it can be configured to round-robin or even to a custom algorithm, which is very flexible). Each consumer then consumed only the partitions assigned to it, and could keep its consumption offsets in ZooKeeper or store them itself. This approach exposed ZooKeeper, causing the same problems as exposing it to producers; moreover, any consumer exiting triggered a ZooKeeper event and a rebalance, putting heavy pressure on ZooKeeper and producing herd-effect and split-brain problems that could not be solved in that design. From version 0.9 onward, the Kafka broker therefore added the coordinator module.

The coordinator module, however, does not implement any of the assignment algorithm; it merely takes over part of ZooKeeper's role, acting as an intermediary. Instead of electing a leader through ZooKeeper, consumers now communicate with the coordinator: the coordinator selects the leader, sends it the membership of the whole consumer group, and the leader (in the consumer API) performs the assignment. The coordinator also manages offsets: consumers can submit offsets to the coordinator, which saves them. By default the coordinator stores offset information in a special topic (named __consumer_offsets), further reducing the pressure on ZooKeeper. For details of partition assignment within a consumer group, see the consumer-group description in the next subsection.
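The default range assignment mentioned above can be sketched as follows (a simplified single-topic version; consumer names and partition counts are illustrative):

```python
def range_assign(partitions: list, consumers: list) -> dict:
    """Default 'range' strategy for one topic: sort both lists, give each
    consumer a contiguous block of partitions; the first
    (len(partitions) % len(consumers)) consumers get one extra partition."""
    partitions, consumers = sorted(partitions), sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        size = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment

print(range_assign([0, 1, 2, 3, 4], ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4]}
```

Because the inputs are sorted first, every group member that runs this computation arrives at the same answer, which is why the leader can assign partitions without broker involvement.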

2.3.2.11 Consumer Group

Consumer group: a label used to classify consumers. It can be loosely understood as a queue: when one consumer group subscribes to a topic, it is as if a queue were created for that topic; when multiple groups subscribe to the same topic, it is as if multiple queues were created. This achieves broadcast semantics while storing the data only once. The figures below illustrate the concepts related to consumer groups.

A consumer group can subscribe to multiple topics; likewise, a topic can be subscribed to by multiple consumer groups.

A partition of a topic is assigned to only one consumer within a given group. Because of this assignment strategy, if messages for the same user are hashed by message key to the same partition at production time, first-in-first-out ordering of that user's messages is guaranteed. It is on this basis that Kafka implements FIFO ordering of messages.
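A minimal sketch of such a key-based partitioner (real Kafka clients use murmur2 hashing; a simple polynomial hash stands in here) shows why the same key always lands in the same partition:

```python
def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministic key -> partition mapping: the same key always maps to
    the same partition, so its messages keep their production order."""
    h = 0
    for b in key:
        h = (h * 31 + b) & 0x7FFFFFFF  # keep the hash non-negative
    return h % num_partitions

p = partition_for(b"user-42", 4)
assert p == partition_for(b"user-42", 4)  # stable for the same key
assert 0 <= p < 4
```

Since one partition is consumed by exactly one consumer in the group, determinism of this mapping is all that is needed for per-key FIFO.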

Within one consumer group, different consumers may subscribe to different topics, but Kafka's partition-assignment policy ensures that a topic's partitions are assigned only to those consumers in the group that actually subscribe to that topic; in other words, the group is further divided along the topic dimension. The figure above shows that in Consumer group1, C1 and C2 both subscribe to Topic1, so Topic1's four partitions P0 through P3 are split evenly between C1 and C2. In the same Consumer group1, only C1 subscribes to Topic0, so Topic0's two partitions are assigned to C1 only and not to C2.

2.3.2.12 Message

Messages are the smallest unit of communication and storage. A message contains a variable-length header, a variable-length key, and a variable-length value. The key and value are specified by the user and are opaque to Kafka.
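As a toy illustration of this layout (not Kafka's real record format, which also carries a CRC, magic byte, attributes, and timestamps), a message can be encoded as length-prefixed key and value bytes:

```python
import struct

def encode_message(key: bytes, value: bytes) -> bytes:
    """Toy encoding in the spirit of Kafka's record layout: a fixed header
    (here just the two lengths, as big-endian int32s) followed by the
    opaque key and value bytes."""
    return struct.pack(">ii", len(key), len(value)) + key + value

def decode_message(buf: bytes):
    """Invert encode_message: read the two lengths, then slice out the
    key and value."""
    key_len, value_len = struct.unpack_from(">ii", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]
    return key, value

assert decode_message(encode_message(b"k", b"hello")) == (b"k", b"hello")
```

Because the broker never interprets the key or value, any serialization (JSON, Avro, Protobuf, raw bytes) works; only the producer and consumer need to agree on it.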

After reading the above, have you mastered the core principles of the high-performance message queue CKafka? If you want to learn more, you are welcome to follow the industry information channel. Thank you for reading!
