Author: Xiangzi, R&D engineer on the push platform team
1. Business background
In push scenarios, the message queue occupies a very important position in the overall system.
When an APP needs to push a message, it sends a push request to our platform. After receiving the request, we place the target users specified by the APP into a delivery queue for message sending. When multiple APPs send messages at the same time, competition for resources is inevitable, hence the need for a priority queue: when delivery resources are fixed, high-priority users should receive more of them.
2. Priority queue scheme based on Kafka
For the above scenario, the first version of the priority queue scheme was built on Kafka. Kafka is a high-performance distributed messaging system developed by LinkedIn, with a wide range of applications such as log collection and online/offline message distribution.
Architecture
In this scheme, priority is divided into three levels: high, medium, and low. The scheme works as follows:
Messages are stored in different Topics by priority, at the granularity of a task (a single push task). A task writes to exactly one Topic, while one Topic may hold messages from multiple tasks.
According to the priority quota (for example 6:3:1), the consumption module fetches a different number of messages for each priority and round-robins among tasks of the same priority. This ensures that high-priority users are delivered faster while low-priority users are still not starved (see the sketch after this list).
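The following is a minimal sketch of the 6:3:1 quota polling, not the team's actual code: the topic names (push-high, push-mid, push-low), the broker address, and the hand-off to a send() stub are all illustrative assumptions. Capping each consumer's max.poll.records at its quota keeps auto-commit from skipping records that were fetched but never delivered.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    public class QuotaPoller {

        private static KafkaConsumer<String, String> consumerFor(String topic, int quota) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");       // assumed broker address
            p.put("group.id", "push-sender-" + topic);          // one group per priority topic
            // Cap each poll at the priority's quota so every fetched record is delivered.
            p.put("max.poll.records", String.valueOf(quota));
            p.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
            p.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
            KafkaConsumer<String, String> c = new KafkaConsumer<>(p);
            c.subscribe(List.of(topic));
            return c;
        }

        public static void main(String[] args) {
            String[] topics = {"push-high", "push-mid", "push-low"};
            int[] quotas = {6, 3, 1};                            // the 6:3:1 priority quota
            List<KafkaConsumer<String, String>> consumers = new ArrayList<>();
            for (int i = 0; i < topics.length; i++) consumers.add(consumerFor(topics[i], quotas[i]));

            while (true) {
                // Each round drains at most quota[i] messages per priority, highest
                // first, so heavy low-priority traffic cannot starve high priority.
                for (KafkaConsumer<String, String> c : consumers) {
                    for (ConsumerRecord<String, String> r : c.poll(Duration.ofMillis(100))) {
                        send(r.value());                         // hand off to the delivery module
                    }
                }
            }
        }

        private static void send(String message) {
            System.out.println("delivering: " + message);        // stand-in for real delivery
        }
    }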
Problems encountered with the Kafka scheme
As the push service continued to develop and the number of integrated APPs grew, the first version of the priority scheme gradually exposed some problems:
When more and more APPs of the same priority push tasks at the same time, incoming task messages are delayed by the task messages queued ahead of them. For example, when task1's message volume is very large, taskN must wait until task1 has been fully consumed.
When the number of Topics in Kafka grew from 64 to 256, throughput dropped severely. Each partition of a Kafka Topic corresponds to physical files on disk, so as the Topic count increases, messages are flushed to disk in a scattered fashion and disk IO contention becomes fierce. We therefore could not alleviate the first problem simply by adding more Topics.
Given these problems, we started a new round of technology selection. We needed a system that supports creating a large number of Topics, with throughput no worse than Kafka's. After a period of research, Apache Pulsar caught our attention.
3. Why Pulsar
Apache Pulsar is an enterprise-grade distributed messaging system, originally developed by Yahoo, open-sourced in 2016, and graduated in September 2018 as a top-level project of the Apache Foundation. Pulsar has run in Yahoo's production environment for more than three years, mainly serving Mail, Finance, Sports, Flickr, the Gemini Ads platform, and Sherpa (Yahoo's KV store).
Architecture
Number of Topics
Pulsar can scale to millions of Topics while maintaining good performance. Topic scalability comes from its internal organization and storage: Pulsar data is stored on bookies (BookKeeper servers), and messages from different Topics in the write state are sorted in memory and finally aggregated into large files, so a bookie needs fewer file handles. In addition, a bookie's IO depends less on the file system's page cache. For these reasons, Pulsar supports a huge number of Topics.
Consumption models
Pulsar supports three consumption models: Exclusive, Shared, and Failover.
Exclusive: a Topic can be consumed by only one consumer. Pulsar uses this mode by default.
Shared: multiple consumers can attach to the same Topic, and messages are distributed among them in round-robin fashion. When a consumer crashes or disconnects, the messages that were delivered to it but not yet acknowledged are rescheduled and distributed to the other consumers.
Failover: a subscription has only one active consumer at a time but may have multiple backup consumers. Once the active consumer fails, a backup takes over; there are never two active consumers at the same time.
Exclusive and Failover allow only one consumer per subscription to consume the Topic. Both modes consume messages in Topic-partition order and are best suited to stream (Stream) use cases that require strict message ordering.
Shared allows multiple consumers per Topic partition; each consumer on the subscription receives only a portion of the partition's messages. Shared is best suited to queue (Queue) use cases that do not need guaranteed message order, and the number of consumers can be scaled as needed. A minimal sketch of selecting these subscription types follows.
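The sketch below, assuming a local broker at pulsar://localhost:6650 and illustrative topic and subscription names, shows how the three subscription types are selected with the Pulsar Java client:

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionType;

    public class SubscriptionDemo {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")      // assumed broker address
                    .build();

            // Shared: many consumers, round-robin dispatch, no ordering guarantee.
            Consumer<byte[]> shared = client.newConsumer()
                    .topic("persistent://public/default/push-task")
                    .subscriptionName("push-shared")
                    .subscriptionType(SubscriptionType.Shared)
                    .subscribe();

            // Failover: one active consumer per subscription; backups take over on failure.
            Consumer<byte[]> failover = client.newConsumer()
                    .topic("persistent://public/default/push-task")
                    .subscriptionName("push-failover")
                    .subscriptionType(SubscriptionType.Failover)
                    .subscribe();

            // Exclusive is the default when no subscription type is specified.
            Consumer<byte[]> exclusive = client.newConsumer()
                    .topic("persistent://public/default/push-task")
                    .subscriptionName("push-exclusive")
                    .subscribe();

            shared.close(); failover.close(); exclusive.close(); client.close();
        }
    }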
Storage
Pulsar uses Apache BookKeeper as its storage layer. BookKeeper is a distributed storage system optimized for real-time workloads, offering scalability, high availability, and low latency. For more information, see BookKeeper's official website.
Segment
BookKeeper uses the Segment (called a ledger inside BookKeeper) as its basic storage unit. From Segments down to individual messages, data is spread evenly across the BookKeeper cluster, which ensures that both data and load are balanced across the cluster.
Both Pulsar and Kafka build Topic storage on the logical concept of the partition. The most fundamental difference is that Kafka's physical storage unit is the partition: each partition must be stored in full on a single broker (as a directory). Pulsar's partitions, by contrast, use the segment as the physical storage unit, so each partition is scattered evenly across multiple bookie nodes.
The direct consequence is that the size of a Kafka partition is limited by the storage of a single broker, while a Pulsar partition can use the storage capacity of the entire cluster.
Capacity expansion
When a partition reaches its capacity limit and needs to expand, and the existing machine cannot accommodate it, Kafka may require new storage nodes to be added and the partition's data to be copied and rebalanced across nodes.
Pulsar only needs a new bookie storage node. Because it has the most free space, the new node is preferentially given new data, and the entire expansion involves no copying or moving of existing data.
Broker failure
Pulsar shows the same advantage when a single node fails. If a Pulsar service node (broker) fails, other brokers can quickly take over its Topics because brokers are stateless, and no Topic data needs to be copied. If a storage node (bookie) fails, the other bookies in the cluster concurrently read data from multiple bookie nodes in the background and automatically recover the failed node's data, without affecting front-end services.
Bookie failure
Replica repair in Apache BookKeeper is a many-to-many fast repair at Segment (even Entry) granularity. This approach replicates only the necessary data, which is far finer-grained than re-copying an entire Topic partition. For example, when an error occurs, BookKeeper can read the messages of Segment 4 from bookie 3 and bookie 4 and repair Segment 4 on bookie 1. All replica repair happens in the background and is transparent to brokers and applications.
When a bookie fails, BookKeeper automatically promotes a new available bookie to replace it and recovers the failed bookie's data in the background; broker writes are never interrupted, and the availability of the Topic partition is not sacrificed.
4. Priority queue scheme based on Pulsar
In design, the Pulsar scheme does not differ much from the Kafka scheme. In the new scheme, however, the push technical team uses Pulsar's characteristics to solve the problems of the Kafka scheme:
Topics are generated dynamically per task, so a later task no longer waits behind the accumulated messages of other tasks.
Each medium- or high-priority task has a Topic of its own, while low-priority tasks share n Topics.
Within the same priority, messages are read from each task in turn, and consumption flows to the next priority once the quota is used up.
Within the same priority, each task can adjust its quota dynamically, letting it read more messages within the same turn.
With the Shared mode, consumers can be added and removed dynamically without triggering a rebalance.
With BookKeeper's characteristics, storage resources can be added more flexibly. (A minimal sketch of the per-task Topic setup follows.)
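Below is a minimal sketch of a per-task dynamic Topic combined with a Shared subscription, under assumed names (topic high-task-42, subscription sender, a local broker), not the team's actual code. Pulsar creates a topic automatically on first use, so producing to a new task topic is enough to "create" it:

    import org.apache.pulsar.client.api.*;

    public class TaskTopics {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")      // assumed broker address
                    .build();

            // Producer side: each high/medium-priority task writes to its own topic.
            String taskTopic = "persistent://public/default/high-task-42"; // hypothetical task id
            Producer<byte[]> producer = client.newProducer().topic(taskTopic).create();
            producer.send("push message for one user".getBytes());

            // Consumer side: a Shared subscription, so sender instances can be added
            // or removed at runtime without a Kafka-style consumer-group rebalance.
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic(taskTopic)
                    .subscriptionName("sender")
                    .subscriptionType(SubscriptionType.Shared)
                    .subscribe();

            Message<byte[]> msg = consumer.receive();
            // Deliver the push, then acknowledge explicitly (required by the Java API).
            consumer.acknowledge(msg);

            producer.close(); consumer.close(); client.close();
        }
    }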
5. Other Pulsar practices
Different subscriptions on the same Topic are relatively independent. To consume a Topic's messages more than once, use a different subscriptionName for each pass; note, however, that every newly added subscription accumulates its own backlog.
If a Topic has no subscription, messages sent to it are deleted by default. So if the producer sends first and the consumer connects later, make sure the subscription exists on the Topic before the producer sends (even if the consumer is closed right after subscribing); otherwise messages sent in the meantime go unprocessed (see the sketch after this list).
If nothing is produced to a Topic and nothing subscribes to it, the Topic is deleted automatically after a period of time.
Settings such as Pulsar's TTL take effect for an entire namespace and cannot be set on a single Topic.
Pulsar's keys in ZooKeeper all hang off the root directory; it is advisable to add an overall node name at initialization.
Pulsar's current Java API requires messages to be acknowledged explicitly by default, which differs from Kafka.
The "storage size" on the Pulsar dashboard is a different concept from the storage size in Prometheus, which includes replica size.
Set dbStorage_rocksDB_blockCacheSize large enough. When message volume is large and a big backlog accumulates, the default size (256 MB) makes reads take too long and slows down consumption.
Use multiple partitions to improve throughput.
When the system misbehaves, proactively capture the output of stats and stats-internal, which contain a lot of useful data.
If a single Topic carries a very large volume, set backlogQuotaDefaultLimitGB large enough (the default is 10 GB) to avoid producers being blocked by the default producer_request_hold policy, and choose a backlogQuotaDefaultRetentionPolicy suited to the business. In general, choose the backlog quota according to the actual business scenario.
If the read time in Prometheus appears empty, the data may have been served directly from cache. On a read, Pulsar first checks the write cache and then the read cache; if neither hits, it looks up the entry's position in RocksDB and then reads the entry from the log file. On a write, Pulsar writes the journal and the write cache synchronously, and the write cache then flushes to entry log files and RocksDB asynchronously; if resources allow, put the journal on an SSD.
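As referenced in the list above, here is a minimal sketch of the "subscribe before producing" practice, with hypothetical topic and subscription names: subscribing once and closing the consumer leaves the subscription on the Topic, so messages produced before the real consumer starts are retained rather than deleted.

    import org.apache.pulsar.client.api.*;

    public class EnsureSubscription {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")      // assumed broker address
                    .build();
            String topic = "persistent://public/default/push-task";

            // Create the subscription up front; closing the consumer keeps the
            // subscription (and its backlog) on the topic.
            client.newConsumer()
                  .topic(topic)
                  .subscriptionName("push-sender")
                  .subscriptionType(SubscriptionType.Shared)
                  .subscribe()
                  .close();

            // Messages produced now are retained in the subscription's backlog
            // instead of being dropped for lack of subscribers.
            Producer<byte[]> producer = client.newProducer().topic(topic).create();
            producer.send("sent before any live consumer".getBytes());

            producer.close(); client.close();
        }
    }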
6. Summary
The priority-middleware transformation has now been trialed in some production services, and we continue to pay close attention to Pulsar's stability.
Although Pulsar was open-sourced only in 2016, it has many attractive features and makes up for the shortcomings of its competitors, with cross-region replication, multi-tenancy, scalability, read/write isolation, and so on. Although it is not yet widely adopted in the industry, Pulsar's existing features show a tendency to replace Kafka. We also ran into some problems while using Pulsar, and we thank Zhai Jia and Guo Sijie (both core engineers at StreamNative and PMC members of the open source project Apache Pulsar) for their support and help.