2025-01-16 Update. From: SLTechnology News & Howtos (shulou)
Shulou (Shulou.com) 06/01 report --
How do Apache Pulsar and Apache Kafka compare on performance in financial scenarios? This article addresses that question with a detailed analysis, in the hope of helping readers facing the same problem find a simple, workable approach.
Background
Apache Pulsar is a next-generation distributed messaging and streaming platform. It adopts a layered architecture that separates compute from storage, and offers multi-tenancy, strong consistency, high performance, support for millions of topics, and smooth data migration, among other advantages. More and more enterprises are using Pulsar, or trialing it, in production environments.
Tencent uses Pulsar as the message bus of its billing system, supporting hundreds of billions of online transactions. Tencent's billing volume is enormous, and the core problem is ensuring that money and goods stay consistent. First, every payment transaction must be free of accounting errors, which demands strong consistency and high reliability. Second, every service carried by the billing system must be available 24 hours a day, 7 days a week, which demands high availability and high performance. The billing message bus must provide all of these capabilities.
Analysis of Pulsar Architecture
In terms of consistency, Pulsar uses a quorum algorithm: the write quorum (Qw) controls the number of replicas, and the ack quorum (Qa) controls when a write is acknowledged as strongly consistent (Qa > Qw / 2, i.e., a majority of the write set). In terms of performance, Pulsar pipelines message production, reduces disk IO pressure through sequential and striped writes, and reduces network round trips to speed up consumption.
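The quorum condition can be made concrete with a small sketch (the helper names below are my own illustrations, not Pulsar API): when each write is confirmed by Qa of the Qw replicas, any two acknowledged writes must overlap on at least one replica exactly when 2·Qa > Qw.

```python
def is_strongly_consistent(write_quorum: int, ack_quorum: int) -> bool:
    """Any two acknowledged writes share at least one replica
    when 2 * Qa > Qw, i.e. Qa > Qw / 2 (a majority of the write set)."""
    return 2 * ack_quorum > write_quorum

def min_overlap(write_quorum: int, ack_quorum: int) -> int:
    """Worst-case intersection size of two ack sets of size Qa drawn
    from Qw replicas (inclusion-exclusion lower bound)."""
    return max(0, 2 * ack_quorum - write_quorum)
```

For example, with Qw = 3 and Qa = 2 the two ack sets always share at least one bookie, whereas Qw = 4 with Qa = 2 allows two disjoint ack sets and therefore cannot guarantee a strongly consistent read.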
The high performance of Pulsar mainly comes from its network model, communication protocol, queue model, disk IO handling, and striped writes. Each is explained in detail below.
Network model
Pulsar's broker follows a typical Reactor model. A network thread pool handles network requests: sending, receiving, encoding, and decoding; it then pushes requests through a request queue to the core thread pool for processing. First, Pulsar is multi-threaded and makes full use of modern multi-core hardware, assigning requests for the same task to the same thread to avoid the overhead of thread switching as much as possible. Second, Pulsar uses queues to asynchronously decouple the network-processing module from the core-processing module, allowing network processing and file I/O to run in parallel, which greatly improves overall system efficiency.
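A toy model of this design (illustrative only; `MiniReactor` is not from Pulsar's codebase) shows the two ideas at work: hashing a connection to a fixed worker keeps per-connection ordering without locks, and the queue decouples network handling from core processing.

```python
import queue
import threading

class MiniReactor:
    """Toy Reactor sketch: N single-threaded workers; requests from the
    same connection always hash to the same worker, so per-connection
    ordering is preserved while workers run in parallel."""

    def __init__(self, n_workers: int = 4):
        self.queues = [queue.Queue() for _ in range(n_workers)]
        self.results: dict[str, list[int]] = {}
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def _run(self, q: queue.Queue):
        while True:
            conn_id, seq = q.get()
            # "core processing" happens here, off the network thread
            self.results.setdefault(conn_id, []).append(seq)
            q.task_done()

    def submit(self, conn_id: str, seq: int):
        # The "network thread" only enqueues; same connection -> same worker.
        self.queues[hash(conn_id) % len(self.queues)].put((conn_id, seq))

    def drain(self):
        for q in self.queues:
            q.join()
```

Running 50 sequenced requests over each of 8 connections, every connection's requests come out in order even though four workers processed them concurrently.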
Communication protocol
Messages are binary-encoded, and the format is simple. The binary data produced by the client is sent directly to the Pulsar broker; the broker forwards it to bookie storage without decoding, and the storage format is also binary, so no encoding or decoding happens anywhere in the produce or consume path. Message compression and batching are done on the client side, which further increases the broker's message-processing capacity.
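The point that brokers can forward client-encoded bytes untouched can be sketched with a hypothetical framing format (this is not Pulsar's actual wire protocol): the client length-prefixes and optionally compresses a batch, and only the consumer ever decodes it.

```python
import struct
import zlib

def encode_batch(messages: list[bytes], compress: bool = True) -> bytes:
    """Hypothetical client-side framing: [count][len][payload]...
    A broker could forward the resulting bytes verbatim, since
    decoding only happens on the consumer."""
    body = struct.pack(">I", len(messages))
    for m in messages:
        body += struct.pack(">I", len(m)) + m
    return zlib.compress(body) if compress else body

def decode_batch(data: bytes, compressed: bool = True) -> list[bytes]:
    """Consumer-side decoding of the same frame."""
    body = zlib.decompress(data) if compressed else data
    count = struct.unpack_from(">I", body)[0]
    off, out = 4, []
    for _ in range(count):
        n = struct.unpack_from(">I", body, off)[0]
        off += 4
        out.append(body[off:off + n])
        off += n
    return out
```

Because compression happens before the bytes leave the client, the broker's CPU cost per batch is independent of message contents.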
Queue model
Pulsar partitions each topic and assigns different partitions to different brokers as far as possible, achieving horizontal scaling. Pulsar supports adjusting the number of partitions online and, in theory, unlimited throughput. Although ZooKeeper capacity and performance do bound the number of brokers and partitions, the ceiling is so high that in practice there is effectively no limit.
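Key-based routing across partitions can be sketched as follows (Pulsar clients actually use configurable hash functions such as Murmur3 or JavaStringHash; `crc32` is used here only as a stable stand-in):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative key-hash routing: the same key always lands on the
    same partition, and keys spread across all partitions."""
    return zlib.crc32(key) % num_partitions
```

Routing is deterministic per key (preserving per-key ordering) while the keyspace as a whole spreads across all partitions, which is what lets throughput scale with the partition count.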
Disk IO
A message queue is a disk-IO-intensive system, so optimizing disk IO is critical. Disk operations in Pulsar fall into two categories: the operation log (journal) and the data log. The operation log is used for data recovery and is written in a fully sequential mode; a produce request can be acknowledged as soon as the journal write succeeds, which is why Pulsar can support millions of topics without the sharp performance degradation that random writes would cause.
Writes to the data log can arrive out of order, which lets the operation log proceed at its best sequential write rate; the data log is sorted and de-duplicated before being flushed. Although this produces write amplification, the benefit is worth it: by mounting the operation log and the data log on different disks, read and write IO are separated, further improving the IO capacity of the whole system.
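A toy model of the journal/data-log split (illustrative only, not BookKeeper's real implementation) captures the key points: acknowledge as soon as the sequential journal append lands, then sort and de-duplicate when flushing to the data log.

```python
class MiniBookie:
    """Toy journal/data-log model: writes are acked on sequential journal
    append; a later flush writes them to the data log sorted and
    de-duplicated by (ledger_id, entry_id)."""

    def __init__(self):
        self.journal: list[tuple] = []   # append-only, sequential
        self.data_log: list[tuple] = []  # sorted on flush

    def add_entry(self, ledger_id: int, entry_id: int, payload: bytes) -> bool:
        self.journal.append((ledger_id, entry_id, payload))
        return True  # acked as soon as the journal append succeeds

    def flush(self):
        latest = {}
        for lid, eid, payload in self.journal:   # may be out of order
            latest[(lid, eid)] = payload         # de-duplicate
        self.data_log.extend(
            sorted((lid, eid, p) for (lid, eid), p in latest.items()))
        self.journal.clear()
```

The journal absorbs random-per-topic traffic as one sequential stream, while the flush pays the sorting cost (the write amplification mentioned above) off the critical path.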
Striped write
Striped writes spread IO across more bookie nodes. Each bookie maintains a write cache and a read cache: the latest messages go into the write cache, while other messages are read from files in batches into the read cache to improve read efficiency.
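Round-robin striping can be sketched like this (a simplified version of BookKeeper-style ensemble placement; the function name is my own):

```python
def stripe_replicas(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Which bookies store a given entry under round-robin striping:
    entry e goes to write_quorum bookies starting at e % len(ensemble)."""
    n = len(ensemble)
    start = entry_id % n
    return [ensemble[(start + i) % n] for i in range(write_quorum)]
```

With an ensemble of three bookies and a write quorum of two, consecutive entries rotate through the bookies, so each node handles only two-thirds of the write traffic instead of all of it.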
From an architectural point of view, Pulsar has no obvious choke point at any stage of message processing, with one exception: operation-log persistence is flushed by a single thread, which can cause stalls. Depending on disk characteristics, you can configure multiple disks and multiple directories to improve disk read/write performance, which fully meets our needs.
Testing
In the Tencent billing scenario, we configured identical workloads to compare Pulsar and Kafka. The specific test scenarios were as follows.
The stress-test data were as follows:
As the data above show, with network IO in play, Pulsar nearly saturates the broker's NIC when running three replicas across multiple partitions, because each message must be fanned out three times from the broker side; this is the cost of separating compute from storage.
Kafka's performance data are somewhat disappointing: overall performance did not improve, which is likely related to Kafka's replica-synchronization mechanism. Kafka uses a follower-pull synchronization strategy, which lowers overall efficiency.
In terms of latency, Pulsar performs better on the produce side: when resources are not a bottleneck, 99% of requests complete in under 10 milliseconds, with fluctuations occurring during garbage collection (GC) and when new operation-log files are created.
From the stress-test results, Pulsar outperforms Kafka in strongly consistent scenarios. With log.flush.interval.messages=1 set, Kafka's performance is even worse: Kafka was designed for high throughput from the start and offers no direct synchronous-flush mode of comparable cost.
We also tested other scenarios, such as millions of topics and cross-region replication. In produce/consume tests with millions of topics, Pulsar's performance did not drop sharply as the topic count grew, whereas Kafka slowed down rapidly because of the large volume of random writes.
Pulsar natively supports cross-region replication in both synchronous and asynchronous modes. In intra-city cross-zone replication, Kafka's throughput was low and replication was very slow, so for the cross-region scenario we tested Pulsar's synchronous replication. The storage cluster was deployed across cities, and acknowledgements had to include responses from multiple cities. The test parameters were the same as in the same-city tests. The results show Pulsar sustaining about 280,000 QPS in the cross-city case. Of course, cross-city and cross-region replication performance depends heavily on the quality of the network.
Availability analysis
As a new distributed messaging and streaming platform, Pulsar has many advantages. Thanks to bookie's shard-level processing and ledger's storage-node selection strategy, operating Pulsar is very simple and avoids the manual data-rebalancing chores familiar from Kafka. But Pulsar is not perfect; it has problems of its own, which the community is still working to improve.
Pulsar's strong dependence on ZooKeeper
Pulsar depends heavily on ZooKeeper: in the extreme case, an outage or blockage of the ZooKeeper cluster brings down the entire service. The probability of a ZooKeeper cluster collapsing is relatively small; after all, ZooKeeper has been battle-tested by a large number of online systems and remains widely used. The probability of ZooKeeper congestion, however, is relatively high. For example, in a million-topic scenario, millions of ledger metadata records are generated, all of which must be written through ZooKeeper.
For example, creating a topic requires creating the topic partition metadata, the topic name node, and the topic's ledger storage node; creating a ledger requires creating (and later deleting) a unique ledger-id node plus the ledger metadata node. That is about five ZooKeeper writes in total, and one subscription adds roughly four similar writes, so about nine writes are needed altogether. Creating millions of topics in a burst is therefore bound to put heavy pressure on ZooKeeper.
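The arithmetic above can be captured in a trivial helper (the per-operation counts come straight from the text and vary by Pulsar version, so treat them as illustrative):

```python
def zk_writes_for_new_topic(subscriptions: int = 1) -> int:
    """Back-of-the-envelope count from the text: ~5 ZooKeeper writes to
    create a topic and its first ledger, plus ~4 per subscription."""
    return 5 + 4 * subscriptions

# A million topics with one subscription each implies on the order of
# nine million ZooKeeper writes:
total = 1_000_000 * zk_writes_for_new_topic(1)
```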
Pulsar can split its ZooKeeper deployment into separate clusters, which relieves the pressure on ZooKeeper to some extent. The broker is the component that depends most directly on ZooKeeper; as the analysis above shows, its write operations are relatively controllable, and topics can be created through a management console. Bookie is the component that talks to ZooKeeper most frequently, but even if ZooKeeper blocks, in-flight writes are not affected.
The dependence on ZooKeeper can be optimized along the same lines. First, ensure the service can keep running for a period of time when ZooKeeper is unavailable, giving ZooKeeper enough time to recover. Second, reduce ZooKeeper writes to only the necessary operations, such as broker elections. Data such as broker load reports can be moved to other storage media, especially since this information can reach megabytes in size when a broker serves a large number of topics. We are working with the Pulsar community to optimize the broker load-reporting capability.
Pulsar memory management is somewhat complex
Pulsar's memory consists of JVM heap memory and off-heap (direct) memory. Message sending and caching go through off-heap memory to reduce the garbage collection (GC) triggered by IO. Heap memory mainly caches ZooKeeper-related data, such as ledger metadata and the per-subscriber redelivery message-ID cache. Analysis of a heap dump found that one ledger metadata record takes about 10 KB, and a subscriber's redelivery message-ID cache starts at 16 KB and keeps growing. As broker memory grows, full garbage collections (full GC) run ever more frequently, until the process finally exits.
To solve this problem, first look for fields whose memory footprint can be reduced, such as the bookie address information inside ledger metadata: every ledger creates these objects, but the set of bookie nodes is very small, so shared global instances can eliminate unnecessary object creation. The subscriber redelivery message-ID cache can be initialized within 1 KB and shrunk periodically. These changes greatly improve broker availability.
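The "global variables for bookie addresses" idea is essentially the flyweight pattern; a minimal sketch (hypothetical class, not Pulsar code) shows how thousands of ledgers can share a handful of address objects:

```python
class BookieAddressPool:
    """Flyweight sketch: ledger metadata holds references to a small,
    shared set of bookie-address objects instead of per-ledger copies."""
    _pool: dict = {}

    @classmethod
    def get(cls, host: str, port: int):
        key = (host, port)
        if key not in cls._pool:
            cls._pool[key] = key  # stand-in for a real address object
        return cls._pool[key]
```

No matter how many ledgers reference a bookie, the pool holds exactly one object per distinct (host, port), so heap usage stays proportional to cluster size rather than ledger count.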
Compared with Kafka's broker, Pulsar's broker has clear advantages: Pulsar balances load automatically, so a single overloaded broker does not destabilize the service, and capacity can be expanded quickly to lower the load of the whole cluster.
That concludes this comparative performance analysis of Apache Pulsar and Apache Kafka in financial scenarios. We hope the content above has been of some help; if you still have questions, follow the industry information channel for more related material.