Comparing the Performance of Apache Kafka, Apache Pulsar, and RabbitMQ


This article walks through a performance comparison of Apache Kafka, Apache Pulsar, and RabbitMQ. The content is quite detailed; interested readers can use it for reference, and hopefully it will be helpful.

Apache Kafka, Apache Pulsar, and RabbitMQ can all serve as messaging middleware platforms, and there are many dimensions on which to compare them, but performance is usually the foremost concern. This article focuses on throughput and latency, since these are the primary performance indicators of an event streaming system in production. In particular, throughput tests measure how efficiently each system uses the hardware, especially disk and CPU. Latency tests measure how close each system comes to real-time message delivery, including tail latencies up to the p99.9th percentile, a core requirement for real-time and business-critical systems as well as microservice architectures.

From the test results, Kafka provides the best throughput while delivering the lowest end-to-end latency up to the p99.9th percentile. At low throughput, RabbitMQ delivers messages with very low latency.

                        Kafka                   Pulsar                  RabbitMQ (mirrored)
Peak throughput         605 MB/s                305 MB/s                38 MB/s
p99 latency             5 ms (200 MB/s load)    25 ms (200 MB/s load)   1 ms* (at reduced 30 MB/s load)

Note: RabbitMQ latency increases significantly at throughput above 30MB/s. In addition, the impact of mirroring is pronounced at higher throughput; better latency can be achieved by using classic queues without mirroring.

This article first introduces the test framework used, then describes the test platform and workloads, and finally interprets the results using system and application metrics. Everything is open source, so curious developers can reproduce the results themselves or dig deeper into the collected Prometheus metrics. As with most benchmarks, the comparison is based on a specific set of workloads and configurations; developers can of course run their own workloads and configurations to understand how the results translate to production.

Background

First, each system is briefly introduced to understand its high-level design and architecture, and the trade-offs each system makes.

Kafka is an open source distributed event streaming platform and one of the five most active projects of the Apache Software Foundation. At its core, Kafka is a replicated, distributed, persistent commit log supporting event-driven microservices and large-scale stream processing applications. Clients produce or consume events directly to/from a cluster of brokers, which persist events to the underlying file system and also automatically replicate events synchronously or asynchronously within the cluster for fault tolerance and high availability.

Pulsar is an open source distributed publish/subscribe messaging system originally designed for queuing scenarios; more recently, it has also added event streaming functionality. Pulsar is designed as a layer of (almost) stateless broker instances that connect to a separate layer of BookKeeper instances, which actually handle reading/writing and optionally persisting/replicating messages. Pulsar is not unique among similar systems; for example, Apache DistributedLog and Pravega are also built on top of BookKeeper and can provide some Kafka-like event streaming capabilities.

BookKeeper is an open source distributed storage service originally designed as a write-ahead log for the Hadoop NameNode. It provides persistent storage of messages in ledgers across server instances called bookies. For recovery purposes, each bookie synchronously writes every message to a local journal and then asynchronously to its local indexed ledger storage. Unlike Kafka brokers, bookies do not communicate with one another; instead, BookKeeper clients are responsible for replicating messages across bookies using a quorum protocol.

RabbitMQ is an open source traditional message middleware that implements the AMQP messaging standard and meets the needs of low-latency queuing scenarios. RabbitMQ consists of a set of broker processes that host the exchanges to which messages are published and the queues from which consumers read messages. Availability and persistence are properties of the various queue types offered: classic queues provide minimal availability guarantees, while classic mirrored queues replicate messages to other brokers and improve availability. Stronger persistence is available through the recently introduced quorum queues, but at a cost in performance. Because this article focuses on performance, the evaluation is limited to classic and mirrored queues.

Persistence in distributed systems

Single-node storage systems, such as an RDBMS, rely on fsyncing every write to disk to ensure maximum persistence. In distributed systems, however, persistence usually comes from replication: multiple independent copies of the data. Fsyncing is then just one way to reduce the impact of a failure when it does occur (for example, syncing more frequently may shorten recovery time); conversely, if enough replicas fail, the distributed system can become unusable whether it fsyncs or not. Whether to fsync is therefore a question of what each system's replication design relies on for correctness: some designs depend on never losing data that was written to disk, and so must fsync every write, while others handle this case in their design.
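To make the terminology concrete, here is a minimal Java sketch (hypothetical file name and payload) of the difference between a write that merely lands in the OS page cache and a write that is fsynced to disk via FileChannel.force; this is a toy demonstration, not code from any of the three systems:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class FsyncDemo {
        public static void main(String[] args) throws IOException {
            ByteBuffer msg = ByteBuffer.wrap("event-payload".getBytes()); // hypothetical payload
            try (FileChannel log = FileChannel.open(Path.of("commit.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                log.write(msg);   // lands in the OS page cache; fast, but not yet durable
                log.force(false); // fsync: blocks until the data actually reaches the disk
            }
        }
    }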

Kafka's replication protocol is carefully designed to ensure consistency and persistence without fsyncing every write, by tracking what has and has not been synced to disk. By assuming less, Kafka can handle a broader range of failures, such as file-system-level corruption or accidental disk deprovisioning, and does not take the correctness of unknowingly written data for granted. Kafka can also let the operating system batch writes to disk to improve performance.

It is not clear whether BookKeeper provides the same consistency guarantees without fsyncing every write, in particular whether it can rely on replication for fault tolerance without synchronous disk persistence. Based on inspection, and on the fact that BookKeeper implements a group fsync algorithm, it appears that it does rely on fsyncing every write for correctness, though this conclusion needs further confirmation.

Since this can be a controversial topic, results for both cases are reported in the tests to be as fair and complete as possible, although running Kafka with fsync on every message is extremely rare and unnecessary.

Benchmark framework

For any benchmark, people will ask which framework is used and whether it is fair. For this test, the OpenMessaging Benchmark Framework (OMB) was selected, originally developed largely by Pulsar contributors. OMB is easy to get started with: it offers basic workload specifications, metrics collection and report aggregation of test results, support for the three selected messaging systems, and modular cloud deployment workflows customized for each system. Note, however, that the Kafka and RabbitMQ implementations had some issues affecting the fairness and repeatability of these tests, so corrections were made for this test, and the corrected code has been made available as open source.

OMB framework adjustment

The test software platform was upgraded to Java 11, Kafka 2.6, RabbitMQ 3.8.5, and Pulsar 2.6 (the latest versions at the time). Monitoring across the three systems was significantly enhanced using the Grafana/Prometheus stack, capturing metrics across the messaging systems, JVM, Linux, disk, CPU, and network. This is essential for being able to not only report the results but also interpret them. The test also adds support for producer-only and consumer-only tests and for generating/draining backlogs, and fixes a serious bug in calculating producer rates when the number of topics is smaller than the number of producer worker nodes.

OMB Kafka driver adjustment

This test fixes a critical bug in the Kafka driver that starved Kafka producers of TCP connections, bottlenecking each worker instance on a single connection. Compared with the other systems, this fix makes the Kafka numbers reasonable: all systems now use the same number of TCP connections to communicate with their respective brokers. A serious bug in the Kafka consumer driver was also fixed, where offsets were committed too frequently and synchronously, degrading performance, while the other systems committed asynchronously. The test also tuned the Kafka consumer fetch size and the number of replication threads to eliminate bottlenecks in message fetching at high throughput, and configured the brokers equivalently to the other systems.
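For illustration, the following is a minimal, hypothetical Java sketch of the asynchronous offset-commit pattern described above (the broker address, group id, and topic name are placeholders; this is not the OMB driver code itself):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AsyncCommitConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("group.id", "omb-consumer");            // placeholder group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("enable.auto.commit", "false"); // commit offsets manually

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("test-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<byte[], byte[]> records =
                            consumer.poll(Duration.ofMillis(100));
                    // process records ...
                    consumer.commitAsync(); // non-blocking, unlike commitSync()
                }
            }
        }
    }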

OMB RabbitMQ driver adjustment

The test enhanced the RabbitMQ driver to use routing-key routing and configurable exchange types (DIRECT and TOPIC exchanges), and fixed a bug in the RabbitMQ cluster setup deployment workflow. Routing-key routing was introduced to mimic per-topic partitioning, matching the setup on Kafka and Pulsar. A TimeSync workflow was also added to the RabbitMQ deployment to synchronize clocks between client instances for accurate end-to-end latency measurement. Another bug in the RabbitMQ driver was also fixed to ensure accurate end-to-end latency measurement.

OMB Pulsar driver adjustment

For the OMB Pulsar driver, the test added the ability to specify a maximum batch size for the Pulsar producer and turned off any global limits that might artificially bottleneck the throughput of producer queues across partitions at higher target rates. No other major changes to the Pulsar driver were needed.

Test platform

OMB contains benchmark test platform definitions (instance types and JVM configuration) and load-driver configurations (producer/consumer configuration and server-side configuration), which were used as the basis for the tests. All tests deploy 4 worker instances to drive the load, 3 broker/server instances, 1 monitoring instance, and optionally a 3-instance Apache ZooKeeper cluster for Kafka and Pulsar. After experimenting with several instance types, the test selected a network/storage-optimized Amazon EC2 instance type with enough CPU cores and network bandwidth to support disk-I/O-bound workloads. All changes made to these baseline configurations across the different tests are described below.

Disks

Specifically, the test uses i3en.2xlarge instances (8 cores, 64 GB memory, 2 x 2,500GB NVMe SSDs) with high network bandwidth of up to 25Gbps to ensure the tests are not network-bound. This means the tests measure the maximum server performance, not merely the speed of the network. i3en.2xlarge instances support up to 655MB/s of write throughput across the two disks, which is enough to stress the servers. Following general recommendations and the original OMB setup, Pulsar uses one of the disks for the journal and the other for ledger storage; no changes were made to the disk setups for Kafka and RabbitMQ.

Figure 1: establishing the maximum disk bandwidth of the i3en.2xlarge instance across both disks, tested using Linux's dd command, as a baseline for the throughput tests.

Disk 1:

    dd if=/dev/zero of=/mnt/data-1/test bs=1M count=65536 oflag=direct
    65536+0 records in
    65536+0 records out
    68719476736 bytes (69 GB) copied, 210.278 s, 327 MB/s

Disk 2:

    dd if=/dev/zero of=/mnt/data-2/test bs=1M count=65536 oflag=direct
    65536+0 records in
    65536+0 records out
    68719476736 bytes (69 GB) copied, 209.594 s, 328 MB/s

Operating system tuning

In addition, for all three systems the operating system was tuned for latency using the tuned-adm latency-performance profile, which disables any dynamic tuning mechanisms for the disk and network schedulers and uses the performance governor for CPU frequency. It pins the P-state to the highest frequency for each core and sets the I/O scheduler to deadline to provide a predictable upper bound on disk request latency. Finally, it tunes the power-management quality of service (QoS) in the kernel for performance over power saving.

Memory

Compared with the default instance in OMB, the i3en.2xlarge test instance has almost half the physical memory (64GB versus 122GB). Adapting Kafka and RabbitMQ to this was straightforward: both rely mainly on the operating system's page cache, which shrinks automatically as new processes demand memory.

However, both the Pulsar brokers and BookKeeper's bookies rely on off-heap/direct memory for caching, so the test adjusted the JVM heap/maximum direct memory of these two processes to work properly on the i3en.2xlarge instance. Specifically, the heap size was halved from 24GB per process (in the original OMB configuration) to 12GB per process, proportionally allocating the available physical memory between the two processes and the operating system.

The test encountered java.lang.OutOfMemoryError: Direct buffer memory at high target throughput; with a smaller heap size, the bookies crashed completely. This is a typical memory-tuning problem for systems that use off-heap memory. Although direct byte buffers can ease Java GC pressure, taming them at scale remains challenging.
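As an illustration of this failure mode, the following hypothetical Java sketch exhausts the JVM's direct-memory cap (set with -XX:MaxDirectMemorySize) and triggers exactly this error; it is a toy demonstration, not Pulsar or BookKeeper code:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Run with e.g.: java -XX:MaxDirectMemorySize=64m DirectBufferDemo
    public class DirectBufferDemo {
        public static void main(String[] args) {
            List<ByteBuffer> buffers = new ArrayList<>();
            while (true) {
                // Each allocation consumes off-heap memory; once the cap is reached,
                // the JVM throws java.lang.OutOfMemoryError: Direct buffer memory
                buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
            }
        }
    }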

Throughput test

The first thing to test is the peak stable throughput each system can achieve with the same network, disk, CPU, and memory resources. Peak stable throughput is defined as the highest average producer throughput at which consumers can keep up without an ever-growing backlog.

The effect of fsync

As mentioned earlier, Apache Kafka's default, recommended configuration is to flush/fsync messages to disk using the page-cache flush policy of the underlying operating system (rather than fsyncing every message) and to rely on replication for persistence. Fundamentally, this provides a simple and effective way to amortize the cost of the different batch sizes used by Kafka producers and to achieve the maximum possible throughput under all conditions. If Kafka were configured to fsync on every write, performance would be artificially handicapped by forced fsync system calls, without any additional benefit.

That said, since results for both cases are discussed, it is still worth understanding the impact of fsyncing every write in Kafka. The effect of producer batch size on Kafka throughput is shown in the figure below. Throughput increases with batch size up to a "sweet spot", at which point the batch size is large enough that the underlying disk is fully saturated. Fsyncing every message to disk on Kafka (the orange bars in figure 2) yields comparable results at larger batch sizes. Note that these results were verified only on the SSDs of this test platform. Kafka does fully utilize the underlying disks at all batch sizes, maximizing IOPS at lower batch sizes and disk throughput at higher batch sizes, even when forced to fsync every message.

Figure 2: effect of batch size on Kafka throughput (messages/s), with green bars for fsync disabled (the default) and orange bars for fsync enabled.

Still, as the chart above makes clear, using the default sync settings (green bars) lets the Kafka broker manage page flushes itself, providing better throughput overall. In particular, at lower producer batch sizes (1KB and 10KB), throughput with the default settings is roughly 3-5x higher than in fsync mode. At larger batches (100KB and 1MB), however, the fsync cost is amortized and throughput is roughly on par with the default settings.

Pulsar implements similar batching in the producer and quorum-replicates produced messages across bookies. BookKeeper's bookies implement group commit/sync to disk at the application level to maximize disk throughput. By default (controlled by the bookie configuration option journalSyncData=true), BookKeeper writes to disk in fsync mode.

To cover all cases, journalSyncData=false was also tested on BookKeeper for comparison with Kafka's default, non-fsync mode. However, this produced large latencies and instability on BookKeeper's bookies, indicating queuing related to flushing. The same behavior was verified with the pulsar-perf tool that ships with Pulsar. As far as can be determined this is a bug, so it was excluded from the remaining tests. Still, since disk throughput was already maxed out with journalSyncData=true, this should not affect the final result.

Figure 3: Pulsar/BookKeeper performance with journalSyncData=false, showing throughput drops and latency spikes

Figure 4: BookKeeper journal callback queue growth with journalSyncData=false

RabbitMQ uses durable queues that persist messages to disk if and only if they have not yet been consumed. Unlike Kafka and Pulsar, RabbitMQ does not support re-reading old messages. From a persistence standpoint, the tests show that consumers kept up with producers, so essentially no writes to disk were observed. The test also made RabbitMQ provide the same availability guarantees as Kafka and Pulsar by using mirrored queues in a cluster of three brokers.

Test configuration

The test is designed based on the following principles and expected guarantees:

Messages are replicated three times for fault tolerance (detailed below)

Batching is enabled on all three systems to optimize throughput, batching up to 1MB of data for up to 10ms

Pulsar and Kafka are configured with 100 partitions per topic

RabbitMQ does not support partitions within a topic; to match the Kafka and Pulsar setup, a single direct exchange (equivalent to a topic) with bound queues (equivalent to partitions) is declared, as detailed below.

OMB uses an automatic rate-discovery algorithm that dynamically determines the target producer throughput by probing the backlog at several rates. In many cases, the discovered rate jumped wildly, e.g. from 2.0 messages per second to 500,000 messages per second, seriously harming the repeatability and fidelity of the tests. Therefore, this feature was not used; instead, the target throughput was configured explicitly and stepped up through 10K, 50K, 100K, 200K, 500K, and 1 million producer messages per second, with 4 producers and 4 consumers using 1KB messages. The maximum rate at which each system delivered stable end-to-end performance was then observed for the different configurations.

Throughput results

The results show that Kafka provides the highest throughput of the three systems; each system's results are examined in more detail below.

Figure 5: comparison of peak stable throughput across all three systems: 100 topic partitions and 1KB messages, using 4 producers and 4 consumers

Kafka was configured with batch.size=1MB and linger.ms=10 so producers write to the brokers in efficient batches. In addition, the producers were configured with acks=all and min.insync.replicas=2, ensuring every message is replicated to at least two brokers before being acknowledged to the producer. The test found that Kafka can fully utilize both disks on each broker, the ideal result for a storage system. For details, see the Kafka test-driver configuration.
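As a sketch of what this producer configuration might look like in client code (the broker address and topic name are placeholders; the actual OMB driver wires these values through its configuration files):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BenchmarkProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker-1:9092"); // placeholder address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("batch.size", "1048576"); // 1MB batches, as in the test
            props.put("linger.ms", "10");       // wait up to 10ms to fill a batch
            props.put("acks", "all");           // wait for all in-sync replicas
            // min.insync.replicas=2 is a topic/broker-side setting, not set here

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("test-topic", new byte[1024])); // 1KB message
            }
        }
    }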

Figure 6: Kafka performance using the default, recommended sync settings. The figure shows the I/O utilization on the Kafka brokers and the corresponding producer/consumer throughput (source: Prometheus node metrics). For details, see the original results.

In addition, Kafka was benchmarked in another configuration, with flush.messages=1 and flush.ms=0, fsyncing every message to disk on all replicas before acknowledging the write. The results are shown in the figure below and are very close to the default configuration:
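For reference, flush.messages and flush.ms are topic-level configs; one hypothetical way to apply them programmatically is via the Kafka AdminClient (the broker address and topic name are placeholders):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class EnableFsyncPerMessage {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker-1:9092"); // placeholder address
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                        new ConfigResource(ConfigResource.Type.TOPIC, "test-topic");
                admin.incrementalAlterConfigs(Map.of(topic, List.of(
                        new AlterConfigOp(new ConfigEntry("flush.messages", "1"),
                                AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("flush.ms", "0"),
                                AlterConfigOp.OpType.SET)
                ))).all().get(); // forces an fsync after every message on this topic
            }
        }
    }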

Figure 7: Prometheus node metrics showing I/O utilization on the Kafka brokers and the corresponding producer/consumer throughput. For details, see the original results.

Pulsar producers queue produce requests differently from Kafka. Specifically, Pulsar maintains an internal per-partition producer queue, along with limits on the size of these queues, which cap the number of outstanding messages across all partitions from a given producer. To prevent the Pulsar producer from throttling the number of messages sent, the test set both the per-partition and global limits to infinity, while matching the 1MB byte-based batch limit.

    .batchingMaxBytes(1048576)                            // 1MB
    .batchingMaxMessages(Integer.MAX_VALUE)
    .maxPendingMessagesAcrossPartitions(Integer.MAX_VALUE)

For Pulsar, a higher time-based batching limit, batchingMaxPublishDelayMs=50, was also set to ensure that batching is triggered primarily by the byte limit. This value was obtained by increasing it until it had no measurable effect on the peak stable throughput Pulsar ultimately achieved. For the replication configuration, the test uses ensembleSize=3, writeQuorum=3, ackQuorum=2, which is equivalent to the Kafka configuration. For details, see the Pulsar test-driver configuration.
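Putting the settings above together, a minimal, hypothetical Pulsar producer setup might look as follows (the service URL and topic name are placeholders; the real test drives these settings through the OMB Pulsar driver):

    import java.util.concurrent.TimeUnit;
    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.PulsarClient;

    public class BenchmarkPulsarProducer {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://broker-1:6650") // placeholder address
                    .build();
            Producer<byte[]> producer = client.newProducer()
                    .topic("test-topic")                                // placeholder topic
                    .enableBatching(true)
                    .batchingMaxBytes(1048576)                          // 1MB byte-based limit
                    .batchingMaxMessages(Integer.MAX_VALUE)             // effectively unlimited
                    .maxPendingMessagesAcrossPartitions(Integer.MAX_VALUE)
                    .batchingMaxPublishDelay(50, TimeUnit.MILLISECONDS) // time-based limit
                    .create();
            producer.send(new byte[1024]); // 1KB message
            producer.close();
            client.close();
        }
    }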

Because of BookKeeper's design, in which bookies write data twice, to a local journal and to ledger storage, peak stable throughput is effectively half of Kafka's. This fundamental design choice has a profound negative impact on throughput, which directly affects cost. Once the journal disk on BookKeeper's bookies is fully saturated, Pulsar's producer rate is capped.

Figure 8: Prometheus node metrics showing Pulsar's BookKeeper journal disk maxed out, and the resulting throughput measured at BookKeeper's bookies. For details, see the original results.

To verify this further, the test also configured BookKeeper with the two disks in RAID0, giving BookKeeper the opportunity to stripe journal and ledger writes across both disks. Pulsar then maximized the combined throughput of the disks (~650MB/s), but peak stable throughput was still limited to ~340MB/s.

Figure 9: Prometheus node metrics showing that the BookKeeper journal disk remains saturated in the RAID0 configuration

Figure 10: Prometheus node metrics showing the RAID0 disks maxed out, and the resulting throughput measured at the Pulsar brokers. For details, see the original results.

Pulsar has a tiered architecture that separates the BookKeeper bookies (storage) from the Pulsar brokers (caching/proxying for storage). For completeness, the test also ran throughput tests in such a tiered deployment, moving the Pulsar brokers to three additional compute-optimized c5n.2xlarge instances (8 cores, 21GB memory, up to 25Gbps network bandwidth, EBS-backed storage), while the BookKeeper nodes remained on the storage-optimized i3en.2xlarge instances. In this special setup, Pulsar and BookKeeper get a total of 6 instances/resources, twice the CPU resources and 33% more memory than Kafka and RabbitMQ.

Even at high throughput, the system was mostly I/O-constrained, and the test found no improvement from this setup. See the table below for the complete results of this particular scenario. In fact, Pulsar's 2-tier architecture does not appear to relieve any real CPU bottleneck; it only adds overhead: 2 JVMs take up more memory, network traffic triples, and there are more moving parts in the system architecture. It is therefore reasonable to expect that when the network is limited (unlike the generous network bandwidth provided in this test), Pulsar's 2-tier architecture would consume network resources twice as fast, degrading performance.

Pulsar deployment model    Peak producer throughput (MB/s)
Tiered                     305.73
Colocated                  305.69

Unlike Kafka and Pulsar, RabbitMQ has no concept of partitions within a topic. Instead, RabbitMQ uses exchanges to route messages to bound queues, using header attributes (headers exchanges), routing keys (direct and topic exchanges), or bindings (fanout exchanges), from which consumers process messages. To match the workload setup, this test declares a single direct exchange (equivalent to a topic) with bound queues (equivalent to partitions), each dedicated to a specific routing key. Producers publish messages across all routing keys (round-robin) and a consumer is dedicated to each queue; a minimal sketch of this topology follows the list below. The test also optimized RabbitMQ using community-recommended best practices:

Enable replication (queues are replicated to all nodes in the cluster)

Disable message persistence (queues are only in memory)

Enable automatic consumer acknowledgments

Load-balance queues across brokers

Since RabbitMQ uses a dedicated core per queue, 24 queues (8 vCPUs x 3 brokers) were used.
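As referenced above, a minimal sketch of this topology using the RabbitMQ Java client might look as follows (the host, exchange, and queue names are placeholders; queue mirroring itself is configured via a cluster policy, not in this code):

    import com.rabbitmq.client.BuiltinExchangeType;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class RabbitTopologySetup {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("broker-1"); // placeholder address
            try (Connection conn = factory.newConnection();
                 Channel ch = conn.createChannel()) {
                // One direct exchange stands in for a "topic"
                ch.exchangeDeclare("test-exchange", BuiltinExchangeType.DIRECT);
                // Each bound queue stands in for a "partition": 24 queues = 8 vCPUs x 3 brokers
                for (int i = 0; i < 24; i++) {
                    String queue = "queue-" + i;
                    // durable=false matches "no message persistence" in the list above
                    ch.queueDeclare(queue, false, false, false, null);
                    ch.queueBind(queue, "test-exchange", "key-" + i); // per-queue routing key
                }
                // Producers publish round-robin across the routing keys
                ch.basicPublish("test-exchange", "key-0", null, new byte[1024]);
            }
        }
    }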

RabbitMQ suffers badly from replication overhead, which severely reduces the system's throughput. During this workload, all nodes were CPU-bound (see the green y-axis line in the figure below), leaving little headroom for brokering any additional messages. For details, see the RabbitMQ test-driver configuration.

Figure 11: RabbitMQ throughput + CPU utilization. For details, see the original results.

Latency tests

Given the growing popularity of streaming and event-driven architectures, another key property of a messaging system is the end-to-end latency of a message from producer through the system to consumer. This test compares end-to-end latency on all three systems at the highest stable throughput each can sustain without being overloaded.

To optimize latency, this test changed the producer configuration on all systems to batch messages for at most 1 millisecond (instead of the 10 milliseconds used for the throughput tests) and otherwise left each system at its default, recommended configuration while ensuring availability. Kafka was configured with its default sync settings (fsync off), and RabbitMQ used mirrored queues without message persistence. Based on repeated runs, Kafka and Pulsar were compared at 200K messages per second, or 200MB/s, below the 300MB/s single-disk throughput limit of this test platform. RabbitMQ, by contrast, hit a CPU bottleneck at throughput above 30K messages per second.

Latency test results

Figure 12: end-to-end latency in the standard high-availability configuration, measured at 200K messages per second (1KB message size) on Kafka and Pulsar, and at only 30K messages per second on RabbitMQ because it cannot sustain a higher load. Note: latency in ms; lower is better.

Kafka consistently maintains lower latency than Pulsar. RabbitMQ achieves the lowest latency of the three systems, but at much lower throughput given its limited vertical scalability. Because the test was deliberately set up so that consumers on each system could always keep up with producers, almost all reads were served from the cache/memory of all three systems.

Much of Kafka's performance can be attributed to a heavily optimized consumer read path built on efficient data organization, with no additional overhead. Kafka makes heavy use of the Linux page cache and zero-copy mechanisms to avoid copying data into user space. Many systems, such as databases, build application-level caches, giving them more flexibility to support random read/write workloads. For a messaging system, however, relying on the page cache is an excellent choice, because typical workloads perform sequential reads/writes. The Linux kernel has been optimized for years to intelligently detect these patterns and use techniques such as read-ahead to greatly improve read performance. Likewise, building on the page cache allows Kafka to use sendfile-based network transmission, avoiding further data copies. To be consistent with the throughput tests, Kafka's fsync mode was also tested.
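To illustrate the sendfile-based zero-copy technique described above, here is a minimal, hypothetical Java sketch using FileChannel.transferTo, which maps to sendfile(2) on Linux (the file and host names are placeholders; this is an illustration of the technique, not Kafka's actual code):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            try (FileChannel log = FileChannel.open(Path.of("segment.log"),
                         StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                         new InetSocketAddress("consumer-host", 9000))) {
                long position = 0;
                long remaining = log.size();
                while (remaining > 0) {
                    // transferTo moves bytes from the page cache straight to the
                    // socket without copying them into user space
                    long sent = log.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }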

Pulsar takes a different approach to caching than Kafka, some of which stems from BookKeeper's core design separating journal and ledger storage. In addition to the Linux page cache, Pulsar employs multiple cache layers: a read-ahead cache on BookKeeper's bookies (the test kept OMB's default dbStorage_readAheadCacheMaxSizeMb=1024) and a managed-ledger cache (managedLedgerCacheSizeMB, in this test 2.4GB, i.e. 12% of the 12GB of available direct memory). The test found no benefit from this multi-layer cache; on the contrary, multiple caches may increase the overall cost of a deployment, and it is suspected that the 12GB off-heap allocation contains considerable padding to avoid the Java GC problems with direct byte buffers mentioned earlier.

RabbitMQ's performance is driven by the producer-side exchanges and the consumer-side queues bound to them. The latency tests used the same mirrored setup as the throughput tests, in particular direct exchanges and mirrored queues. Due to CPU bottlenecks, the test was unable to push throughput beyond 38K messages per second, and any attempt to measure latency at that rate showed severely degraded performance, with a p99 latency of almost 2 seconds.

After gradually reducing the throughput from 38K to 30K messages per second, stable throughput was achieved without over-utilizing the system, confirming a markedly better p99 latency of 1ms. The overhead of replicating 24 queues across three nodes appears to severely degrade end-to-end latency at higher throughput, while below 30K messages per second, or 30MB/s (the solid magenta line), RabbitMQ delivers far lower end-to-end latency than the other two systems.

Overall, following best practices lets RabbitMQ deliver low latency within these limits. Given that the latency test was deliberately set up so that consumers could always keep up with producers, the efficiency of RabbitMQ's messaging pipeline comes down to the number of context switches the Erlang BEAM VM (and therefore the CPU) must perform to handle the queues. Capping the number of queues at one per CPU core therefore provides the lowest latency. In addition, direct or topic exchanges allow complex routing to specific queues (similar to consumers dedicated to partitions on Kafka and Pulsar). Direct exchanges provide better performance because, unlike topic exchanges, they perform no wildcard matching, which adds overhead; direct exchanges were therefore the appropriate choice for this test.

Figure 13: end-to-end latency for Kafka, Pulsar, and RabbitMQ, tested at 200K messages per second (1KB message size) on Kafka and Pulsar and at 30K messages per second on RabbitMQ. For details, see the original results (Kafka, Pulsar, and RabbitMQ). Note: latency in ms; lower is better.

At the beginning of this article, the latency results for Kafka with the recommended default sync configuration (solid green line) were presented. In the alternative configuration, where Kafka fsyncs every message to disk (dotted green line), the test found Kafka's latency lower than Pulsar's up to the p99.9th percentile, while Pulsar (blue line) performed better at the higher tail percentiles. Although tail latency at or above the p99.9th percentile is hard to reason about precisely, the nonlinearity suggests a sharp latency increase at p99.9 for the alternative fsync configuration of Kafka (dotted green line).

Latency trade-offs

Figure 14: RabbitMQ end-to-end latency with mirrored queues (the configuration used in the tests) versus classic queues (no replication) at rates of 10K, 20K, 30K, and 40K messages per second. Note: the y-axis in this chart uses a logarithmic scale.

Admittedly, every system makes design trade-offs. Although it is unfair to Kafka and Pulsar, it is interesting to compare against RabbitMQ with high availability disabled. Un-mirrored RabbitMQ achieves lower latency, while Kafka and Pulsar, in exchange for somewhat higher latency, provide stronger persistence guarantees and three times the availability of RabbitMQ in this mode. This trade-off may suit scenarios (for example, device location tracking) where it is acceptable to sacrifice availability for better performance, especially if the use case requires real-time messaging and is insensitive to availability issues. The tests show that with replication disabled, RabbitMQ can maintain lower latency at higher throughput, although the improved throughput (100K messages/s) is still far below what Kafka and Pulsar achieve.

Even though Kafka and Pulsar are slower (~5ms and ~25ms p99, respectively), their persistence, higher throughput, and higher availability matter for large-scale event streaming scenarios such as financial transactions. For scenarios demanding very low latency, RabbitMQ can achieve a p99 latency of ~1ms, but only under light load, because messages are merely queued in memory without replication overhead.

In practice, operators must configure RabbitMQ carefully, keeping rates low enough to sustain these low latencies, a task that is difficult, or even impossible, to accomplish in a general way across all scenarios. Overall, a better architectural choice with lower operational overhead and cost may be to pick a durable system such as Kafka for all scenarios: it delivers the best throughput at all load levels with low latency.

This article has comprehensively analyzed and weighed the three messaging systems (Kafka, RabbitMQ, and Pulsar), reaching the following conclusions:

Throughput: Kafka provides the highest throughput of the three systems, writing 15x faster than RabbitMQ and 2x faster than Pulsar.

Latency: Kafka provides the lowest latency at high throughput while also providing strong persistence and high availability. In all latency tests, Kafka's default configuration is faster than Pulsar, and it remains faster up to the p99.9th percentile even when fsyncing every message. RabbitMQ can achieve lower end-to-end latency than Kafka, but only at significantly reduced throughput.

Cost/complexity: cost tends to be an inverse function of performance. As the system with the highest stable throughput, Kafka offers the best value (i.e., cost per byte written) of all the systems thanks to its efficient design. Moreover, the work to remove ZooKeeper from Kafka (see KIP-500) is well under way and will further simplify Kafka's architecture.

This concludes the comparison of the performance tests of Apache Kafka, Apache Pulsar, and RabbitMQ. Hopefully the above is helpful; if you found this article worthwhile, feel free to share it for more people to see.
