What are the reasons to choose Apache Pulsar rather than Kafka?

2025-04-07 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the main reasons for choosing Apache Pulsar rather than Kafka.

About Apache Pulsar

Apache Pulsar, a top-level project of the Apache Software Foundation, is a next-generation, cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function-based computing. It separates compute from storage, supports multi-tenancy, persistent storage, and data replication across data centers and regions, and offers the properties expected of a streaming data store: strong consistency, high throughput, low latency, and high scalability.

Apache Pulsar has several distinctive features, such as tiered storage, stateless brokers, cross-region replication, and multi-tenancy, which give it an advantage over Kafka.

Stateless brokers (easy to scale)

With Kafka, you need to decide on the number of brokers up front. Because Kafka stores data on the brokers, if that number turns out to be too small and you add brokers to scale out, you must repartition the topic before the new brokers can be fully used.

Pulsar keeps this state out of the brokers and stores it in a separate layer (Apache BookKeeper [1]). Because the broker tier is decoupled from the storage tier, brokers can be added and put to use without moving data. That is, you can take full advantage of new brokers without repartitioning existing data.
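
To make the difference concrete, here is a toy model in Python. It is not a description of how either system actually places data. It assumes topics are spread round-robin over the brokers and that, in the coupled case, a topic's data has to be copied whenever its broker changes. The function `rebalance`, the broker names, and the topic sizes are all made up for illustration.

```python
def rebalance(sizes, old_owner, brokers, data_on_broker):
    """Assign topics to brokers round-robin.

    Returns (new_owner, bytes_moved). Data is only counted as moved when
    it lives on the broker itself (data_on_broker=True, the coupled case).
    """
    new_owner, moved = {}, 0
    for i, topic in enumerate(sorted(sizes)):
        new_owner[topic] = brokers[i % len(brokers)]
        changed = topic in old_owner and old_owner[topic] != new_owner[topic]
        if changed and data_on_broker:
            moved += sizes[topic]  # the stored data has to follow its broker
    return new_owner, moved


# Hypothetical topic sizes in bytes.
sizes = {"t0": 100, "t1": 200, "t2": 300, "t3": 400}
before, _ = rebalance(sizes, {}, ["b1", "b2"], data_on_broker=True)

# Scale out from two brokers to three.
three = ["b1", "b2", "b3"]
_, coupled_moved = rebalance(sizes, before, three, data_on_broker=True)
_, decoupled_moved = rebalance(sizes, before, three, data_on_broker=False)
print(coupled_moved, decoupled_moved)
```

In the decoupled case only the ownership map changes, which is why the second number stays at zero no matter how many topics change hands.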

Tiered storage (persistent message storage, lower storage costs)

By default, Kafka retains data for 7 days, that is, data is deleted after a week. By default, Pulsar retains all unacknowledged data and deletes data as soon as it has been acknowledged.
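
The two defaults can be contrasted with a small Python sketch. It is a simplification that works message by message and ignores how real brokers group data into larger units before deleting it. The `Message` type and both function names are hypothetical, and `acked` stands for "acknowledged by every consumer that needs the message".

```python
from dataclasses import dataclass

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class Message:
    offset: int
    published_at: float  # seconds since some reference time
    acked: bool          # acknowledged by every consumer that needs it


def time_based_retention(log, now, max_age=SEVEN_DAYS):
    """Kafka-style default: keep messages younger than max_age, acked or not."""
    return [m for m in log if now - m.published_at < max_age]


def ack_based_retention(log):
    """Pulsar-style default: keep only messages not yet acknowledged."""
    return [m for m in log if not m.acked]
```

With a ten-day-old unacknowledged message in the log, the first policy drops it and the second keeps it; a one-day-old acknowledged message is treated the other way around.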

Both Kafka and Pulsar let you change the retention period through a custom retention policy. However, primary storage can hold only a limited amount of data, and keeping more data raises storage costs. Tiered storage lets you put different kinds of data on the storage that suits them, which keeps costs down. For example, historical data is typically needed only by backfill applications, so you can choose a different type of storage for it.

Pulsar's storage layer uses a segment-based architecture, with segments distributed across the storage nodes. With Pulsar, you can either write segments to primary storage or offload them to other types of storage. Pulsar therefore supports tiered storage; when this article was written, Kafka did not (tiered storage was later added to Kafka through KIP-405). Tiered storage provides multiple storage tiers, such as primary storage (SSD-based) and historical storage (S3), and the data in each tier remains easy to access.
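
The move from primary storage to a cheaper tier can be sketched as follows. This is a minimal illustration, assuming a size threshold triggers the move and the oldest data goes first; it is not Pulsar's actual offload logic. The names `offload_oldest` and `locate` are invented, and a plain dict stands in for an object store such as S3.

```python
from collections import deque


def offload_oldest(primary, cold, threshold_bytes):
    """Move the oldest segments to cold storage until primary fits the threshold.

    `primary` is a deque of (segment_id, size_bytes), oldest first;
    `cold` is a dict standing in for cheaper storage such as S3.
    """
    while primary and sum(size for _, size in primary) > threshold_bytes:
        segment_id, size = primary.popleft()
        cold[segment_id] = size
    return primary, cold


def locate(segment_id, primary, cold):
    """Tell a reader which tier currently holds a segment, or None if unknown."""
    if any(sid == segment_id for sid, _ in primary):
        return "primary"
    return "cold" if segment_id in cold else None
```

A reader asks `locate` where a segment lives and fetches it from that tier, so offloaded history stays readable while the expensive tier holds only recent data.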

Quorum-based replication (more consistent latency)

Pulsar replicates data with a quorum-based algorithm, while Kafka uses a leader-follower algorithm. Although Pulsar and Kafka offer the same guarantees, the quorum-based approach yields more consistent latency.
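
One way to see why waiting for a quorum smooths out latency is a small simulation. The sketch below is not a benchmark of either system and deliberately simplifies both: it assumes three replicas, a write that completes either when the fastest two have acknowledged (quorum) or when all three have (every replica), and an invented latency model in which each replica usually answers in about 5 ms but occasionally stalls for 100 ms.

```python
import random
import statistics


def quorum_latency(replica_ms, ack_quorum):
    """Write completes once the fastest `ack_quorum` replicas have acked."""
    return sorted(replica_ms)[ack_quorum - 1]


def all_replicas_latency(replica_ms):
    """Write completes only when every replica has acked."""
    return max(replica_ms)


def simulate(writes=10_000, replicas=3, ack_quorum=2, seed=7):
    """Return the latency standard deviation (ms) for both strategies."""
    rng = random.Random(seed)
    quorum, full = [], []
    for _ in range(writes):
        # Assumed model: 4-6 ms normally, with a 5% chance of a 100 ms stall.
        sample = [100.0 if rng.random() < 0.05 else rng.uniform(4.0, 6.0)
                  for _ in range(replicas)]
        quorum.append(quorum_latency(sample, ack_quorum))
        full.append(all_replicas_latency(sample))
    return statistics.pstdev(quorum), statistics.pstdev(full)


quorum_std, full_std = simulate()
print(f"quorum stdev: {quorum_std:.1f} ms, all-replicas stdev: {full_std:.1f} ms")
```

A single slow replica delays the write only when every replica must answer; with a quorum, the write is slow only if several replicas stall at the same time, which is much rarer under this model.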

Cross-region replication (high availability)

Pulsar natively supports cross-region replication, so it can replicate data between data centers in different geographic locations. Having copies of messages in multiple data centers is especially important for availability when a data center goes down or the network is partitioned.

Multi-tenancy (simplified architecture and management)

Pulsar supports multi-tenancy: multiple user groups share the same cluster, separated either by access control or by placing them in entirely different namespaces.
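
Tenants and namespaces are visible in Pulsar's fully qualified topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The following Python sketch splits such a name into its parts. The helper names `parse_topic` and `same_tenant` are made up for this example, the tenant and namespace values used with them are hypothetical, and the parser handles only this three-part form.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into domain, tenant, namespace, topic."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    return TopicName(domain, *parts)


def same_tenant(a: str, b: str) -> bool:
    """True when two topics belong to the same tenant."""
    return parse_topic(a).tenant == parse_topic(b).tenant
```

For example, `parse_topic("persistent://acme/billing/invoices")` yields the tenant `acme` and the namespace `billing`, so two teams can use identical topic names without colliding as long as their tenant or namespace differs.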

Message encryption (improved security)

Pulsar provides full end-to-end encryption from the client to the storage node. Complete encryption is a common data security requirement. Kafka currently does not support end-to-end encryption.

Multiprotocol support (easy to integrate with existing applications)

Pulsar can work with clients of other messaging systems and protocols (such as RabbitMQ, AMQP, and Kafka), and it also supports parallel reads of historical stream events using Presto [2].

Pulsar Functions (one-stop streaming)

Pulsar Functions is a lightweight stream processing framework built on Pulsar; the concept is similar to Kafka Streams. Pulsar Functions can be deployed directly on the broker nodes (or as containers in a Kubernetes cluster), whereas a Kafka Streams job is a separate application. With Pulsar Functions, Pulsar can handle many stream processing tasks directly, which simplifies operations.
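
As a rough idea of how small such a function can be, the sketch below follows what Pulsar documents as the language-native Python interface, in which a module-level `process` function is called once per incoming message and its return value is published to the output topic. The transformation itself is invented for illustration, and deploying the file to a cluster is not shown.

```python
def process(input):
    """Append an exclamation mark to each incoming message."""
    return f"{input}!"
```

Because it is an ordinary Python function, it can be unit-tested locally (for example `process("hello")`) before it is handed to Pulsar.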

Apache Flink integration (batch and streaming)

Pulsar is proven in practice (used in large-scale production environments)

Pulsar has many design advantages. It was originally developed by a team at Yahoo and used inside Yahoo. Yahoo open-sourced Pulsar in 2016 and later contributed it to the Apache Software Foundation [4]. Since then, many companies, such as Tencent and Splunk [5], have adopted Pulsar for mission-critical applications [6].

Pulsar is not perfect.

Pulsar requires two additional systems, Apache BookKeeper and Apache ZooKeeper, while Kafka, when this article was written, "only" needed ZooKeeper. Running more systems increases operational complexity, but it is also what makes Pulsar more flexible. Since both Kafka and Pulsar depend on other systems, both require extra setup and maintenance.

Choosing between Pulsar and Kafka is not easy, and the decision has far-reaching implications. This article has summarized the main differences between the two, and I hope it helps you and your team make a choice. For more information about Apache Pulsar, visit pulsar.apache.org or subscribe to email notifications [7].
