Example Analysis of supporting Native Kafka Protocol on Apache Pulsar

2025-03-30 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

The purpose of this article is to share with you an example analysis of supporting native Kafka protocols on Apache Pulsar. The editor thinks it is very practical, so I hope you can learn something after reading this article. Let's take a look at it with the editor.

We are pleased to announce that StreamNative and OVHcloud have open-sourced KoP (Kafka on Pulsar). KoP brings a Kafka protocol handler (protocol processing plug-in) to the Pulsar broker, so that Apache Pulsar supports the native Apache Kafka protocol.

After adding the KoP protocol processing plug-in to an existing Pulsar cluster, users can migrate existing Kafka applications and services to Pulsar without modifying the code.

This allows Kafka applications to take advantage of the powerful features of Pulsar, such as:

Streamline operations with enterprise-grade multi-tenancy

Simplify operations by avoiding data rebalancing

Persist event streams with Apache BookKeeper and tiered storage

Process events serverlessly with Pulsar Functions

What is Apache Pulsar?

Apache Pulsar is an event streaming platform. From the beginning, Pulsar was designed as a cloud-native system with a layered, segment-based architecture. The architecture separates serving from storage, which makes the system container-friendly.

Pulsar's cloud-native architecture is highly scalable, consistent, and resilient, enabling companies to expand their business through real-time data solutions. Pulsar has been widely adopted since it was open-sourced in 2016, and it became a top-level Apache project in 2018.

The need for KoP

Pulsar provides a unified messaging model for queuing and streaming workloads. Pulsar implements its own protobuf-based binary protocol to ensure high performance and low latency. Using protobuf also makes Pulsar clients easier to implement.

The project provides Java, Go, Python, and C++ clients, along with third-party clients contributed by the community. However, applications written against other messaging protocols must be rewritten before they can adopt Pulsar's unified messaging protocol. To address this, the Pulsar community has developed tools that help migrate applications from other messaging systems to Pulsar. For example, Pulsar provides a Kafka wrapper on top of the Kafka Java API.

The Kafka wrapper allows users to switch their Kafka Java client applications from Kafka to Pulsar without changing the code. Pulsar also provides a rich connector ecosystem for connecting Pulsar with other data systems.

However, there is still strong demand from users who want to move other Kafka applications, which the Java wrapper does not cover, to Pulsar.

StreamNative and OVHcloud work together

StreamNative receives a large number of inbound requests for help in migrating from other messaging systems to Pulsar. At the same time, StreamNative is aware of the need to natively support other messaging protocols such as AMQP and Kafka on Pulsar.

As a result, StreamNative began working on a generic protocol handler framework for Pulsar. The framework allows developers who use other messaging protocols to use Pulsar.

OVHcloud has used Apache Kafka for many years. Despite their experience running multiple Kafka clusters and processing millions of messages per second, they still faced daunting operational challenges. For example, without multi-tenancy, it was difficult for them to host thousands of topics belonging to thousands of users in a single cluster.

So OVHcloud decided to move its topic-as-a-service product (ioStream) from Kafka to Pulsar and build the product on Pulsar. Compared to Kafka, Pulsar supports multi-tenancy, and its overall architecture includes Apache BookKeeper, which helps simplify operations. After some preliminary experiments, OVHcloud decided to build a proof-of-concept proxy that converts the Kafka protocol to Pulsar on the fly. In the process, OVHcloud noticed that StreamNative was working on bringing the Kafka protocol natively to Pulsar, so the two companies worked together to develop KoP.

KoP aims to provide a concise and comprehensive solution by leveraging the event stream storage architecture of Pulsar and BookKeeper together with Pulsar's pluggable protocol handler framework. KoP is implemented as a protocol handler named "kafka". It is loaded by the Pulsar broker and runs as part of it.

Distributed log

Pulsar and Kafka share a very similar data model for publish/subscribe messaging and event streaming: both are built around a distributed log.

The main difference between the two systems is how they implement the distributed log. Kafka uses a partition-based architecture and stores each distributed log (the log of a Kafka partition) on a set of brokers. Pulsar uses a segment-based architecture, with Apache BookKeeper as its scale-out segment storage layer, and stores distributed logs in BookKeeper.

Pulsar's segment-based architecture helps avoid data rebalancing, achieve high scalability, and store event streams durably.

Because Pulsar and Kafka are built on a similar data model (the distributed log), and because Pulsar combines distributed log storage with a pluggable protocol handler framework (introduced in version 2.5.0), Pulsar can implement a Kafka-compatible protocol handler with relative ease.

Implementation

By comparing Pulsar and Kafka, we find that there are many similarities between the two systems. Both systems include the following operations:

Topic lookup: all clients connect to any broker to look up a topic's metadata (that is, its owner broker). After the metadata is obtained, the client establishes a persistent TCP connection with the owner broker.

Publish: the client talks to the owner broker of the topic partition to append messages to the distributed log.

Consume: the client talks to the owner broker of the topic partition to read messages from the distributed log.

Offset: each message published to a topic partition is assigned an offset. In Pulsar, the offset is called a MessageId. Consumers can use offsets to seek to a given position in the log and read messages from there.

Consumption state: both systems maintain the consumption state of the consumers in a subscription (what Kafka calls a consumer group). Kafka stores consumption state in the `__consumer_offsets` topic, while Pulsar stores it in `cursors`.

As you can see, these are all primitive operations provided by scale-out distributed log storage (such as Apache BookKeeper).
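To make these shared primitives concrete, here is a minimal in-memory sketch of a single partition log. It is only an illustration: the `PartitionLog` class and its method names are hypothetical and do not correspond to the BookKeeper, ManagedLedger, or Kafka APIs.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for the distributed log behind one topic partition."""

    entries: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)  # subscription name -> next offset to read

    def append(self, payload: bytes) -> int:
        """Publish: append a message and return the offset assigned to it."""
        self.entries.append(payload)
        return len(self.entries) - 1

    def read(self, offset: int, max_entries: int = 10) -> list:
        """Consume: read up to max_entries messages starting at offset."""
        return self.entries[offset:offset + max_entries]

    def acknowledge(self, subscription: str, offset: int) -> None:
        """Consumption state: record that everything up to offset was consumed."""
        self.cursors[subscription] = offset + 1

    def position(self, subscription: str) -> int:
        """Return the offset at which a subscription should resume reading."""
        return self.cursors.get(subscription, 0)


log = PartitionLog()
first = log.append(b"a")
log.append(b"b")
log.append(b"c")
log.acknowledge("group-1", first)            # the group has consumed offset 0
resumed = log.read(log.position("group-1"))  # reading resumes at offset 1
```

Publishing, consuming, offsets, and consumption state all reduce to appends, positional reads, and a stored position per subscription, which is why both protocols can sit on the same storage layer.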

Pulsar's core functionality is implemented on top of Apache BookKeeper. Therefore, we can implement the Kafka concepts simply and directly by using the existing components that Pulsar has built on BookKeeper.

The following figure illustrates how we add Kafka protocol support to Pulsar. We introduce a new protocol handler that uses Pulsar's existing components (such as topic discovery, the ManagedLedger distributed log library, and cursors) to implement the Kafka wire protocol.

Topic

Kafka stores all topics in a flat namespace, whereas Pulsar organizes topics in a hierarchical, multi-tenant namespace. We added the `kafkaNamespace` setting to the broker configuration so that administrators can map Kafka topics to Pulsar topics.

To make it easier for Kafka users to use Pulsar's multi-tenancy, Kafka users who authenticate their clients with SASL can specify a Pulsar tenant and namespace as their SASL user name.
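The mapping from a flat Kafka topic name to a hierarchical Pulsar topic name can be sketched as follows. This is an illustration rather than KoP's actual code: the helper names, the `public/default` default, and the assumption that the SASL user name has the form `tenant/namespace` are all hypothetical. Only the `persistent://tenant/namespace/topic` format and the `-partition-N` suffix are standard Pulsar naming.

```python
from typing import Optional

# Hypothetical default; stands in for whatever `kafkaNamespace` is set to.
DEFAULT_KAFKA_NAMESPACE = "public/default"


def namespace_from_sasl_username(username: str) -> str:
    """Interpret a SASL user name as 'tenant/namespace' (assumed format)."""
    tenant, sep, namespace = username.partition("/")
    if not sep or not tenant or not namespace or "/" in namespace:
        raise ValueError(f"expected 'tenant/namespace', got {username!r}")
    return f"{tenant}/{namespace}"


def to_pulsar_topic(kafka_topic: str,
                    namespace: str = DEFAULT_KAFKA_NAMESPACE,
                    partition: Optional[int] = None) -> str:
    """Place a flat Kafka topic name inside a Pulsar tenant and namespace."""
    name = f"persistent://{namespace}/{kafka_topic}"
    if partition is not None:
        # Pulsar names the partitions of a partitioned topic '<topic>-partition-<n>'.
        name += f"-partition-{partition}"
    return name


default_topic = to_pulsar_topic("orders")
tenant_topic = to_pulsar_topic(
    "orders", namespace_from_sasl_username("acme/billing"), partition=2
)
```

With a mapping like this, two Kafka users can both use a topic called `orders` without colliding, because each one resolves to a different Pulsar namespace.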

Message ID and offset

Kafka assigns an offset to each message that is successfully published to a topic partition. Pulsar assigns a `MessageID` to each message. The message ID consists of a `ledger-id`, an `entry-id`, and a `batch-index`. We use the same approach as the Pulsar Kafka wrapper to convert a Pulsar message ID to an offset, and vice versa.

Messages

Kafka and Pulsar messages both contain a key, a value, a timestamp, and headers (called `properties` in Pulsar). We automatically convert these fields between Kafka messages and Pulsar messages.
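As a rough illustration of the message ID and offset conversion described in this section, one common approach is to pack the components of the message ID into a single 64-bit integer. The bit widths below are assumptions chosen so that the result fits in a non-negative signed 64-bit Kafka offset. They are not taken from KoP or the Kafka wrapper, and the function names are hypothetical.

```python
# Assumed bit widths (19 + 32 + 12 = 63 bits), for illustration only.
LEDGER_BITS = 19
ENTRY_BITS = 32
BATCH_BITS = 12


def to_offset(ledger_id: int, entry_id: int, batch_index: int = 0) -> int:
    """Pack the components of a message ID into one Kafka-style offset."""
    if not 0 <= ledger_id < (1 << LEDGER_BITS):
        raise ValueError("ledger_id out of range")
    if not 0 <= entry_id < (1 << ENTRY_BITS):
        raise ValueError("entry_id out of range")
    if not 0 <= batch_index < (1 << BATCH_BITS):
        raise ValueError("batch_index out of range")
    return (ledger_id << (ENTRY_BITS + BATCH_BITS)) | (entry_id << BATCH_BITS) | batch_index


def from_offset(offset: int) -> tuple:
    """Unpack an offset into (ledger_id, entry_id, batch_index)."""
    batch_index = offset & ((1 << BATCH_BITS) - 1)
    entry_id = (offset >> BATCH_BITS) & ((1 << ENTRY_BITS) - 1)
    ledger_id = offset >> (ENTRY_BITS + BATCH_BITS)
    return ledger_id, entry_id, batch_index


offset = to_offset(ledger_id=7, entry_id=42, batch_index=3)
round_trip = from_offset(offset)
```

Because the ledger ID occupies the high bits, offsets produced this way preserve the order of messages within a partition, although they are not consecutive integers the way native Kafka offsets are.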

Topic lookup

The Kafka request handler uses the same topic lookup approach as the Pulsar request handler. It performs topic discovery, looks up the ownership of the requested topic partitions, and then returns Kafka `TopicMetadata` containing the ownership information to the Kafka client.
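The lookup step can be pictured as translating Pulsar ownership information into the per-partition leader information that a Kafka client expects. The sketch below assumes a plain dictionary as the ownership table. The `PartitionMetadata` class, the `topic_metadata` function, and the broker addresses are hypothetical and are not part of KoP or the Kafka client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMetadata:
    topic: str
    partition: int
    leader: str  # address of the owner broker, reported to the client as the leader


def topic_metadata(owners: dict, topic: str, num_partitions: int) -> list:
    """Build Kafka-style metadata from a Pulsar-style ownership table."""
    metadata = []
    for partition in range(num_partitions):
        pulsar_partition = f"{topic}-partition-{partition}"
        metadata.append(PartitionMetadata(topic, partition, owners[pulsar_partition]))
    return metadata


# Hypothetical ownership table: Pulsar partition name -> owner broker address.
owners = {
    "orders-partition-0": "broker-1:9092",
    "orders-partition-1": "broker-2:9092",
}
result = topic_metadata(owners, "orders", 2)
```

The client then connects directly to the reported leader of each partition, which matches the owner-broker model described earlier.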

Publish a message

When a message is received from a Kafka client, the Kafka request handler maps its fields (such as key, value, timestamp, and headers) one by one to the corresponding Pulsar fields, thereby converting the Kafka message into a Pulsar message.

The Kafka request handler then uses the ManagedLedger append API to store the converted Pulsar messages in BookKeeper. Because the messages are stored as Pulsar messages, existing Pulsar applications can receive messages published by Kafka clients.
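The field-by-field mapping can be sketched with two small record types. Everything here is a simplified assumption: the class and function names are hypothetical, keys and header values are assumed to be UTF-8 text, and duplicate header keys (which Kafka allows) would collapse into a single property.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KafkaRecord:
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int
    headers: list = field(default_factory=list)  # list of (str, bytes) pairs


@dataclass
class PulsarMessage:
    key: Optional[str]
    payload: bytes
    event_time_ms: int
    properties: dict = field(default_factory=dict)  # str -> str


def to_pulsar(record: KafkaRecord) -> PulsarMessage:
    """Map each Kafka field onto its Pulsar counterpart."""
    return PulsarMessage(
        key=record.key.decode("utf-8") if record.key is not None else None,
        payload=record.value,
        event_time_ms=record.timestamp_ms,
        properties={name: value.decode("utf-8") for name, value in record.headers},
    )


def to_kafka(message: PulsarMessage) -> KafkaRecord:
    """Map a Pulsar message back into a Kafka record."""
    return KafkaRecord(
        key=message.key.encode("utf-8") if message.key is not None else None,
        value=message.payload,
        timestamp_ms=message.event_time_ms,
        headers=[(name, value.encode("utf-8")) for name, value in message.properties.items()],
    )


record = KafkaRecord(b"user-1", b"{\"total\": 10}", 1_600_000_000_000, [("source", b"web")])
converted = to_pulsar(record)
restored = to_kafka(converted)
```

A conversion that round-trips like this is what lets producers on one protocol and consumers on the other share the same topic.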

Consume messages

When a consume request is received from a Kafka client, the Kafka request handler opens a non-durable cursor and reads entries starting from the requested offset.

The Kafka request handler converts the Pulsar messages back into Kafka messages, so existing Kafka applications can receive messages published by Pulsar clients.

Group coordinator & offset Management

The biggest challenge is implementing the group coordinator and offset management. Pulsar has no centralized group coordinator that assigns partitions to the consumers in a consumer group and manages offsets for each consumer group.

In Pulsar, brokers manage partition assignment on a per-partition basis, and the owner broker of a partition manages offsets by storing acknowledgment information in cursors. It is difficult to align the Pulsar model with the Kafka model. Therefore, to be fully compatible with Kafka clients, we implemented the Kafka group coordinator by storing group changes and offsets in a Pulsar system topic named `public/kafka/__offsets`.

This builds a bridge between Pulsar and Kafka and allows users to manage subscriptions and monitor Kafka consumers with existing Pulsar tools and policies. We added a background thread to the group coordinator implementation that periodically synchronizes offset updates from the system topic to Pulsar cursors.
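The synchronization from the offsets topic to cursors can be sketched as a fold over the committed records. This is a single-pass illustration under simplifying assumptions: the system topic is modeled as a plain list of `(group, topic_partition, offset)` tuples, the cursors as a dictionary, and the function names are hypothetical. In the design described above this work would run periodically on a background thread.

```python
def latest_offsets(records: list) -> dict:
    """Reduce the commit log to the most recent offset per (group, partition)."""
    latest = {}
    for group, topic_partition, offset in records:
        latest[(group, topic_partition)] = offset  # later commits overwrite earlier ones
    return latest


def sync_to_cursors(records: list, cursors: dict) -> None:
    """Copy the latest committed offsets into the cursor table."""
    for key, offset in latest_offsets(records).items():
        cursors[key] = offset


# Toy stand-in for the offsets system topic: an append-only list of commits.
offsets_topic = [
    ("billing", "orders-0", 5),
    ("billing", "orders-0", 9),
    ("audit", "orders-0", 2),
]

cursors = {}
sync_to_cursors(offsets_topic, cursors)
```

Keeping the commit log as the source of truth and treating cursors as a derived view means the sync can be repeated safely, since applying it twice yields the same cursor positions.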

As a result, a Kafka consumer group is effectively treated as a Pulsar subscription, and all existing Pulsar tools can also be used to manage Kafka consumer groups.

Connect two popular messaging ecosystems

Both StreamNative and OVHcloud value customer success. We believe that providing the native Kafka protocol on Apache Pulsar can help users who adopt Pulsar achieve business success faster. KoP integrates two popular event streaming ecosystems and unlocks new use cases. Customers can take advantage of both ecosystems to build a truly unified event streaming platform with Apache Pulsar and accelerate the development of real-time applications and services.

For example, with KoP, log collectors can continue to collect log data from their sources and publish messages to Apache Pulsar using their existing Kafka integrations. Downstream applications can then use Pulsar Functions to process the events arriving in the system and perform serverless event streaming.

The above is an example of supporting native Kafka protocols on Apache Pulsar. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
