Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the unified message consumption model based on Queue + Stream?

2025-02-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

What is the unified message consumption model based on Queue + Stream? many novices are not very clear about it. In order to help you solve this problem, the following editor will explain it in detail. People with this need can come and learn. I hope you can get something.

In the previous article, we described why Apache Pulsar can become an enterprise-level flow and messaging system. The enterprise features of Pulsar include persistent storage of messages, multi-tenancy, interconnection of multiple computer rooms, encryption and security, etc. One of the questions we are often asked is what is the difference between Apache Pulsar and Apache Kafka.

When the user chooses a message system, the message model is the user's first consideration. The message model should cover the following three aspects:

Message consumption-how to send and consume messages

Message acknowledgement (ack)-how to confirm a message

Message save-how long the message will be retained, why it will be deleted, and how to delete it.

Message consumption model

In the real-time streaming architecture, message delivery can be divided into two categories: Queue and Stream.

Queue (Queue) model

The queue model mainly consumes messages in a disordered or shared way. Through the queue model, users can create multiple consumers to receive messages from a single pipeline; when a message is sent out of the queue, only one of the multiple consumers (any one is possible) receives and consumes the message. The specific implementation of the message system determines which consumer actually receives the message.

The queue model is often used in conjunction with stateless applications. Stateless applications don't care about sorting, but they do need to be able to ack or delete a single message, and to expand the ability to consume parallelism as much as possible. Typical message systems based on queue model include RabbitMQ and RocketMQ.

Stream model

In contrast, the flow model requires the consumption of messages to be strictly sorted or exclusive. For a pipeline, using the streaming model, there will always be only one consumer using and consuming messages. The consumer receives messages sent from the pipeline in the exact order in which the messages are written to the pipeline.

A flow model is usually associated with a stateful application. Stateful applications pay more attention to the order of messages and their status. The order in which messages are consumed determines the state of the stateful application. The order of messages affects the correctness of the application's processing logic.

In microservice-oriented or event-driven architectures, both the queue model and the flow model are required.

Message consumption Model of Pulsar

Through "subscription", Apache Pulsar abstracts a unified: producer-topic-subscription-consumer consumption model. Pulsar's message model supports both the queue model and the flow model.

In Pulsar's message consumption model, Topic is the channel used to send messages. Each Topic corresponds to a distributed log in Apache BookKeeper. Each message published by the publisher is stored only once in Topic; in the process of storage, BookKeeper stores message replication on multiple storage nodes; each message in Topic can be used multiple times according to the subscription needs of consumers, and each subscription corresponds to a consumer group (Consumer Group).

The topic (Topic) is the real source of consumer information. Although messages are stored only once on a topic (Topic), users can have different subscriptions to consume these messages:

Consumers are grouped together to consume messages, and each consumer group is a subscription.

Each Topic can have a different consumer group.

Each group of consumers is a subscription to the topic.

Each group of consumers can have their own different consumption patterns: Exclusive, Failover or Share.

Through this model, Pulsar combines the queue model and the flow model, and provides a unified API interface. This model neither affects the performance of the messaging system, nor brings additional overhead, but also provides users with more flexibility to facilitate user programs to use the messaging system in the best matching pattern.

Exclusive subscription (Stream flow model)

As the name implies, in an exclusive subscription, there is one and only one consumer in a consumer group (subscription) to consume messages in the Topic at any time. The following figure is an example of an exclusive subscription. In this example, there is an active consumer Amur0 with subscription A, and messages M0 to M4 are sent sequentially and consumed by AME 0. If another consumer Amur1 wants to attach to subscription A, it is not allowed.

Failover (Stream flow model)

With failover subscriptions, multiple consumers (Consumer) can attach to the same subscription. However, of all consumers in a subscription, only one consumer will be selected as the primary consumer of the subscription. Other consumers will be designated as failover consumers.

When the primary consumer is disconnected, the partition will be reassigned to one of the failover consumers, and the newly assigned consumer will become the new primary consumer. When this happens, all unacknowledged (ack) messages are delivered to the new primary consumer. This is similar to Consumer partition rebalance in Apache Kafka.

The following figure is an example of a failover subscription. Consumers BMY 0 and BMUI 1 subscribe to consumer messages by subscribing to B. BMY 0 is the primary consumer and receives all messages. BMY 1 is a failover consumer, and if the consumer BMY 0 fails, it will take over the consumption.

Shared subscription (Queue queue model)

Using a shared subscription, behind the same subscription, the user mounts as many consumers as the application needs. All messages in the subscription are sent to multiple consumers behind the subscription in a circular distribution, and a message is delivered to only one consumer.

When a consumer disconnects, all unacknowledged (ack) messages delivered to it are reassigned and organized to be sent to the remaining consumers on the subscription.

The following figure is an example of a shared subscription. Consumers Cmur1 and Cmur2 and Cmur3 both consume messages on the same topic. Each consumer receives about 1Universe 3 of all messages.

If you want to increase the speed of consumption, users do not need to increase the number of partitions, they just need to add more consumers to the same subscription.

The choice of three subscription modes

Exclusive and failover subscriptions, allowing only one consumer to use and consume each subscription to the topic. Both modes use messages in the order of topic partitions. They are best suited for flow (Stream) use cases that require strict message ordering.

Shared subscriptions allow multiple consumers per topic partition. Each consumer in the same subscription receives only a portion of the messages from the topic partition. Shared subscriptions are best suited to the usage pattern of queues (Queue) that do not need to guarantee message order, and can expand the number of consumers as needed.

Subscriptions in Pulsar are actually similar to the concept of Consumer Group in Apache Kafka. The operation of creating subscriptions is lightweight and highly scalable, and users can create any number of subscriptions according to the needs of the application.

Different subscription types can be used for different subscriptions to the same topic. For example, a user can provide a failover subscription with 3 consumers on the same topic, and a shared subscription with 20 consumers, and can add more consumers to the shared subscription without changing the number of partitions.

The following figure depicts a topic with three subscriptions to AMagol B and C, and illustrates how messages flow from producers to consumers.

In addition to the Unified messaging API, because the Pulsar topic partition is actually stored in Apache BookKeeper, it also provides a read API (Reader), similar to the consumer API (but Reader does not have cursor management), so that users have complete control over how messages in Topic are used.

Message acknowledgement (ACK) for Pulsar

Due to the characteristics of distributed systems, failures may occur when using distributed messaging systems. For example, when consumers consume messages from topics in the messaging system, errors may occur both by consumers who consume messages and by message agents (Broker) that serve the topic partition. The purpose of message acknowledgement (ACK) is to ensure that when such a failure occurs, consumers can resume consumption from where they last stopped, ensuring that they will neither lose messages nor reprocess ACK messages.

In Apache Kafka, the recovery point is often referred to as Offset, and the process of updating the recovery point is called message acknowledgement or submission Offset.

In Apache Pulsar, a special data structure, Cursor, is used in each subscription to track the ACK status of each message in the subscription. Whenever a consumer acknowledges a message on a topic partition, the cursor is updated. Updating cursors ensures that consumers do not receive messages again.

Apache Pulsar provides two message acknowledgement methods, single acknowledgement (Individual Ack) and cumulative acknowledgement (Cumulative Ack). Through cumulative confirmation, the consumer only needs to confirm the last message it has received. All messages in the topic partition, including the provider message ID, will be marked as acknowledged and will not be delivered to the consumer again. Cumulative acknowledgements are similar to Offset updates in Apache Kafka.

Apache Pulsar can support a single acknowledgment of a message, that is, selective acknowledgment. Consumers can confirm a message separately. The confirmed message will not be retransmitted. The following figure illustrates the difference between a single confirmation and a cumulative confirmation (the message in the gray box is acknowledged and will not be redelivered). In the top half of the figure, it shows an example of cumulative acknowledgements where messages before M12 are marked as acked. In the lower half of the figure, it shows an example of acking separately. Only confirm messages M7 and M12-in the case of consumer failure, all messages except M7 and M12 will be retransmitted.

Consumers with exclusive or failover subscriptions can make single acknowledgments and cumulative acknowledgements to messages; consumers who share subscriptions are only allowed to make single acknowledgments to messages. The ability of a single confirmation message provides a better experience for dealing with consumer failures. For some applications, it can take a long time or very expensive to process a message, and it is important to prevent retransmission of acknowledged messages.

This specialized data structure for managing Ack, Cursor, is managed by Broker and provides storage using BookKeeper's Ledger. We will cover more details about cursors in later articles.

Apache Pulsar provides flexible message consumption subscription types and message confirmation methods. Through a simple and unified API, you can support a variety of message and flow usage scenarios.

Message retention for Pulsar (Retention)

After the message is acknowledged, the Broker of the Pulsar updates the corresponding cursor. When a message in the Topic is confirmed by all subscriptions ack, the message can be deleted. Pulsar also allows messages to be retained longer by setting the retention time, even if all subscriptions have confirmed that they have been consumed.

The following figure shows how to keep messages in a topic with 2 subscriptions. Subscription A consumes all messages before M6 and subscription B have consumed all messages before M10. This means that all messages before M6 (in the gray box) can be safely deleted. Subscription A still does not use messages between M6 and M9 and cannot delete them. If the topic is configured with a message retention period, messages M0 to M5 will remain the same for the configured period, even if An and B have confirmed that they have been consumed.

In the message retention policy, Pulsar also supports message time to live (TTL). If the message is not used by any consumer within the configured TTL period, the message is automatically marked as acknowledged. The difference between a message retention period and a message TTL is that the message retention period acts on messages marked as acknowledged and set to be deleted, while TTL acts on messages that are not ack. TTL in Pulsar is illustrated in the above illustration. For example, if subscription B does not have an active consumer, message M10 is automatically marked as acknowledged after the configured TTL period, even if no consumer actually reads the message.

Pulsar VS. Kafka

Through the above aspects, we summarize the differences between Pulsar and Kafka in the message model.

Model concept

Kafka: Producer-topic-consumer group-consumer

Pulsar:Producer-topic-subscription-consumer.

Consumption pattern

Kafka: mainly focused on Stream mode, exclusive consumption for a single partition, no Queue consumption mode

Pulsar: provides a unified messaging model and API. Stream mode-exclusive and failover subscriptions; Queue mode-how subscriptions are shared.

Message acknowledgement (Ack)

Kafka: using offset Offset

Pulsar: use specialized Cursor management. Cumulative confirmation has the same effect as Kafka; single or selective confirmation is provided.

Message retention

Kafka: deletes the message based on the set retention period. It is possible that the message is not consumed and is deleted after it expires. TTL is not supported.

Pulsar: messages are deleted only after they have been consumed by all subscriptions, and no data is lost. It is also allowed to set a retention period to retain the consumed data. TTL is supported.

Comparison and summary:

Apache Pulsar combines high-performance streams (what Apache Kafka pursues) and flexible traditional queues (what RabbitMQ pursues) into a unified messaging model and API. Pulsar uses a unified API to provide users with a system that supports streams and queues with the same high performance.

Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report