How to analyze the message retention and expiration policy of Pulsar

Updated 2025-03-31. From: SLTechnology News&Howtos (Shulou)

This article explains how Pulsar's message retention and expiration policies work and how to configure them.

Apache Pulsar, a top-level project of the Apache Software Foundation, is a next-generation cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its architecture separates compute from storage, and it supports multi-tenancy, persistent storage, and cross-region replication across multiple data centers, with strong consistency, high throughput, low latency, and high scalability.

By default, Pulsar Broker does the following for messages:

Once a message has been acknowledged by the Consumer on every subscription of the Topic, it is deleted immediately.

Messages that have not been acknowledged are stored in the backlog.

However, in many production environments this default behavior is not sufficient, so Pulsar provides the following policies to override it:

Retention policy: users can retain messages that have already been acknowledged by the Consumer.

TTL policy: users can set a time to live (TTL) after which unacknowledged messages are automatically moved to the acknowledged state.

Both policies are set at the namespace level.

Retention policy

A Retention policy can be bounded in two ways:

By message size. Default: defaultRetentionSizeInMB=0

By retention time. Default: defaultRetentionTimeInMinutes=0

These two defaults can be set in broker.conf, and the policy can also be set from the command line. Because both policies are set at the namespace level, the command-line configuration goes through the namespaces subcommand of pulsar-admin, as follows:

root@e6df71e544ea:/pulsar# ./bin/pulsar-admin namespaces set-retention
The following options are required: --size, -s --time, -t

Set the retention policy for a namespace
Usage: set-retention [options] tenant/namespace
  Options:
  * --size, -s
      Retention size limit (eg: 10M, 16G, 3T). 0 or less than 1MB means no retention and -1 means infinite size retention
  * --time, -t
      Retention time in minutes (or minutes, hours, days, weeks eg: 100m, 3h, 2d, 5w). 0 means no retention and -1 means infinite time retention

As shown above, -s specifies the retention size and -t specifies the retention time. Both options are required.
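
To make the units in the help text above concrete, here is a minimal Python sketch that converts values such as 16G or 3h into the megabytes and minutes reported by get-retention. The function names are hypothetical helpers written for this article, not part of pulsar-admin, and the sketch assumes binary units (1G = 1024M).

```python
# Hypothetical helpers: they mirror the units listed in the set-retention
# help text and are NOT pulsar-admin's own parsing code.
SIZE_UNITS_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}  # assumes binary units
TIME_UNITS_MIN = {"m": 1, "h": 60, "d": 24 * 60, "w": 7 * 24 * 60}


def parse_size_mb(text: str) -> int:
    """Convert '10M', '16G' or '3T' to megabytes; '0' is none, '-1' is unlimited."""
    text = text.strip()
    if text in ("0", "-1"):
        return int(text)
    if not text or text[-1].upper() not in SIZE_UNITS_MB:
        raise ValueError(f"unsupported size: {text!r}")
    return int(text[:-1]) * SIZE_UNITS_MB[text[-1].upper()]


def parse_time_minutes(text: str) -> int:
    """Convert '100m', '3h', '2d' or '5w' to minutes; a bare number is minutes."""
    text = text.strip()
    if not text:
        raise ValueError("empty time value")
    if text[-1].isdigit():
        return int(text)  # covers plain minutes as well as 0 and -1
    if text[-1].lower() not in TIME_UNITS_MIN:
        raise ValueError(f"unsupported time: {text!r}")
    return int(text[:-1]) * TIME_UNITS_MIN[text[-1].lower()]


if __name__ == "__main__":
    print(parse_size_mb("16G"), parse_time_minutes("3h"))
```

With these helpers, a setting of -s 16G -t 3h corresponds to 16384 MB and 180 minutes.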

After you have set the Retention policy, you can view it with the following command:

$ pulsar-admin namespaces get-retention [your-tenant]/[your-namespace]
{"retentionTimeInMinutes": 10, "retentionSizeInMB": 0}

Backlog

The backlog is the collection of unacknowledged messages. It rests on one major premise: the Topic holding these messages is persisted by the Broker. By default, user-created Topics are persistent. In other words, the Pulsar Broker stores all unacknowledged or unprocessed messages in the backlog.

As with retention, the size of the backlog is configured at the namespace level. When configuring the backlog, two points need to be clear:

What backlog size is allowed for each Topic in the namespace?

What action is taken when the backlog threshold is exceeded?

When the backlog threshold is exceeded, Pulsar provides the following three policies to choose from:

producer_request_hold: the Broker holds the Producer's send request and does not persist the message.

producer_exception: the Broker disconnects from the Producer by throwing an exception.

consumer_backlog_eviction: the Broker begins discarding messages from the backlog.

You can configure the backlog at the namespace level through set-backlog-quota, as shown below:

root@e6df71e544ea:/pulsar# ./bin/pulsar-admin namespaces set-backlog-quota
The following options are required: -l, --limit -p, --policy

Set a backlog quota policy for a namespace
Usage: set-backlog-quota [options] tenant/namespace
  Options:
  * -l, --limit
      Size limit (eg: 10M, 16G)
  * -p, --policy
      Retention policy to enforce when the limit is reached. Valid options are: [producer_request_hold, producer_exception, consumer_backlog_eviction]

As shown above, set-backlog-quota takes two parameters: -l specifies the backlog size limit, and -p specifies the policy that the Broker enforces when that limit is exceeded.
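
The three values accepted by -p differ only in what happens to a publish that would push the backlog past its limit. The following Python sketch is a toy model of that decision, not the Broker's implementation: the class name and its fields are invented for illustration, sizes are plain byte counts, and the real Broker checks quotas on its own schedule rather than on every publish.

```python
from collections import deque


class BacklogQuotaExceeded(Exception):
    """Raised to the producer under the producer_exception policy."""


class ToyBacklog:
    """Toy model of one Topic's backlog (hypothetical, for illustration only)."""

    def __init__(self, limit: int, policy: str):
        self.limit = limit          # quota in bytes, like the -l option
        self.policy = policy        # one of the three -p values
        self.messages = deque()     # sizes of unacknowledged messages, oldest first
        self.used = 0               # total bytes currently in the backlog

    def publish(self, size: int) -> bool:
        """Try to store a message; return True if it was stored."""
        if self.used + size > self.limit:
            if self.policy == "producer_exception":
                raise BacklogQuotaExceeded("backlog quota exceeded")
            if self.policy == "producer_request_hold":
                return False  # the request is held and nothing is persisted
            if self.policy == "consumer_backlog_eviction":
                # Discard the oldest unacknowledged messages to make room.
                while self.messages and self.used + size > self.limit:
                    self.used -= self.messages.popleft()
            else:
                raise ValueError(f"unknown policy: {self.policy!r}")
        self.messages.append(size)
        self.used += size
        return True


if __name__ == "__main__":
    backlog = ToyBacklog(limit=10, policy="consumer_backlog_eviction")
    for size in (4, 4, 4):
        backlog.publish(size)
    print(list(backlog.messages), backlog.used)
```

The design point the model makes is that only consumer_backlog_eviction loses data already stored; the two producer policies push the problem back to the sender.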

After you have set the backlog quota, you can view it with the following command:

$ pulsar-admin namespaces get-backlog-quotas [your-tenant]/[your-namespace]
{"destination_storage": {"limit": 2147483648, "policy": "producer_request_hold"}}

To remove the backlog quota, use the following command:

$ pulsar-admin namespaces remove-backlog-quota [your-tenant]/[your-namespace]

When messages have piled up, you can clear them with clear-backlog. Clearing the backlog is a relatively dangerous operation, so you are prompted to confirm that you want to delete the messages. clear-backlog provides the -f (--force) parameter to suppress the prompt.

$ pulsar-admin namespaces clear-backlog [your-tenant]/[your-namespace]

Time To Live (TTL)

By default, Pulsar persists all unacknowledged messages. If there are many of them, the backlog grows large and consumes more and more disk space. In some scenarios these messages do not need to be kept, and users would rather discard them. In that case you can set a TTL: once a message has been unacknowledged for longer than the TTL, it is moved to the acknowledged state and is then handled according to the Retention policy.

A typical use of TTL is when the Consumer fails for some reason and cannot consume messages while the Producer keeps producing to the Topic, leaving a large number of unacknowledged messages. Setting a TTL moves these messages to the acknowledged state.

TTL is likewise set at the namespace level, through set-message-ttl, as follows:

root@e6df71e544ea:/pulsar# ./bin/pulsar-admin namespaces set-message-ttl
The following option is required: --messageTTL, -ttl

Set Message TTL for a namespace
Usage: set-message-ttl [options] tenant/namespace
  Options:
  * --messageTTL, -ttl
      Message TTL in seconds
      Default: 0

As shown above, set-message-ttl has a single parameter, -ttl, given in seconds, with a default value of 0.

After you have set the TTL policy, you can view it through get-message-ttl, as shown below:

$ pulsar-admin namespaces get-message-ttl [your-tenant]/[your-namespace]
60

The difference and relation between TTL, Backlog and Retention

From the description above, TTL does exactly one thing: it changes unacknowledged messages into the acknowledged state. TTL itself performs no deletion, as shown in the following figure:

In the T1 phase, the five messages M1-M5 are acknowledged and the five messages M6-M10 are not. In the T2 phase, the TTL policy applies to messages M6, M7 and M8.

In the T3 phase, the TTL threshold is reached, and the three messages M6, M7 and M8 are acknowledged.

As the figure shows, setting a TTL changes unacknowledged messages in the backlog to the acknowledged state. What TTL actually does is move the cursor from M5 to M8, so that M6, M7 and M8 enter the acknowledged state.
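
The cursor movement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Broker code: the function name is hypothetical, the cursor is represented as the count of leading acknowledged messages, and timestamps are plain seconds. A TTL of 0, the default, is treated as disabled.

```python
from typing import List


def apply_ttl(publish_times: List[float], cursor: int, now: float, ttl_seconds: float) -> int:
    """Return the new cursor after TTL expiry.

    cursor is the number of leading messages already acknowledged, so
    publish_times[cursor] is the oldest unacknowledged message.
    """
    if ttl_seconds <= 0:
        return cursor  # TTL disabled: nothing expires
    # Advance past every message that has been waiting at least ttl_seconds.
    while cursor < len(publish_times) and now - publish_times[cursor] >= ttl_seconds:
        cursor += 1
    return cursor


if __name__ == "__main__":
    # M1..M8 were published at t=0..7 seconds, M9 and M10 much later.
    times = [0, 1, 2, 3, 4, 5, 6, 7, 100, 101]
    # M1-M5 are acknowledged (cursor=5); with a 60 s TTL checked at t=110,
    # M6-M8 expire while M9 and M10 stay unacknowledged.
    print(apply_ttl(times, cursor=5, now=110, ttl_seconds=60))
```

Note that the function only returns a new cursor position. Nothing is removed from the list, which matches the point that TTL itself does not delete anything.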

Pulsar is a multi-subscription messaging system. A message in a Topic can be deleted only after every subscription has acknowledged or consumed it.
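
As a minimal sketch of this rule, assume each subscription's cursor is stored as the number of leading messages it has acknowledged. The slowest subscription then decides how many messages have left the backlog. The function and subscription names below are hypothetical.

```python
from typing import Dict


def acked_by_all(cursors: Dict[str, int], total: int) -> int:
    """Number of leading messages acknowledged by every subscription.

    cursors maps a subscription name to the count of messages it has
    acknowledged. With no subscriptions at all, nothing holds messages in
    the backlog, so all `total` messages count as eligible.
    """
    return min(cursors.values(), default=total)


if __name__ == "__main__":
    # sub-a has acknowledged M1-M8, sub-b only M1-M5.
    print(acked_by_all({"sub-a": 8, "sub-b": 5}, total=10))
```

Messages counted here are no longer part of any backlog; whether they are then kept or deleted is decided by the Retention policy.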

By default, the Pulsar Broker persists all unacknowledged messages in the backlog. TTL moves those unacknowledged messages to the acknowledged state, while Retention decides how long messages are kept once they are acknowledged. In other words, the backlog is how the Broker handles unacknowledged messages, and Retention is how the Broker handles acknowledged ones.

TTL and Dead letter queue

The introduction of the dead letter queue will not be repeated here.

In production, upstream systems sometimes produce bad data that only the upstream can fix. Retrying such messages is pointless, and users want to stop processing them as soon as an error occurs. For this case Pulsar provides a special Topic: the dead letter queue.

Both the dead letter queue and TTL can change an unacknowledged message into the acknowledged state. The main difference, using the T2 phase in the figure above, is that TTL only marks the unacknowledged messages as acknowledged, while the dead letter queue also moves them into the dead letter Topic: the three messages M6, M7 and M8 become acknowledged, the two valid messages M9 and M10 are processed normally, and the Broker keeps running. Afterwards you can inspect the invalid messages in the dead letter queue and ignore them, or repair and reprocess them, as needed. Users can choose between TTL and the dead letter queue according to their needs. The main criterion is whether you need to deal with the messages that could not be consumed.
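
The difference can be illustrated with a toy consumer loop in Python. It is a sketch under stated assumptions, not the Pulsar client: the function, the handler, and the max_redeliver parameter are all hypothetical, and a plain list stands in for the dead letter Topic.

```python
from typing import Callable, List, Tuple


def consume_with_dlq(
    messages: List[str],
    handler: Callable[[str], None],
    max_redeliver: int = 3,
) -> Tuple[List[str], List[str]]:
    """Process messages, routing repeated failures to a dead letter list."""
    processed: List[str] = []
    dead_letters: List[str] = []
    for msg in messages:
        # One initial delivery plus max_redeliver redeliveries.
        for _attempt in range(max_redeliver + 1):
            try:
                handler(msg)
            except ValueError:
                continue  # delivery failed, try again
            processed.append(msg)
            break
        else:
            # Every delivery failed: keep the message for later inspection.
            dead_letters.append(msg)
    return processed, dead_letters


def handler(msg: str) -> None:
    """Hypothetical handler that rejects the three bad messages."""
    if msg in {"M6", "M7", "M8"}:
        raise ValueError(f"bad payload in {msg}")


if __name__ == "__main__":
    print(consume_with_dlq(["M6", "M7", "M8", "M9", "M10"], handler))
```

In this model M9 and M10 are processed while M6, M7 and M8 remain available in the dead letter list. With TTL alone, the three bad messages would simply be marked acknowledged and no copy would be kept for repair.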

Usage problems

Scenario 1:

I started a Producer to send messages to the Broker, set a TTL, did not start a Consumer, and set the Retention policy to half an hour. After the Retention threshold was reached, the messages covered by the TTL had still not been removed. Why?

The point to note in this scenario is that no Consumer was started. As described above, TTL works by moving the cursor forward. If no Consumer has been started, the cursor has not been initialized, so there is nothing for TTL to move. In other words, without a Consumer there is no need to set TTL.

Scenario 2:

I set a Retention policy, but after the Retention threshold was reached the data in the Topic was not deleted. Why?

This follows from how Pulsar is implemented internally. A Topic is a logical concept, and each Topic corresponds to a managed ledger; when you write data, it is actually written to a ledger. Recall a core principle of Pulsar's design that many earlier articles have mentioned: all operations are asynchronous. Deleting the data in a ledger once Retention reaches its threshold is asynchronous as well. In addition, the delete operation is never performed on the currently active ledger. The retention policy is actually enforced only after the current ledger fills up and a ledger switch occurs.
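
A toy model makes the consequence visible. The Python sketch below is an assumption-laden simplification, not the managed ledger's trimming code: the Ledger class and its fields are invented, and retention is reduced to a single time limit measured from when a ledger was closed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ledger:
    ledger_id: int
    closed_at_min: Optional[float]  # None while this is still the active ledger
    fully_acked: bool               # every message acknowledged by all subscriptions


def trimmable(ledgers: List[Ledger], now_min: float, retention_min: float) -> List[int]:
    """Return ids of ledgers that may be deleted, scanning oldest first."""
    result: List[int] = []
    for ledger in ledgers:
        if ledger.closed_at_min is None:
            break  # the active ledger is never deleted
        if not ledger.fully_acked:
            break  # still part of some subscription's backlog
        if now_min - ledger.closed_at_min < retention_min:
            break  # still inside the retention window
        result.append(ledger.ledger_id)
    return result


if __name__ == "__main__":
    # Scenario 2: all data sits in the single active ledger.
    only_active = [Ledger(1, None, True)]
    print(trimmable(only_active, now_min=60, retention_min=30))
    # After a ledger switch, the old ledger becomes eligible.
    rolled = [Ledger(1, 0, True), Ledger(2, None, True)]
    print(trimmable(rolled, now_min=60, retention_min=30))
```

In the first call nothing is eligible however old the data is, because it all lives in the active ledger. Only after the switch to ledger 2 does ledger 1 become a candidate for deletion.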

If you need the policy enforced sooner, you can use pulsar-admin to unload the Topic, which forces a switch to a new ledger.
