How to analyze the access mode and tiered storage of Apache Pulsar

Updated 2025-03-31. Source: SLTechnology News&Howtos


This article looks at the I/O access patterns in Apache Pulsar and at how they make tiered storage possible.

Apache Pulsar, a top-level project of the Apache Software Foundation, is a next-generation cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function-based computing. It separates compute from storage and supports multi-tenancy, persistent storage, and cross-region replication across multiple data centers. Its stream storage is highly scalable and offers strong consistency, high throughput, and low latency.

"Why choose Apache BookKeeper, Part 1" discussed how Apache Pulsar takes advantage of the way BookKeeper replication works, as well as the different I/O patterns in BookKeeper.

Below, we discuss how replication interacts with the different I/O patterns in Pulsar, and how Pulsar implements tiered storage on top of this interaction.

In essence, Pulsar uses a layered architecture, and this layering allows each I/O pattern to work independently, so that reads and writes do not interfere with each other.

The layering also makes it simple to add a new storage tier that is fully integrated with Pulsar. This lowers the cost of adding the tier and lets the tier scale, without any impact on developers who use Pulsar.

Pulsar is a messaging system that provides publish-subscribe and queuing semantics. A client can be a producer, a consumer, or both.

A producer client sends messages to a broker, and a consumer client consumes messages from a broker. Pulsar organizes messages into topics and assigns topics to brokers.

Within a topic, Pulsar guarantees Total Order Atomic Broadcast: once a Pulsar broker has acknowledged a published message to its producer, the message will never be lost, duplicated, or reordered relative to other messages in the same topic. All consumers read the messages in exactly the same order.
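To make the ordering guarantee concrete, here is a minimal in-memory sketch. `TopicLog`, `publish`, and `read_from` are hypothetical names chosen for illustration; they are not part of the Pulsar client API, and the sketch ignores replication and failures.

```python
class TopicLog:
    """Hypothetical in-memory stand-in for one topic's log (not a Pulsar API)."""

    def __init__(self):
        self._entries = []  # the list index of an entry is its offset

    def publish(self, payload):
        # Appending assigns the next offset; returning it models the ack.
        self._entries.append(payload)
        return len(self._entries) - 1

    def read_from(self, offset):
        # Every consumer iterates over the same committed sequence.
        return list(self._entries[offset:])


log = TopicLog()
acks = [log.publish(m) for m in ("a", "b", "c")]
consumer_1 = log.read_from(0)
consumer_2 = log.read_from(0)
```

Both consumers see the same sequence because there is a single append-only log per topic and offsets are assigned exactly once.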

Pulsar uses BookKeeper as the backing store for a topic's message backlog. Pulsar brokers act as a stateless serving layer on top of BookKeeper storage.

When a producer sends a message to Pulsar, the broker immediately writes it to BookKeeper. Once BookKeeper acknowledges the write, the broker acknowledges the message to the producer, and consumers can read it.

A messaging system usually has three I/O patterns.

Write: publish messages to the messaging system

Tailing read: deliver a message to active subscribers immediately after it is written

Catch-up read: when a new consumer wants to start reading from some point before the latest message, or when an existing consumer returns after a long time offline, the consumer reads a large number of messages from the log to catch up with its tail.

Unlike in most other messaging systems, each of these I/O patterns is isolated from the others in Pulsar.

The most interesting pattern is the write, because all the other patterns follow from it. When a Pulsar broker wants to persist a message for a topic, it writes the message to a set of BookKeeper nodes, which form the write quorum for the topic's log.

Each BookKeeper node that receives the message appends it to its journal (the node's log file), which is stored on a dedicated disk. When enough nodes have acknowledged the write to satisfy the log's replication requirement (the ack quorum), the write is considered committed and is acknowledged to the producer.
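The commit rule above can be sketched as follows. This is a simplified model under stated assumptions: `Bookie` and `quorum_write` are hypothetical names, not the BookKeeper client API, and the sketch contacts nodes one after another, whereas a real client would send the writes in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Bookie:
    """Hypothetical stand-in for a BookKeeper node with an append-only journal."""
    name: str
    healthy: bool = True
    journal: list = field(default_factory=list)

    def append(self, entry):
        # An unhealthy node never acknowledges the write.
        if not self.healthy:
            return False
        self.journal.append(entry)
        return True


def quorum_write(entry, write_set, ack_quorum):
    # Count acknowledgements from the write set; the write is committed
    # only when at least ack_quorum nodes have acknowledged it.
    acks = sum(1 for bookie in write_set if bookie.append(entry))
    return acks >= ack_quorum


bookies = [Bookie("bk1"), Bookie("bk2"), Bookie("bk3", healthy=False)]
committed = quorum_write(b"m1", bookies, ack_quorum=2)
```

With three nodes in the write set and an ack quorum of two, the write commits even though one node is down.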

From this point on, the message is immutable and permanently occupies its offset in the log. No other message can ever occupy that offset, and the message itself can no longer be changed.

This immutability can be used to serve the other I/O patterns of the messaging system efficiently. Writing a message to the journal on the BookKeeper nodes is the baseline: if the system stopped there, the message would still be accessible.

However, that would be inefficient, because every read would have to scan the journals for the desired messages, and the journals could never be truncated to free disk space. The immutability of a committed message is what allows it to be cached in multiple locations for efficient reads.

The first level of cache is the Pulsar broker, which serves tailing reads, that is, reads of the newest messages. Once a message is committed, the broker can dispatch it directly to all subscribers of the topic without touching disk.
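A minimal sketch of this first cache level, assuming a simple bounded cache that evicts the oldest entries first. `BrokerCache` and its capacity-based eviction are illustrative assumptions, not a description of how the broker's cache is actually implemented.

```python
from collections import OrderedDict


class BrokerCache:
    """Hypothetical bounded cache of the most recently committed messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # offset -> payload, oldest first

    def put(self, offset, payload):
        self._entries[offset] = payload
        # Evict the oldest entries once the cache is over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, offset):
        # None means the reader must fall back to BookKeeper.
        return self._entries.get(offset)


cache = BrokerCache(capacity=2)
for offset, payload in enumerate(["m0", "m1", "m2"]):
    cache.put(offset, payload)

newest = cache.get(2)   # served from broker memory
oldest = cache.get(0)   # already evicted, so not in the cache
```

A consumer reading the newest messages is served from memory, while a consumer that is far behind misses the cache and has to be served by a lower level.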

The next level of cache is the ledger storage disk on the BookKeeper nodes. When a message is written to the journal on a BookKeeper node, it is also written to an in-memory buffer that is periodically flushed to the ledger storage disk.

BookKeeper nodes use this disk to serve reads. In Pulsar, messages are rarely read from the memory buffer: tailing consumers usually get messages straight from the broker's cache, and catch-up consumers usually request messages old enough that they are no longer in the memory buffer.

The ledger storage disk serves catch-up reads. Its storage format keeps reads for a single topic as sequential as possible while still storing many different topics efficiently on the same disk. Because the ledger storage disk and the journal disk are separate, reads do not affect the performance of sequential writes to the journal disk.

If tiered storage is configured for Pulsar, the last level of cache is long-term storage. Tiered storage allows users to use more cost-effective storage for older parts of the topic backlog.
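The levels described so far can be read as a lookup chain that falls through from the fastest level to the cheapest. The sketch below models each level as a plain dictionary; `read_entry`, the level names, and the sample contents are assumptions made for illustration only.

```python
def read_entry(offset, tiers):
    """Return (tier_name, payload) from the first tier that holds the offset."""
    for name, store in tiers:
        if offset in store:
            return name, store[offset]
    raise KeyError(offset)


# Ordered from fastest to cheapest; contents are made up for the example.
tiers = [
    ("broker cache", {9: "m9"}),
    ("bookie memory buffer", {8: "m8", 9: "m9"}),
    ("ledger storage disk", {5: "m5", 6: "m6", 7: "m7", 8: "m8"}),
    ("long-term storage", {0: "m0", 1: "m1", 2: "m2"}),
]

latest = read_entry(9, tiers)
old = read_entry(0, tiers)
```

Because committed messages never change, the same message can sit in several levels at once without any risk of the copies disagreeing.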

Tiered storage also takes advantage of message immutability, but at a coarser granularity, because storing each message separately in long-term storage would waste space. A Pulsar topic log is made up of segments, each of which holds a sequence of 50,000 messages by default. Only one segment is active at a time; all segments before the active one are closed.

Once a segment is closed, no new messages can be added to it. Because each message in the segment is immutable and each message's offset is immutable, the segment as a whole is immutable, and an immutable object can be copied anywhere you like.
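The roll-over from an active piece of the log to a closed, immutable one can be sketched like this. `SegmentedLog` is a hypothetical class; the default of 50,000 entries comes from the text above, and the example uses a much smaller limit to keep it readable.

```python
class SegmentedLog:
    """Hypothetical topic log split into closed segments plus one active segment."""

    def __init__(self, max_entries_per_segment=50_000):
        self.max_entries = max_entries_per_segment
        self.closed = []   # closed segments, stored as tuples so they cannot change
        self.active = []   # the only segment that still accepts appends

    def append(self, payload):
        self.active.append(payload)
        if len(self.active) >= self.max_entries:
            # Close the full segment and start a new active one.
            self.closed.append(tuple(self.active))
            self.active = []


log = SegmentedLog(max_entries_per_segment=3)
for i in range(7):
    log.append(f"m{i}")
```

Storing closed segments as tuples mirrors the point of the paragraph: once closed, a segment is a fixed object that is safe to copy elsewhere.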

To use tiered storage in Pulsar, users configure a time-based or size-based offload policy on the topic's namespace. When a topic in the namespace reaches the threshold defined in the policy, the Pulsar broker copies the oldest segments of the topic log to long-term storage until the topic falls below the threshold.
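A size-based policy of this kind can be sketched as a loop that moves the oldest pieces of the log first. `offload_until_below` is a hypothetical helper, sizes are in arbitrary units, and the rule that the last (active) piece is never eligible is an assumption of the sketch.

```python
def offload_until_below(segments, threshold):
    """segments: list of (segment_id, size), oldest first.

    Returns (offloaded_ids, remaining_segments).
    """
    remaining = list(segments)
    offloaded = []
    # The last segment is assumed to be active and is never offloaded.
    while len(remaining) > 1 and sum(size for _, size in remaining) > threshold:
        segment_id, _ = remaining.pop(0)  # oldest first
        offloaded.append(segment_id)
    return offloaded, remaining


segments = [(0, 40), (1, 40), (2, 40), (3, 10)]
offloaded, remaining = offload_until_below(segments, threshold=60)
```

Here the topic starts at 130 units; the two oldest pieces are copied out, which leaves 50 units in BookKeeper, below the threshold of 60.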

After a period of time, Pulsar deletes the original segments from BookKeeper to free up disk space.
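The delay between copying and deleting can be expressed as a simple time check. This is only a sketch: `can_delete_original` and the `deletion_lag_seconds` parameter are hypothetical names, and the one-hour value is an arbitrary example rather than a Pulsar default.

```python
from typing import Optional


def can_delete_original(offloaded_at: Optional[float],
                        now: float,
                        deletion_lag_seconds: float) -> bool:
    # Data that has not been copied to long-term storage must never be deleted.
    if offloaded_at is None:
        return False
    # Delete only once the configured lag has fully elapsed since the copy.
    return now - offloaded_at >= deletion_lag_seconds


one_hour = 60 * 60
too_early = can_delete_original(offloaded_at=0.0, now=1800.0, deletion_lag_seconds=one_hour)
ready = can_delete_original(offloaded_at=0.0, now=3600.0, deletion_lag_seconds=one_hour)
```

Keeping the original copy for a while gives readers that are still using BookKeeper time to finish before the data is removed.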

For long-term storage, Pulsar supports Amazon S3 and S3-compatible object stores, as well as Azure storage and Google Cloud Storage from Pulsar 2.2.0.
