How to compare LogDevice with Apache Pulsar

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article compares LogDevice with Apache Pulsar: where the two systems are alike, where they differ, and why.

Facebook has released LogDevice as open source. Given the similarity between the target use cases of LogDevice and Apache Pulsar, it is natural to ask how the two systems compare. This article answers that question. Comparing LogDevice with Pulsar is not straightforward, because LogDevice operates at a lower level than Pulsar. LogDevice is more similar to Twitter's DistributedLog: both focus only on the log primitive rather than on higher-level features such as schema management, multi-tenancy, and cursor management. Users are left to implement those features on top of LogDevice. The rest of this article discusses the basic element found in both LogDevice and Pulsar: the distributed log.

Architecture

LogDevice exposes a log primitive to the user. A write client sends each entry to a sequencer node. The sequencer assigns a log sequence number (LSN) to the entry and then writes it to a subset (the replica set) of the larger set of nodes assigned to the log. The LogDevice sequencer is similar to the broker in Pulsar. In Pulsar, the broker assigns the message ID and sends the message to Apache BookKeeper for storage.

LogDevice and Pulsar have a lot in common architecturally, most notably the separation of compute from storage. Compared with a monolithic architecture, this design has the following advantages:

A single log can grow indefinitely; recovery from node failure is seamless; the cluster is simple to expand; and reads and writes scale independently.

The article "Comparing Pulsar and Kafka: how a shard-based architecture improves overall performance, scalability, and resiliency" [2] describes the advantages of this architecture in detail. Both Pulsar and LogDevice share them.

LogDevice and Pulsar read data differently. In Pulsar, the read client subscribes to the topic on the broker and receives messages from the broker, while in LogDevice, the read client connects directly to the storage node.

Reading data directly from the storage nodes, as LogDevice does, allows a greater degree of read fan-out. Because readers do not all need to go through the same node, the system can support more readers on a single topic.

However, when the log has to be consistent, reading directly from the storage nodes adds latency. An entry must not be readable until its write has been acknowledged to the writer. When readers read from the storage nodes, those nodes therefore have to be told in some way that the entry has been replicated to enough nodes and acknowledged to the writer. Until that notification arrives, the entry is not readable.

In Pulsar, clients read and write through the broker. This has both latency and performance advantages. Because reads and writes are served by the same node, an entry becomes readable as soon as its ack is sent to the writer. Because the broker controls both reads and writes, Pulsar can also support more complex subscription models, such as shared subscriptions and failover subscriptions [3].
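As a rough illustration of why broker-side dispatch makes such subscription models straightforward, the following minimal sketch contrasts the two modes. The function names are hypothetical and the logic is deliberately simplified: a real broker also accounts for consumer flow control, acknowledgements, and redelivery, none of which is modeled here.

```python
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Shared subscription (simplified): spread messages round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for message, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(message)
    return assignment


def dispatch_failover(messages, consumers, failed=()):
    """Failover subscription (simplified): one active consumer gets everything.

    The first consumer that has not failed is treated as the active one.
    """
    active = next(c for c in consumers if c not in failed)
    return {active: list(messages)}


shared = dispatch_shared(["m1", "m2", "m3"], ["c1", "c2"])
failover = dispatch_failover(["m1", "m2"], ["c1", "c2"])
after_failure = dispatch_failover(["m1", "m2"], ["c1", "c2"], failed={"c1"})
```

Because a single component sees every message and every consumer, it can make these decisions centrally. With readers attached directly to storage nodes, the same coordination would have to be built on the client side.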

Consistency, replication, and failover

LogDevice and Pulsar use similar techniques to implement a total order atomic broadcast (TOAB) protocol [4]. The log is divided into epochs. Within each epoch a single node (the leader) decides the sequence numbers of the entries, and a fencing mechanism ensures that nothing more can be written to a previous epoch.
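The fencing idea can be sketched in a few lines. This is an illustrative model, not code from either system: the class and function names are hypothetical, and it assumes each storage node simply remembers the highest epoch it has been told about and rejects writes from older epochs.

```python
class StorageNode:
    def __init__(self):
        self.fenced_epoch = 0  # highest epoch this node has been told about
        self.entries = {}      # (epoch, seq) -> payload

    def fence(self, epoch):
        """Learn about a newer epoch; older epochs can no longer write here."""
        self.fenced_epoch = max(self.fenced_epoch, epoch)

    def write(self, epoch, seq, payload):
        """Store the entry and return True, or return False if the epoch is stale."""
        if epoch < self.fenced_epoch:
            return False
        self.entries[(epoch, seq)] = payload
        return True


def quorum_write(nodes, epoch, seq, payload, acks_required):
    """A write succeeds only if enough storage nodes accept it."""
    acks = sum(node.write(epoch, seq, payload) for node in nodes)
    return acks >= acks_required


nodes = [StorageNode() for _ in range(3)]
before = quorum_write(nodes, 1, 1, b"a", acks_required=2)

# A new leader for epoch 2 fences two of the three nodes.
for node in nodes[:2]:
    node.fence(2)

stale = quorum_write(nodes, 1, 2, b"b", acks_required=2)  # old leader
fresh = quorum_write(nodes, 2, 1, b"c", acks_required=2)  # new leader
```

Fencing does not have to reach every node. It only has to reach enough of them that a leader from the old epoch can no longer collect the acks it needs.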

Both LogDevice and Pulsar use ZooKeeper to determine the leader.

In LogDevice, the leader is called the sequencer. Each log has a sequencer, and each sequencer is assigned an "epoch" number (from ZooKeeper). An LSN consists of the epoch and a locally, monotonically increasing component. The sequencer decides the LSN of each entry and forwards the entries in its epoch to a set of storage nodes. When enough storage nodes have acknowledged an entry, the sequencer sends the ack to the client that issued the write. If a sequencer fails, a new sequencer obtains a new epoch and can serve writes immediately. In the background it starts an operation that "closes" the previous epoch, and reads from the new epoch are blocked until the previous epoch has been closed. The "close" operation notifies enough storage nodes in the node set that the new epoch exists, so that a write in the old epoch can no longer gather enough acks to be acknowledged to the client.
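The two-part LSN described above can be modeled as an ordered pair. The sketch below is illustrative only: the `LSN` and `Sequencer` classes and their fields are hypothetical names, not LogDevice's API, and the epoch is passed in by hand rather than obtained from ZooKeeper.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LSN:
    epoch: int   # handed out when a sequencer starts
    offset: int  # increases monotonically within one epoch


class Sequencer:
    def __init__(self, epoch, acks_required):
        self.epoch = epoch
        self.acks_required = acks_required
        self.next_offset = 1

    def assign(self):
        """Give the next entry its LSN within this sequencer's epoch."""
        lsn = LSN(self.epoch, self.next_offset)
        self.next_offset += 1
        return lsn

    def is_acknowledged(self, storage_acks):
        """An entry is acked to the client once enough storage nodes ack it."""
        return storage_acks >= self.acks_required


old = Sequencer(epoch=4, acks_required=2)
a = old.assign()
b = old.assign()

# After a failover the replacement sequencer starts in a higher epoch.
new = Sequencer(epoch=5, acks_required=2)
c = new.assign()
```

Comparing the epoch first and the offset second means every entry written by the new sequencer sorts after every entry from the old one, even though the offset restarts.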

In Pulsar, the epoch is a BookKeeper ledger. Each topic has a list of BookKeeper ledgers, which together make up the entire log of the topic. When a Pulsar broker crashes, another broker takes over the topic, ensures that the previous broker's ledger is closed, creates its own ledger, and adds it to the topic's ledger list. All three of these operations involve ZooKeeper. Once the topic's ledger list has been updated, the broker can start serving reads and writes on the topic. All data written to the topic is persisted to a BookKeeper ledger and stored on a set of storage nodes before it is acknowledged and becomes visible to readers.
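The takeover sequence can be sketched as follows. This is a simplified in-memory model with hypothetical class names. In the real system the ledger list lives in ZooKeeper and closing a ledger involves a recovery protocol against the storage nodes, neither of which is shown.

```python
class Ledger:
    def __init__(self, ledger_id):
        self.ledger_id = ledger_id
        self.entries = []
        self.closed = False

    def append(self, payload):
        if self.closed:
            raise RuntimeError("ledger is closed")
        self.entries.append(payload)
        return len(self.entries) - 1  # entry ID within the ledger


class Topic:
    def __init__(self):
        self.ledgers = []  # ordered list; together they form the topic's log

    def take_over(self, new_ledger_id):
        """Close the previous owner's ledger, then open and register a new one."""
        if self.ledgers:
            self.ledgers[-1].closed = True
        ledger = Ledger(new_ledger_id)
        self.ledgers.append(ledger)
        return ledger

    def read_all(self):
        return [entry for ledger in self.ledgers for entry in ledger.entries]


topic = Topic()
first = topic.take_over(10)    # the first broker owns the topic
first.append("m1")
second = topic.take_over(11)   # another broker takes over after a crash
second.append("m2")
```

Once the old ledger is closed, its contents are fixed, so a reader that walks the ledger list in order sees one consistent log.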

In both LogDevice and Pulsar (with BookKeeper), persisting an entry requires it to reach only a subset of the storage nodes, so write latency stays low even when some storage nodes are slow or have failed.

When a leader failure is detected, LogDevice can serve writes as soon as the failure is found: electing a new sequencer takes only two round trips to ZooKeeper. In Pulsar, the previous ledger must be recovered before writes can resume, which involves talking to some of the storage nodes and making further writes to ZooKeeper. On the other hand, in Pulsar reads and writes resume at the same time, while in LogDevice a "close" operation, similar to ledger recovery, must complete before reads can resume.

We suspect that LogDevice allows writes before the previous epoch is "closed" because its reads are uncoordinated, not for performance reasons. Recovery time is dominated by detecting the sequencer failure, whether or not the previous epoch has been closed. Blocking writes until the close has finished would require the sequencer to coordinate with readers, which would gain little and add complexity. In Pulsar, because reads are served by the broker, it is easy to recover the previous ledger before accepting writes.

Storage

LogDevice storage nodes store entries in RocksDB. Entries are stored in a time-ordered collection of column families and are keyed by the combination of log ID and LSN. Put simply, each storage node has many RocksDB instances in chronological order and writes only to the latest one. These RocksDB instances are set up so that very little compaction takes place, which avoids write amplification.
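A minimal sketch of this layout, using plain dictionaries in place of RocksDB column families, looks like the following. The class and method names are hypothetical, and when a new partition is started is left to the caller, since the article does not say how LogDevice decides that.

```python
class PartitionedStore:
    def __init__(self):
        # Oldest partition first; only the last one receives writes.
        self.partitions = [{}]

    def roll(self):
        """Start a new time partition; older ones become read-only."""
        self.partitions.append({})

    def write(self, log_id, lsn, payload):
        self.partitions[-1][(log_id, lsn)] = payload

    def read_log(self, log_id):
        """Return one log's payloads in order, plus the partitions touched."""
        result = []
        touched = 0
        for partition in self.partitions:
            keys = sorted(k for k in partition if k[0] == log_id)
            if keys:
                touched += 1
            result.extend(partition[k] for k in keys)
        return result, touched


store = PartitionedStore()
store.write(1, 1, "a")
store.write(2, 1, "x")   # entries of different logs share a partition
store.roll()
store.write(1, 2, "b")
payloads, partitions_touched = store.read_log(1)
```

Writing only to the newest partition keeps writes cheap, while reading one log back may touch every partition that was written during the log's lifetime.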

A Pulsar storage node (a BookKeeper bookie) has a journal, entry logs, and an index. The journal has a dedicated disk. When an entry is written to a bookie, it is written to the journal disk and then acknowledged to the writer. The entry is then placed in a staging area. When enough entries have accumulated there, they are sorted by ledger ID and entry ID and flushed to an entry log. At that point the location of each entry in the entry log is written to the index, which is a RocksDB instance.
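The write path just described can be modeled with a few in-memory structures. This is a sketch under stated assumptions: the `Bookie` class, its `flush_threshold` parameter, and the use of lists and dictionaries in place of disks and RocksDB are all hypothetical simplifications.

```python
class Bookie:
    def __init__(self, flush_threshold):
        self.journal = []    # append-only; stands in for the journal disk
        self.staging = {}    # (ledger_id, entry_id) -> payload
        self.entry_log = []  # flushed entries, sorted within each flush
        self.index = {}      # (ledger_id, entry_id) -> position in entry_log
        self.flush_threshold = flush_threshold

    def add_entry(self, ledger_id, entry_id, payload):
        """Journal the entry, stage it, and acknowledge the write."""
        self.journal.append((ledger_id, entry_id, payload))
        self.staging[(ledger_id, entry_id)] = payload
        if len(self.staging) >= self.flush_threshold:
            self.flush()
        return True  # the ack depends only on the journal write

    def flush(self):
        """Sort staged entries by (ledger ID, entry ID) and index them."""
        for key in sorted(self.staging):
            self.index[key] = len(self.entry_log)
            self.entry_log.append((key, self.staging[key]))
        self.staging.clear()

    def read(self, ledger_id, entry_id):
        key = (ledger_id, entry_id)
        if key in self.staging:
            return self.staging[key]
        return self.entry_log[self.index[key]][1]


bookie = Bookie(flush_threshold=4)
# Writes from two ledgers arrive interleaved.
bookie.add_entry(2, 0, "b0")
bookie.add_entry(1, 0, "a0")
bookie.add_entry(2, 1, "b1")
bookie.add_entry(1, 1, "a1")
flushed_keys = [key for key, _ in bookie.entry_log]
```

The journal preserves arrival order, which keeps its writes sequential. The entry log groups each ledger's entries together, which is what makes later reads of one ledger mostly sequential.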

The storage layers of both LogDevice and Pulsar can achieve low-latency writes when many logs are active concurrently. Interleaving the entries of many logs into a small number of files keeps random writes to a minimum. This matters most on spinning disks, where writing to multiple files means physically moving the head many times, but even on solid-state disks sequential writes perform better than random writes.

However, interleaved writes also mean more work for reads.

Reads in log systems usually fall into two categories: tailing reads and catch-up reads. For tailing reads, neither LogDevice nor Pulsar is likely to hit the disk, because the required data should still be in an in-memory cache. Catch-up reads will eventually hit the disk. For catch-up reads, throughput usually matters more than latency, and both LogDevice and Pulsar are designed with this in mind.

Most reads should be sequential, because RocksDB sorts entries by key before flushing them to disk, but LogDevice still needs to read from many SST files for catch-up reads. It is also not clear whether reads and writes are served from the same disk. If they are, catch-up reads may affect the write performance of the system.

RocksDB does allow multiple paths to be configured, so that older SST files can be stored separately from newer ones.

Because Pulsar keeps the critical path of writes on a separate disk, reads are completely independent of writes. Reads are usually sequential too, because the data in an entry log is sorted by ledger ID and entry ID before it is flushed to disk.

LogDevice avoids compaction as much as possible in order to limit write amplification. This makes sense for a log system, because most of the data written never needs to be read, but it affects data retention. Individual logs cannot be deleted, so retention for all logs in the system must be time-based. It is not possible for some logs in a cluster to be kept forever while others are kept for only a few hours.

In Pulsar, deleting a log from a storage node starts with deleting it from the index. As a result the index is compacted often, but because the entry data itself is not in the index, this has little impact. The storage node monitors the percentage of each entry log that is still referenced by the index. Once that percentage falls below a certain threshold, the live data is copied to a new "compacted" entry log, the index is updated, and the original entry log is deleted.
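The threshold-based compaction of entry logs can be sketched like this. The function name, its parameters, and the dictionary-based data layout are hypothetical. The sketch only shows the decision rule and the bookkeeping, not how a real bookie schedules or throttles the work.

```python
def compact_entry_logs(entry_logs, index, threshold, new_log_id):
    """Rewrite entry logs whose live fraction has dropped below `threshold`.

    entry_logs: {log_id: {(ledger_id, entry_id): payload}}
    index:      {(ledger_id, entry_id): log_id} for entries that are still live
    """
    compacted = {}
    for log_id in list(entry_logs):
        entries = entry_logs[log_id]
        if not entries:
            continue
        live = {k: v for k, v in entries.items() if index.get(k) == log_id}
        if len(live) / len(entries) < threshold:
            compacted.update(live)     # copy the live data forward
            for key in live:
                index[key] = new_log_id
            del entry_logs[log_id]     # drop the original entry log
    if compacted:
        entry_logs[new_log_id] = compacted
    return entry_logs


entry_logs = {
    0: {(1, 0): "a", (1, 1): "b", (1, 2): "c", (2, 0): "d"},
    1: {(3, 0): "e", (3, 1): "f"},
}
# Ledger 1 has been deleted, so only ledgers 2 and 3 remain in the index.
index = {(2, 0): 0, (3, 0): 1, (3, 1): 1}
compact_entry_logs(entry_logs, index, threshold=0.5, new_log_id=2)
```

In this example entry log 0 is only 25% live, so its one surviving entry moves to the new log and the original is dropped, while entry log 1 is fully live and is left alone.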

LogDevice is an interesting addition to the distributed log space. Pulsar is not only a distributed log but a complete messaging platform, so the two cannot be compared directly, but it is good to see that the LogDevice team has settled on an architecture similar to Pulsar's. Now that LogDevice is open source, I'm looking forward to trying it.
