Why choose Apache BookKeeper?

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains why you might choose Apache BookKeeper.

I will introduce a range of competitive features of Apache BookKeeper, including I/O isolation, data distribution, scalability, and operability.

I/O isolation

Predictable low latency is important for real-time applications, especially critical online services (for example, core business services and databases). In most messaging systems, a slow consumer can cause a backlog of messages, which can in turn degrade overall performance.

The problem is that a slow consumer forces the storage system to read data from persistent storage media, which causes thrashing in the page cache. This happens when the storage I/O component shares a single path for writes, tailing reads, and catch-up reads.

In BookKeeper, a bookie (a single BookKeeper storage node) is designed to use three separate I/O paths for writes, tailing reads, and catch-up reads. Separating these three paths matters because writes and tailing reads need predictable low latency, while catch-up reads need high throughput. Providing physical isolation between these workloads means that BookKeeper can take full advantage of the following:

Network ingress bandwidth and sequential write bandwidth when writing

Network egress bandwidth and IOPS (input/output operations per second) on multiple ledger disks when reading

I/O isolation means that BookKeeper can provide each of these advantages without compromising the others.
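The separation of paths described above can be pictured with a toy model. This is only a sketch under simplifying assumptions, not BookKeeper's implementation: the class `ToyBookie`, its fields, and the `cache_capacity` parameter are hypothetical names, and in-memory structures stand in for the journal disk and the ledger disks.

```python
from collections import OrderedDict


class ToyBookie:
    """Toy model of a storage node with separate write, tailing-read and catch-up-read paths."""

    def __init__(self, cache_capacity=2):
        self.journal = []                 # write path: append-only, sequential
        self.write_cache = OrderedDict()  # tailing-read path: recent entries kept in memory
        self.ledger_store = {}            # catch-up-read path: stands in for the ledger disks
        self.cache_capacity = cache_capacity

    def add_entry(self, ledger_id, entry_id, payload):
        key = (ledger_id, entry_id)
        self.journal.append((key, payload))  # sequential append for durability
        self.write_cache[key] = payload      # recent entries stay in memory for tailing readers
        if len(self.write_cache) > self.cache_capacity:
            # Oldest cached entry moves to ledger storage (a background flush in a real system).
            old_key, old_payload = self.write_cache.popitem(last=False)
            self.ledger_store[old_key] = old_payload

    def read_entry(self, ledger_id, entry_id):
        key = (ledger_id, entry_id)
        if key in self.write_cache:
            return self.write_cache[key], "tailing"  # served from memory
        return self.ledger_store[key], "catch-up"    # served from ledger storage


bookie = ToyBookie(cache_capacity=2)
for i in range(4):
    bookie.add_entry(1, i, f"e{i}".encode())

print(bookie.read_entry(1, 3))  # recent entry, tailing path
print(bookie.read_entry(1, 0))  # old entry, catch-up path
```

In this model a slow reader asking for old entries only touches `ledger_store`, so it never competes with the journal appends or with readers at the tail of the stream.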

Data distribution

Services built on BookKeeper (such as Apache Pulsar) store log streams on BookKeeper as segmented ledgers. These ledgers are replicated to multiple bookies. This gives the system as many options as possible for placing data, which enables high availability, traffic load balancing, and simpler operations. I will describe some of these advantages from the perspective of deployment and operations.

First, the storage capacity of a single log stream is never limited by the storage capacity of a single host. Data can be stored as long as the entire cluster has sufficient capacity.

Second, expanding a BookKeeper cluster involves no log stream rebalancing. Administrators can expand the cluster by adding new machines; the cluster discovers the new bookies and starts writing segments to them. BookKeeper also provides a variety of placement policies, including rack-aware, region-aware, and weight-based policies, to allow as many placement options as possible.

Third, BookKeeper can repair replicas faster and more efficiently after a machine failure. When a segment is lost due to a machine failure or damaged due to a disk failure, BookKeeper can determine which segments need repair (re-replicating their entries to meet the replication requirement) and repair them from multiple hosts concurrently.

Compared with partition-centric systems such as Apache Kafka, BookKeeper has the advantage of scalable performance. In Apache Kafka, a log stream (a Kafka partition) is stored sequentially on a fixed subset of machines, and expanding a Kafka cluster requires a large amount of data rebalancing, which is resource-intensive, error-prone, and operationally complex.

In addition, on partition-centric systems, a single damaged disk requires the system to copy the entire log stream to a new disk to restore the required number of replicas.

Each log segment is replicated to a configurable number of bookies chosen from the N available bookies (the figure in the original article shows three replicas). Log segments are evenly distributed, which allows horizontal expansion without rebalancing.
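One simple way to spread replicas evenly over a set of bookies is round-robin striping. The sketch below assumes that rule; the function name `write_set`, the bookie names, and the replica count of 3 are illustrative choices, and the placement policy used in a real deployment may differ.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Return the bookies that store `entry_id` under round-robin striping."""
    size = len(ensemble)
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]


# Hypothetical set of five bookies, three replicas per entry.
ensemble = ["bookie-1", "bookie-2", "bookie-3", "bookie-4", "bookie-5"]
for entry_id in range(5):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=3))
```

Because each new segment can pick its bookies afresh, a newly added bookie starts receiving data as soon as new segments are created, while existing segments stay where they are.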

Scalability

For a real-time log stream storage platform, it is very important to be able to scale as traffic grows or as more data is written to the system. Apache BookKeeper's scalability rests on the following dimensions:

• Number of ledgers / streams

Stream scalability is the ability to store a large number of log streams without affecting performance as the number of ledgers or streams grows from hundreds to millions. The key to stream scalability is the storage format.

If each ledger or stream were stored in its own dedicated file, stream scalability would be a problem, because when those files are periodically flushed from the page cache to disk, the writes are scattered across the disk.

BookKeeper stores data from ledgers and streams in an interleaved storage format: it merges entries from different ledgers and streams, stores them in large files, and then indexes them. This reduces both the number of files and the I/O contention, allowing BookKeeper to scale to a large number of ledgers and streams.
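The interleaved format can be illustrated with a minimal sketch: entries from different ledgers are appended to one shared log, and an index maps each (ledger, entry) pair to its offset. The class `ToyEntryLog` and its length-prefixed record layout are assumptions made for illustration, not BookKeeper's on-disk format.

```python
import io
import struct


class ToyEntryLog:
    """Entries from many ledgers share one append-only log plus an index."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for one large entry log file
        self.index = {}          # (ledger_id, entry_id) -> byte offset in the log

    def append(self, ledger_id, entry_id, payload):
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
        self.log.write(payload)
        self.index[(ledger_id, entry_id)] = offset
        return offset

    def read(self, ledger_id, entry_id):
        offset = self.index[(ledger_id, entry_id)]
        self.log.seek(offset)
        (length,) = struct.unpack(">I", self.log.read(4))
        return self.log.read(length)


entry_log = ToyEntryLog()
entry_log.append(1, 0, b"a1")  # ledger 1
entry_log.append(2, 0, b"b1")  # ledger 2, interleaved in the same log
entry_log.append(1, 1, b"a2")  # ledger 1 again
print(entry_log.read(2, 0))
```

All appends are sequential regardless of how many ledgers exist, which is the property the interleaved format is after; the cost is an index lookup on every read.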

• Number of bookies

Bookie scalability is the ability of the log stream store to support rapidly growing traffic by adding bookies (storage nodes in BookKeeper). In BookKeeper, bookies do not interact with each other directly. This allows BookKeeper to expand a cluster simply by adding new machines.

Similarly, because of the way BookKeeper distributes data across bookies, expanding a cluster does not require expensive data repartitioning that would exhaust the system's network and I/O bandwidth. The cluster can therefore grow regardless of how the data is distributed.

Yahoo! and Twitter both use BookKeeper, with hundreds of bookies in a single cluster.

• Number of clients

Client scalability is the ability of the log stream store to support a large number of concurrent clients and a large fan-out. BookKeeper achieves this in several ways:

Both the client and the server use Netty throughout to implement asynchronous network I/O. All network I/O is multiplexed over a single TCP connection and is asynchronous. This achieves very efficient pipelining and high throughput with little resource consumption.

Data is replicated to multiple bookies, and the data is identical across bookie replicas. In a system such as Apache Kafka, clients by default read data only from the leader node. A BookKeeper client can read data from any bookie replica (and such reads are repeatable). This not only provides high read availability but also spreads read traffic evenly.

Because the client can repeatedly read data from any bookie replica, an application can configure more replicas to achieve a higher read fan-out.
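A rough sketch of what reading from any replica buys a client: it can rotate its starting replica to spread load and fall back to another replica when one is unreachable. The function `read_entry`, the `fetch` callback, and the rotation rule are hypothetical and are not the BookKeeper client's actual read policy.

```python
def read_entry(entry_id, replicas, fetch):
    """Try each replica in turn, rotating the starting point by entry id to spread reads."""
    start = entry_id % len(replicas)
    order = replicas[start:] + replicas[:start]
    last_error = None
    for bookie in order:
        try:
            return bookie, fetch(bookie, entry_id)
        except ConnectionError as err:
            last_error = err  # this replica is unreachable, try the next one
    raise last_error


# Three identical replicas of entry 7; "b2" is simulated as down.
store = {"b1": {7: b"x"}, "b2": {7: b"x"}, "b3": {7: b"x"}}
down = {"b2"}


def fetch(bookie, entry_id):
    if bookie in down:
        raise ConnectionError(f"{bookie} unreachable")
    return store[bookie][entry_id]


print(read_entry(7, ["b1", "b2", "b3"], fetch))
```

Since every replica holds identical data, the result is the same whichever bookie answers, so adding replicas adds read capacity without any coordination between readers.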

• Single-stream throughput

Applications can improve throughput by using more streams or by adding bookies. In addition, BookKeeper can scale single-stream throughput by increasing the ensemble size (an ensemble is the subset of bookies used to store a given ledger or stream) and striping data across those bookies.
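The effect of a larger ensemble on a single stream can be checked with a small calculation. The sketch assumes round-robin striping with a fixed number of replicas per entry; `per_bookie_load` and the numbers used are illustrative only.

```python
from collections import Counter


def per_bookie_load(num_entries, ensemble_size, write_quorum):
    """Count how many entry replicas each bookie stores under round-robin striping."""
    load = Counter()
    for entry_id in range(num_entries):
        for i in range(write_quorum):
            load[(entry_id + i) % ensemble_size] += 1
    return load


# Same stream, same replica count, two ensemble sizes.
small = per_bookie_load(1200, ensemble_size=3, write_quorum=3)
large = per_bookie_load(1200, ensemble_size=6, write_quorum=3)
print(max(small.values()), max(large.values()))
```

With three replicas on an ensemble of three, every bookie stores every entry; doubling the ensemble to six halves each bookie's share, so the same stream can absorb roughly twice the write rate before any single bookie saturates.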

This is critical for stateful applications that need ordered data on a single stream.

Simple operation and maintenance

Apache BookKeeper is designed for simple operation and maintenance. While the system is running, you can easily expand capacity by adding bookie nodes. If a bookie node becomes unavailable, all entries stored on that bookie are marked as under-replicated, and the BookKeeper auto-recovery daemon automatically re-replicates the data from other available replicas to a new bookie node.
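The recovery flow described above can be sketched as two steps: find the entries that have fewer live replicas than required, then copy them to other live bookies. The functions `find_under_replicated` and `repair` and the `placement` map are hypothetical simplifications; the real daemon works on larger units than single entries and tracks its progress in a metadata store.

```python
def find_under_replicated(placement, live_bookies, replicas_required):
    """Return {entry: surviving_replicas} for entries below the required replica count."""
    result = {}
    for entry, bookies in placement.items():
        alive = [b for b in bookies if b in live_bookies]
        if len(alive) < replicas_required:
            result[entry] = alive
    return result


def repair(placement, live_bookies, replicas_required):
    """Re-replicate each under-replicated entry onto live bookies that lack it."""
    under = find_under_replicated(placement, live_bookies, replicas_required)
    for entry, alive in under.items():
        # Toy assumption: at least one replica survived, so there is a copy to read from.
        candidates = sorted(b for b in live_bookies if b not in alive)
        missing = replicas_required - len(alive)
        placement[entry] = alive + candidates[:missing]
    return placement


placement = {0: ["b1", "b2", "b3"], 1: ["b2", "b3", "b4"], 2: ["b3", "b4", "b5"]}
live = {"b1", "b3", "b4", "b5"}  # b2 has failed

print(find_under_replicated(placement, live, 3))
print(repair(placement, live, 3))
```

Only the entries that lived on the failed bookie are touched, and the copies can be read from several surviving bookies at once, which is why repair does not require moving whole streams.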

BookKeeper also provides a read-only mode for running bookies. In some cases, such as when a disk is full or corrupted, a bookie automatically switches to read-only mode. In read-only mode, the bookie does not accept writes but still serves read traffic. This kind of self-healing removes many operational pain points.

In addition, BookKeeper provides several ways to manage clusters, including admin CLI tools, a Java admin library, and an HTTP REST API. The REST API makes it straightforward to write external tools or to invoke specific operations from existing tools.

Security

Apache BookKeeper supports a pluggable authentication mechanism (http://bookkeeper.apache.org/docs/latest/security/overview/) that applications can use for their own authentication. BookKeeper can also be configured to support multiple authentication mechanisms. The purpose of the authenticator is to establish the identity of the client and assign an identifier to the client.

The identifier can then be used to determine which actions the client is authorized to perform. By default, BookKeeper supports two authentication mechanisms: TLS and SASL.
