
What are Apache Pulsar's three major cross-region replication solutions?

2025-07-19 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article gives a concise overview of Apache Pulsar's three major cross-region replication solutions.

Apache Pulsar is a multi-tenant, high-performance solution for messaging between services. It supports multi-tenancy, low latency, separation of reads and writes, cross-region replication, rapid scaling, and flexible fault tolerance. It natively supports cross-region replication at intercontinental scale and, combined with its tenant-level and namespace-level abstractions, can flexibly support cross-region replication in different scenarios.

Why it is needed

Geo-Replication gives us two things. First, we can easily distribute a service across multiple data centers. Second, we can survive data-center-level failures: when one data center becomes unavailable, the service can be moved to another data center and continue to run.

Overview

Apache Pulsar has built-in multi-cluster cross-region replication. Geo-Replication means configuring clusters located in different physical regions so that data can be replicated between them.

Depending on whether messages are replicated synchronously or asynchronously, cross-region replication falls into the following two modes:

Synchronous mode: If the disaster recovery requirements for the data are very high, a synchronous cross-city deployment can be adopted, in which copies of the data exist in different cities. However, network fluctuations between cities have a large impact on performance, because a write must succeed in multiple cities before the result is returned to the client.

Asynchronous mode: If the disaster recovery requirements for the data are less strict, an asynchronous cross-city deployment can be adopted. For example, with two independent data centers in Shanghai and Toronto, messages written to Shanghai are written to Toronto asynchronously. The advantage is that the performance of the main write path is not affected; the cost is one additional copy of storage overhead.
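The trade-off between the two modes can be illustrated with a toy Python model. This is not Pulsar code: the function name and the latency figures are invented for illustration, and the model only captures the point made above, namely that a synchronous write is acknowledged after the slowest region while an asynchronous write is acknowledged after the local region.

```python
def ack_latency_ms(write_latency_ms, local_region, synchronous):
    """Return how long the client waits before its write is acknowledged.

    write_latency_ms maps a region name to the (hypothetical) time needed
    to persist one message there, as seen from the client's region.
    """
    if synchronous:
        # Synchronous mode: every region must persist the message first.
        return max(write_latency_ms.values())
    # Asynchronous mode: only the local write is on the critical path.
    return write_latency_ms[local_region]


# Hypothetical numbers: a local write in Shanghai, a remote write to Toronto.
latencies = {"shanghai": 5, "toronto": 180}
sync_ms = ack_latency_ms(latencies, "shanghai", synchronous=True)
async_ms = ack_latency_ms(latencies, "shanghai", synchronous=False)
print(sync_ms, async_ms)
```

In this sketch the asynchronous client never sees the inter-city latency, which is also why a fluctuation on that link hurts only the synchronous mode.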

Below we discuss Pulsar's cross-region replication schemes in asynchronous mode.

Pulsar currently supports three asynchronous cross-region replication schemes:

Full-mesh (fully connected)

One-way replication

Failover mode

Depending on whether configurationStoreServers (a global zookeeper) is used, they fall into the following two groups of asynchronous cross-region replication schemes:

1. With configurationStoreServers:

Full-mesh (fully connected)

2. Without configurationStoreServers:

One-way replication

Failover mode

A core question in cross-region replication is whether the clusters can reach one another. Their interaction depends mainly on the following configuration information:

cluster (cluster name)

zookeeper (local cluster zk servers)

configuration-store (global zk servers)

web-service-url

web-service-url-tls

broker-service-url

broker-service-url-tls

When initializing a Pulsar cluster, the user can specify the information listed above. For example:

bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper zk1.us-west.example.com:2181 \
  --configuration-store zk1.us-west.example.com:2181 \
  --web-service-url http://pulsar.us-west.example.com:8080 \
  --web-service-url-tls https://pulsar.us-west.example.com:8443 \
  --broker-service-url pulsar://pulsar.us-west.example.com:6650 \
  --broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651

Full-mesh (fully connected)

The full-mesh form allows data to be shared across multiple clusters, as shown below:

Concepts

configurationStoreServers: stores the configuration information of every cluster, that is, the address information that lets clusters discover one another. It also stores tenant and namespace information. Its main purpose is to simplify operations: when the information of one cluster is updated, the other clusters learn of the change through the global zookeeper.

tenant: which clusters the tenant being created is allowed to operate on (--allowed-clusters)

namespace: which clusters the namespace being created is allowed to replicate data between (--clusters)
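How the tenant-level and namespace-level settings relate can be sketched in a few lines of Python. This is a simplified model, not the broker's validation code: it assumes that a namespace may replicate only between clusters that its tenant is allowed to use, and the function name and cluster names are hypothetical.

```python
def clusters_not_allowed(allowed_clusters, replication_clusters):
    """Return the clusters a namespace asks for that its tenant lacks."""
    return sorted(set(replication_clusters) - set(allowed_clusters))


# Tenant level: the clusters passed as --allowed-clusters (hypothetical names).
tenant_allowed = ["beijing", "shanghai", "toronto"]

# Namespace level: the clusters passed as --clusters.
ok = clusters_not_allowed(tenant_allowed, ["beijing", "shanghai"])
bad = clusters_not_allowed(tenant_allowed, ["beijing", "guangzhou"])
print(ok, bad)
```

An empty result means the namespace setting is consistent with the tenant; a non-empty result lists the clusters that would have to be added to the tenant first.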

How it works

Data replication among multiple clusters can be reduced to data replication between two clusters. On that basis, Geo-Replication works as shown in the following figure:

Suppose there are two clusters, deployed in Beijing and Shanghai. When a user's producer sends data to the Beijing cluster, the data is first written to the local topic (topic1) in the Beijing data center. At the same time, a replication cursor is created; this cursor records how far replication has progressed. A replication producer is also created, which reads the data from topic1 in the Beijing data center and writes it to topic1 in the Shanghai data center. On receiving the replication producer's request, the broker in the Shanghai data center writes the data to its local topic of the same name (topic1). If a user in the Shanghai data center then starts a consumer, it receives the data produced by the producer in the Beijing data center. The same holds in the opposite direction.
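The roles of the replication cursor and the replication producer can be sketched with a toy in-memory model. This is not broker code: the Topic and Replicator classes are hypothetical, a topic is reduced to a Python list, and only the Beijing to Shanghai direction is modelled.

```python
class Topic:
    """An append-only log standing in for topic1 in one cluster."""

    def __init__(self):
        self.messages = []

    def publish(self, payload):
        self.messages.append(payload)


class Replicator:
    """Reads from the local topic and republishes to the remote topic."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.cursor = 0  # index of the next local message to replicate

    def run_once(self):
        # The replication producer forwards everything after the cursor,
        # then the cursor advances to record the progress.
        pending = self.local.messages[self.cursor:]
        for payload in pending:
            self.remote.publish(payload)
        self.cursor += len(pending)
        return len(pending)


beijing, shanghai = Topic(), Topic()
replicator = Replicator(beijing, shanghai)
beijing.publish("m1")
beijing.publish("m2")
first = replicator.run_once()
beijing.publish("m3")
second = replicator.run_once()
print(first, second, shanghai.messages)
```

Because the cursor persists between runs, the second run forwards only the message that arrived after the first run, which is the "how far has replication progressed" role described above.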

A few points need to be clarified here:

In the full-mesh scenario, data from the Beijing data center is replicated to the Shanghai cluster, and data from the Shanghai data center is replicated to Beijing. Could a message replicated from Beijing to Shanghai then be replicated back to Beijing, creating an endless loop?

No. When a producer sends a message, the cluster it belongs to is known. When the replication producer replicates the message, it marks the message with a label, replicated_from, indicating where the message came from, and a message carrying this label is not replicated again. This solves the reverse replication problem.

In Geo-Replication scenarios, exactly-once semantics can also be guaranteed (at-least-once delivery + producer name + sequence ID).
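Both mechanisms, the origin label and deduplication on producer name plus sequence ID, can be shown together in a small Python model. It is a sketch under simplifying assumptions: the Message and Cluster classes and the replicate function are hypothetical, and the replicator deliberately rescans the whole log on every run so that redelivery (the at-least-once part) actually occurs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    producer_name: str
    sequence_id: int
    payload: str
    replicated_from: Optional[str] = None  # set only on replicated copies


@dataclass
class Cluster:
    name: str
    log: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer -> highest seq ID

    def publish(self, msg):
        # Deduplication: drop anything at or below the last sequence ID seen.
        if msg.sequence_id <= self.last_seq.get(msg.producer_name, -1):
            return False
        self.last_seq[msg.producer_name] = msg.sequence_id
        self.log.append(msg)
        return True


def replicate(source, target):
    """Forward locally produced messages; return how many were accepted."""
    accepted = 0
    for msg in source.log:
        if msg.replicated_from is not None:
            continue  # arrived via replication: never send it back out
        copy = Message(msg.producer_name, msg.sequence_id, msg.payload,
                       replicated_from=source.name)
        if target.publish(copy):
            accepted += 1
    return accepted


beijing, shanghai = Cluster("beijing"), Cluster("shanghai")
beijing.publish(Message("producer-a", 0, "hello"))
first = replicate(beijing, shanghai)   # the message reaches Shanghai
back = replicate(shanghai, beijing)    # nothing flows back to Beijing
retry = replicate(beijing, shanghai)   # the redelivery is deduplicated
print(first, back, retry)
```

The label stops the loop, and the sequence ID check turns a repeated delivery into a no-op, so each cluster ends up with exactly one copy of the message.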

Replication latency depends on the network latency between the two data centers. If the replication latency is high, the network conditions between the two data centers need to be examined.

Once global zookeeper is configured, data replication between clusters is bidirectional, and data between clusters mounted under global zookeeper is interoperable.

one-way replication

As mentioned above, once a global zookeeper is configured, data cannot be replicated in one direction only. In many scenarios, however, we do not need all clusters to be fully connected, and one-way replication is a better fit. It should be emphasized that one-way replication does not require a separate configurationStoreServers: simply set the value of configurationStoreServers to the zookeeper address of the local cluster (zookeeperServers).

So how is cross-cluster replication done without configuring a global zookeeper?

As mentioned above, the global zookeeper mainly stores the address information of the clusters and the corresponding namespace information; it holds no other metadata. So in a one-way replication scenario, you have to supply this information yourself: each cluster must be told about the clusters in the other data centers, and the namespace information must be made available on each of the clusters involved.
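The idea can be modelled as each cluster keeping its own private metadata. The sketch below is a simplified illustration, not Pulsar's implementation: the LocalConfigStore class, its methods, and the namespace name are hypothetical, and it assumes that a cluster replicates a namespace only to the remote clusters listed in its own copy of the namespace settings.

```python
class LocalConfigStore:
    """Per-cluster metadata; nothing here is shared with other clusters."""

    def __init__(self, cluster_name):
        self.cluster_name = cluster_name
        self.known_clusters = {cluster_name}
        self.namespace_clusters = {}  # namespace -> set of clusters

    def register_cluster(self, name):
        # Stands in for telling this cluster the address of a remote one.
        self.known_clusters.add(name)

    def set_namespace_clusters(self, namespace, clusters):
        unknown = set(clusters) - self.known_clusters
        if unknown:
            raise ValueError(f"unknown clusters: {sorted(unknown)}")
        self.namespace_clusters[namespace] = set(clusters)

    def replication_targets(self, namespace):
        clusters = self.namespace_clusters.get(namespace, set())
        return sorted(clusters - {self.cluster_name})


beijing = LocalConfigStore("beijing")
shanghai = LocalConfigStore("shanghai")

# Only Beijing is told about Shanghai, so data flows Beijing -> Shanghai.
beijing.register_cluster("shanghai")
beijing.set_namespace_clusters("my-tenant/my-ns", ["beijing", "shanghai"])
shanghai.set_namespace_clusters("my-tenant/my-ns", ["shanghai"])

beijing_targets = beijing.replication_targets("my-tenant/my-ns")
shanghai_targets = shanghai.replication_targets("my-tenant/my-ns")
print(beijing_targets, shanghai_targets)
```

With a shared global store both clusters would see the same namespace settings and replicate to each other; keeping the stores separate is what makes the asymmetry possible in this model.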

Failover mode

Failover mode is a special case of one-way replication.

In Failover mode, the cluster in the remote data center is used only for data backup and has no producers or consumers. Only after the active cluster goes down are the producers and consumers switched to the standby cluster, where they continue producing and consuming. Because replicated subscriptions exist, the subscription state is also copied to the standby data center.
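The switching rule itself is simple enough to sketch in Python. The FailoverRouter class below is hypothetical and models only the decision; how an outage is detected and how clients are redirected is not described in this article, so the mark_down and mark_up calls stand in for whatever health check is used.

```python
class FailoverRouter:
    """Sends clients to the active cluster, or to the standby if it is down."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        self.down = set()

    def mark_down(self, cluster):
        self.down.add(cluster)

    def mark_up(self, cluster):
        self.down.discard(cluster)

    def target(self):
        # The standby serves clients only while the active cluster is down.
        if self.active not in self.down:
            return self.active
        if self.standby not in self.down:
            return self.standby
        raise RuntimeError("no cluster available")


router = FailoverRouter(active="beijing", standby="shanghai")
normal = router.target()
router.mark_down("beijing")
during_outage = router.target()
router.mark_up("beijing")
after_recovery = router.target()
print(normal, during_outage, after_recovery)
```

This sketch switches back as soon as the active cluster recovers; whether to fail back automatically is a design choice that the article does not cover.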

Current problems with Pulsar Geo-Replication

Pulsar can only guarantee message order for messages produced in a single data center. In multi-data-center scenarios, there is no way to guarantee a global order of messages across data centers.

Cursor snapshots are taken periodically, so they are not very precise in time and some deviation is to be expected.

At present, only the "mark delete position" is synchronized. Individually acknowledged messages cannot be synchronized for now.

Cursor snapshots can be taken only when all related clusters are available.

Taking cursor snapshots generates some cache, which affects subsequent backlog-related calculations.
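The consequence of synchronizing only the mark delete position can be seen in a short Python sketch. It is a simplified model with hypothetical function names: message positions are plain integers, the mark delete position is taken to be the end of the contiguous run of acknowledged messages, and the periodic snapshot delay is ignored.

```python
def mark_delete_position(acked):
    """Return the highest position p such that 0..p are all acknowledged."""
    position = -1
    while position + 1 in acked:
        position += 1
    return position


def redelivered_after_failover(acked, last_position):
    """Positions the standby delivers again if only mark delete is synced."""
    start = mark_delete_position(acked) + 1
    return list(range(start, last_position + 1))


# The consumer acknowledged 0, 1, 2 in order, then 4 and 6 individually.
acked = {0, 1, 2, 4, 6}
position = mark_delete_position(acked)
again = redelivered_after_failover(acked, last_position=7)
print(position, again)
```

In this model, positions 4 and 6 are delivered again on the standby cluster even though they were acknowledged on the active one, so consumers should be prepared to handle duplicates after a switch.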
