
Cluster snapshot practice in the distributed graph database Nebula Graph


This article explains how cluster snapshots are implemented and used in the distributed graph database Nebula Graph. Many readers may not be familiar with the topic, so the following walkthrough summarizes the key points; I hope you find it useful.

1.1 Background

In production, the graph database Nebula Graph holds large volumes of data and handles high-frequency business traffic. Human, hardware, or application errors are inevitable, and some serious errors can leave the cluster unable to run normally or corrupt the data it holds. When the cluster cannot start or its data is damaged, rebuilding the cluster and re-importing the data is a tedious and time-consuming job. To solve this problem, Nebula Graph provides the ability to create cluster snapshots.

The snapshot feature must be able to capture the state of the cluster at a point in time, so that after a catastrophic failure the cluster can easily be restored to a usable state from a historical snapshot.

1.2 Terminology

The following terms are mainly used in this article:

StorageEngine: the smallest physical storage unit in Nebula Graph. RocksDB and HBase are currently supported; this article covers only RocksDB.

Partition: the smallest logical storage unit in Nebula Graph. A StorageEngine can contain multiple Partitions. A Partition plays either the leader or the follower role, and Raftex guarantees data consistency between leader and followers.

GraphSpace: each GraphSpace is an independent business graph unit with its own collection of tags and edges. A Nebula Graph cluster can contain multiple GraphSpaces.

Checkpoint: a point-in-time snapshot of a single StorageEngine that can serve as a full backup. The checkpoint files are hard links to the engine's sst files (see the sketch after this list).

Snapshot: in this article, a snapshot is a point-in-time snapshot of the whole Nebula Graph cluster, i.e. the collection of checkpoints of every StorageEngine in the cluster. With a snapshot, the cluster can be restored to the state it was in when the snapshot was created.

WAL: Write-Ahead Log, used by Raftex to keep leader and followers consistent.
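To make the hard-link idea concrete, here is a minimal sketch. It is not Nebula Graph's own code; the directory names are made up to mirror the layout shown later in 2.3. It shows how a checkpoint's data directory can reference existing sst files without copying any data:

#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    // Hypothetical layout: one engine's live data directory and a checkpoint directory.
    fs::path dataDir = "nebula/1/data";
    fs::path cpDir   = "nebula/1/checkpoints/SNAPSHOT_EXAMPLE/data";
    fs::create_directories(cpDir);

    // Hard-link every sst file into the checkpoint directory.
    // A hard link is just another directory entry for the same inode,
    // so no file content is copied and the checkpoint is created quickly.
    for (const auto& entry : fs::directory_iterator(dataDir)) {
        if (entry.path().extension() == ".sst") {
            fs::create_hard_link(entry.path(), cpDir / entry.path().filename());
        }
    }
    std::cout << "checkpoint files linked\n";
    return 0;
}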

2 System architecture

2.1 System architecture

2.2 Storage system structure

2.3 Storage system physical file layout

[bright2star@hp-server storage]$ tree
.
└── nebula
    └── 1
        ├── checkpoints
        │   ├── SNAPSHOT_2019_12_04_10_54_42
        │   │   ├── data
        │   │   │   ├── 000006.sst
        │   │   │   ├── 000008.sst
        │   │   │   ├── CURRENT
        │   │   │   ├── MANIFEST-000007
        │   │   │   └── OPTIONS-000005
        │   │   └── wal
        │   │       ├── 1
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 2
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 3
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 4
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 5
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 6
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 7
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 8
        │   │       │   └── 0000000000000000233.wal
        │   │       └── 9
        │   │           └── 0000000000000000233.wal
        │   └── SNAPSHOT_2019_12_04_10_54_44
        │       ├── data
        │       │   ├── 000006.sst
        │       │   ├── 000008.sst
        │       │   ├── 000009.sst
        │       │   ├── CURRENT
        │       │   ├── MANIFEST-000007
        │       │   └── OPTIONS-000005
        │       └── wal
        │           ├── 1
        │           │   └── 0000000000000000236.wal
        │           ├── 2
        │           │   └── 0000000000000000236.wal
        │           ├── 3
        │           │   └── 0000000000000000236.wal
        │           ├── 4
        │           │   └── 0000000000000000236.wal
        │           ├── 5
        │           │   └── 0000000000000000236.wal
        │           ├── 6
        │           │   └── 0000000000000000236.wal
        │           ├── 7
        │           │   └── 0000000000000000236.wal
        │           ├── 8
        │           │   └── 0000000000000000236.wal
        │           └── 9
        │               └── 0000000000000000236.wal
        ├── data

3 Processing logic analysis

3.1 Logic analysis

CREATE SNAPSHOT is triggered through the client API or the console. The graph server parses the AST of the CREATE SNAPSHOT statement and sends the creation request to the meta server through the meta client. When the meta server receives the request, it first gets all active hosts and builds the request needed by AdminClient, then sends the creation request to every storage server through AdminClient. On receiving the request, each storage server traverses all StorageEngines of the specified spaces, creates a checkpoint for each engine, and then hard-links the wal of every partition in that engine. While the checkpoint and the wal hard links are being created, the database is read-only, because write-blocking requests have been sent to all leader partitions beforehand.
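The ordering described above can be summarized in a small sketch. Everything here (blockWrites, unblockWrites, the Engine and Partition stubs) is illustrative only and does not claim to match Nebula Graph's internal APIs; it simply shows the block-writes, checkpoint, hard-link-wal, unblock sequence:

#include <iostream>
#include <string>
#include <vector>

// Illustrative stubs; none of these names are taken from the Nebula Graph code base.
enum class Code { SUCCEEDED, FAILED };

struct Partition {
    int id;
    bool linkCurrentWAL(const std::string& dst) {          // would hard-link the last wal file
        std::cout << "link wal of part " << id << " into " << dst << "\n";
        return true;
    }
};

struct Engine {
    std::vector<Partition> parts{{1}, {2}, {3}};
    Code createCheckpoint(const std::string& name) {       // would hard-link the sst files
        std::cout << "checkpoint " << name << " created\n";
        return Code::SUCCEEDED;
    }
};

void blockWrites(std::vector<Engine>&)   { std::cout << "writes blocked\n"; }
void unblockWrites(std::vector<Engine>&) { std::cout << "writes re-enabled\n"; }

// The per-storage-server sequence for one snapshot, as described above.
Code createSnapshotOnHost(std::vector<Engine>& engines, const std::string& name) {
    blockWrites(engines);                                   // 1. make the data read-only
    for (auto& engine : engines) {
        if (engine.createCheckpoint(name) != Code::SUCCEEDED) {   // 2. checkpoint each engine
            unblockWrites(engines);
            return Code::FAILED;
        }
        for (auto& part : engine.parts) {                   // 3. hard-link each partition's wal
            if (!part.linkCurrentWAL("checkpoints/" + name + "/wal")) {
                unblockWrites(engines);
                return Code::FAILED;
            }
        }
    }
    unblockWrites(engines);                                 // 4. re-enable writes
    return Code::SUCCEEDED;
}

int main() {
    std::vector<Engine> engines(1);
    createSnapshotOnHost(engines, "SNAPSHOT_EXAMPLE");
}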

Because snapshot names are generated automatically from the system timestamp, you do not need to worry about name collisions. If you create a snapshot you no longer need, it can be removed with the DROP SNAPSHOT command.
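As a rough illustration of the naming scheme (the names seen later, such as SNAPSHOT_2019_12_04_10_54_44, follow the pattern SNAPSHOT_YYYY_MM_DD_HH_MM_SS), a name of that shape can be produced from the local time like this; it is only a sketch of the format, not the meta server's actual implementation:

#include <ctime>
#include <iostream>
#include <string>

// Build a snapshot name of the form SNAPSHOT_YYYY_MM_DD_HH_MM_SS from the current local time.
std::string makeSnapshotName() {
    std::time_t now = std::time(nullptr);
    char buf[64];
    std::strftime(buf, sizeof(buf), "SNAPSHOT_%Y_%m_%d_%H_%M_%S", std::localtime(&now));
    return buf;
}

int main() {
    std::cout << makeSnapshotName() << "\n";   // e.g. SNAPSHOT_2019_12_04_10_54_44
}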

3.2 Create Snapshot

3.3 Create Checkpoint

4 Key code implementation

4.1 Create Snapshot

folly::Future<Status> AdminClient::createSnapshot(GraphSpaceID spaceId, const std::string& name) {
    // Get the hosts of all storage engines.
    auto allHosts = ActiveHostsMan::getActiveHosts(kv_);
    storage::cpp2::CreateCPRequest req;
    // Specify the spaceId. Currently a checkpoint is created for every space;
    // listing the spaces has already been done in the calling function.
    req.set_space_id(spaceId);
    // Specify the snapshot name, which the meta server has already generated from the timestamp,
    // for example: SNAPSHOT_2019_12_04_10_54_44
    req.set_name(name);
    folly::Promise<Status> pro;
    auto f = pro.getFuture();
    // Send the request to every storage engine through the getResponse interface.
    getResponse(allHosts, 0, std::move(req),
                [] (auto client, auto request) {
                    return client->future_createCheckpoint(request);
                },
                0, std::move(pro), 1 /* The snapshot operation only needs to be retried twice */);
    return f;
}

4.2 Create Checkpoint

ResultCode NebulaStore::createCheckpoint(GraphSpaceID spaceId, const std::string& name) {
    auto spaceRet = space(spaceId);
    if (!ok(spaceRet)) {
        return error(spaceRet);
    }
    auto space = nebula::value(spaceRet);
    // Traverse all StorageEngines belonging to this space.
    for (auto& engine : space->engines_) {
        // First create a checkpoint for the StorageEngine.
        auto code = engine->createCheckpoint(name);
        if (code != ResultCode::SUCCEEDED) {
            return code;
        }
        // Then hard-link the last wal of every partition in this StorageEngine.
        auto parts = engine->allParts();
        for (auto& part : parts) {
            auto ret = this->part(spaceId, part);
            if (!ok(ret)) {
                LOG(ERROR) << "Part not found, space: " << spaceId << " part: " << part;
                return error(ret);
            }
            // (Reconstructed from the description in 3.1 and the file layout in 2.3:
            //  hard-link the partition's current wal into the checkpoint's wal directory.)
            auto walPath = folly::stringPrintf("%s/checkpoints/%s/wal/%d",
                                               engine->getDataRoot(), name.c_str(), part);
            if (!nebula::value(ret)->linkCurrentWAL(walPath.data())) {
                return ResultCode::ERR_CHECKPOINT_ERROR;
            }
        }
    }
    return ResultCode::SUCCEEDED;
}

5.1 CREATE SNAPSHOT

CREATE SNAPSHOT creates a snapshot of the whole cluster; the snapshot name is generated automatically from the system timestamp.

Syntax:

CREATE SNAPSHOT

(user@127.0.0.1) [default_space]> create snapshot;
Execution succeeded (Time spent: 22892/23923 us)

(user@127.0.0.1) [default_space]> create snapshot;
Execution succeeded (Time spent: 18575/19168 us)

Let's look at the existing snapshots with the SHOW SNAPSHOTS command described in 5.3:

(user@127.0.0.1) [default_space]> show snapshots;
===============================================================
| Name                         | Status | Hosts              |
===============================================================
| SNAPSHOT_2019_12_04_10_54_36 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_42 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
Got 3 rows (Time spent: 907/1495 us)

From SNAPSHOT_2019_12_04_10_54_36 above, you can see that the snapshot name is derived from the creation timestamp.

5.2 DROP SNAPSHOT

DROP SNAPSHOT deletes the snapshot with the specified name. You can get the name of the snapshot through the SHOW SNAPSHOTS command. DROP SNAPSHOT can delete either a valid snapshot or a failed snapshot.

Syntax:

DROP SNAPSHOT name

The author deletes the successfully created snapshot SNAPSHOT_2019_12_04_10_54_36 and then uses the SHOW SNAPSHOTS command to view the remaining snapshots.

(user@127.0.0.1) [default_space]> drop snapshot SNAPSHOT_2019_12_04_10_54_36;
Execution succeeded (Time spent: 6188/7348 us)

(user@127.0.0.1) [default_space]> show snapshots;
===============================================================
| Name                         | Status | Hosts              |
===============================================================
| SNAPSHOT_2019_12_04_10_54_42 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
Got 2 rows (Time spent: 1097/1721 us)

5.3 SHOW SNAPSHOTS

SHOW SNAPSHOTS lists all snapshots in the cluster. For each snapshot it shows the status (VALID or INVALID), the name, and the IP addresses of all storage servers at the time the snapshot was created.

Syntax:

SHOW SNAPSHOTS

Here is a small example:

(user@127.0.0.1) [default_space]> show snapshots;
===============================================================
| Name                         | Status | Hosts              |
===============================================================
| SNAPSHOT_2019_12_04_10_54_36 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_42 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID  | 127.0.0.1:77833    |
---------------------------------------------------------------
Got 3 rows (Time spent: 907/1495 us)

6 Points for attention

When the cluster structure changes, it is best to create a snapshot immediately afterwards, for example after add host, drop host, create space, drop space, or balance.

A user-specified snapshot path is not supported in the current version; snapshots are created under the data_path/nebula directory by default.

The current version does not provide a snapshot restore feature, so users need to write their own shell scripts for their production environment. The logic is fairly simple: copy each engine server's snapshot to a chosen directory, set that directory as data_path, and start the cluster.
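As a minimal sketch of that copy step only, assuming placeholder paths (a real restore procedure would cover every storage host and would normally be a shell script, as noted above), the idea is simply to materialize one engine's snapshot files where data_path will point before starting the cluster:

#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    // Placeholder paths: one engine's snapshot and the directory that will become data_path.
    fs::path snapshot = "data_path/nebula/1/checkpoints/SNAPSHOT_2019_12_04_10_54_44";
    fs::path target   = "restore_path/nebula/1";

    // Copy the checkpoint's data and wal directories into the restore location.
    fs::create_directories(target);
    fs::copy(snapshot / "data", target / "data", fs::copy_options::recursive);
    fs::copy(snapshot / "wal",  target / "wal",  fs::copy_options::recursive);

    std::cout << "Snapshot copied; set data_path to the restore location and start the cluster.\n";
    return 0;
}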

That covers how cluster snapshots are created and used in the distributed graph database Nebula Graph.
