What's the use of HBase snapshots?

2025-07-15 Update. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31 Report

This article explains in detail what HBase snapshots are and what they are used for. It is shared here as a practical reference.

What is a snapshot?

A snapshot is a collection of metadata that allows an administrator to restore a table to the state it was in when the snapshot was taken. A snapshot is not a copy of the table but a list of file names, so no data is copied when it is taken. The easiest way to think of it is as a record that keeps track of the table's metadata (table information and regions) and data (HFiles, MemStore, WALs).

A full snapshot restore returns the table to its earlier schema and to the data it held at that time; data written after the snapshot was taken is not restored.
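The idea that a snapshot is a list of file names rather than a copy can be sketched in a few lines of Python. This is only a toy model: the Snapshot class, the take_snapshot function, and the file names are hypothetical and are not part of HBase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str
    table: str
    schema: tuple       # column family names at snapshot time
    hfiles: frozenset   # names of the HFiles the table referenced


def take_snapshot(name, table, schema, live_hfiles):
    """Record metadata and file names only; no file contents are read or copied."""
    return Snapshot(name, table, tuple(schema), frozenset(live_hfiles))


table_files = {"region1/cf/hfile-a", "region1/cf/hfile-b"}
snap = take_snapshot("snap1", "myTable", ["cf"], table_files)

# A later flush adds a new HFile to the table; the snapshot is unaffected.
table_files.add("region1/cf/hfile-c")
print(sorted(snap.hfiles))
```

The snapshot still lists only the two files that existed when it was taken, which is why taking it is cheap regardless of table size.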

Offline snapshots: the simplest case is a snapshot of a disabled table. Disabling a table means that all data has been flushed to disk and no reads or writes are accepted. In this case, taking a snapshot simply collects the table metadata and records references to the HFiles on disk. The time the master needs for the operation depends largely on how long the HDFS NameNode takes to return the list of files.

Online snapshots: in most cases tables are enabled, and each RegionServer is continuously handling put and get requests. In this case, the master receives the snapshot request and asks each RegionServer to take a snapshot of the regions it is responsible for.

The master and the RegionServers communicate through Apache ZooKeeper, using a procedure similar to a two-phase commit. The master creates a "prepare snapshot" znode. Each RegionServer processes the request and prepares a snapshot of the regions of the specified table that it hosts. When it has finished preparing, it creates a child node under the prepare znode to signal "prepared".

Once all RegionServers have reported their status, the master creates another znode for "commit snapshot"; each RegionServer commits its snapshot and reports its status by joining that node. Once all RegionServers report completion, the master finalizes the snapshot and marks the operation as complete. If a RegionServer reports a failure, the master creates a new znode to broadcast the abort message.

Because the RegionServers keep processing new requests during an online snapshot, different use cases may call for different consistency models. For example, some users may accept a loosely consistent snapshot that omits the latest data still in memory, while others want to block writes to obtain a fully consistent snapshot.
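The prepare / commit / abort exchange described above can be modeled with a small simulation. This is a sketch under stated assumptions: a plain dictionary stands in for the ZooKeeper znodes, and run_snapshot, prepare, and commit are hypothetical names, not HBase or ZooKeeper APIs.

```python
def run_snapshot(region_servers, prepare, commit):
    """Coordinate a snapshot across region servers in two phases.

    prepare and commit are callables that take a server name and return
    True on success. Returns (status, znodes), where status is
    "complete" or "aborted".
    """
    znodes = {"prepare": set(), "commit": set(), "abort": False}

    # Phase 1: every server prepares and reports under the "prepare" node.
    for rs in region_servers:
        if not prepare(rs):
            znodes["abort"] = True   # broadcast the abort to everyone
            return "aborted", znodes
        znodes["prepare"].add(rs)

    # Phase 2: entered only once all servers have reported "prepared".
    for rs in region_servers:
        if not commit(rs):
            znodes["abort"] = True
            return "aborted", znodes
        znodes["commit"].add(rs)

    return "complete", znodes


servers = ["rs1", "rs2", "rs3"]
status_ok, _ = run_snapshot(servers, lambda rs: True, lambda rs: True)
status_bad, nodes = run_snapshot(servers, lambda rs: rs != "rs2", lambda rs: True)
print(status_ok, status_bad)
```

In the second run one server fails to prepare, so the commit phase never starts and the abort flag is set, mirroring the abort znode in the description.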

Therefore, the procedure that takes the snapshot on the RegionServer is pluggable. The only implementation at present is "Flush Snapshot", which flushes the MemStore before taking the snapshot to ensure row consistency; other procedures with different consistency policies may be implemented in the future. The time an online snapshot takes is determined by the slowest RegionServer to take its snapshot and report success back to the master, and is usually on the order of a few seconds.

The role of snapshots

The existing ways to back up or clone an HBase table are to use CopyTable / Export, or to copy all the HFiles in HDFS after disabling the table. CopyTable and Export run MapReduce jobs that scan and copy the table, which has a direct impact on RegionServer performance. Disabling the table stops all reads and writes, which is often unacceptable in production.

By contrast, HBase snapshots allow administrators to clone a table without copying data, with minimal impact on the RegionServers. Exporting a snapshot to another cluster does not directly involve any RegionServer; the export is essentially an inter-cluster file copy with some extra logic.

Advantages of snapshots

Apart from better consistency guarantees than CopyTable / Export, the main difference is that an exported snapshot operates at the HDFS level. This means that the HMaster and the RegionServers are not involved in the operation. Therefore, no cache space is wasted on unnecessary data, and there are no GC pauses caused by the large number of objects created while scanning. The main performance impact on HBase is the additional network and disk load on the DataNodes.

Application scenarios

Recovery from user / application errors

Restore / recover from a known safe state.

View previous snapshots and selectively merge the differences into production.

Save a snapshot before a major application upgrade or change.

Auditing and / or reporting on data as of a specified time

Capture monthly data for compliance purposes.

Generate end-of-day / end-of-month / end-of-quarter reports.

Application testing

Use a snapshot to test schema or application changes against production-like data, then discard the test data when finished. For example: take a snapshot, create a new table from the snapshot contents (original schema + data), and then change the new table's schema, add or delete columns, and so on. (The original table, the snapshot, and the new table remain independent of one another.)

Offloading work

Take a snapshot, export it to another cluster, and run MapReduce jobs there. Because the export operates at the HDFS level, it does not slow down the main HBase cluster the way CopyTable does.

Understanding HBase tables

An HBase table consists of a set of metadata and a set of key-value pairs:

Table info: a manifest file that describes the table "settings", such as column families, compression and encoding codecs, bloom filter types, and so on.

Regions: the table "partitions" are called regions. Each region is responsible for a contiguous range of keys, defined by a start key and an end key.

WALs/MemStore: before data is written to disk, a put is written to the Write Ahead Log (WAL) and then kept in memory (the MemStore) until memory pressure triggers a flush to disk. The WAL provides an easy way to recover puts that were not flushed to disk because of a failure.

HFiles: at some point all data is flushed to disk. An HFile is the file format in which HBase stores key-value pairs. HFiles are immutable, but they can be deleted at the end of a compaction or when a region is deleted.
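How a region "owns" a contiguous key range can be illustrated with a sorted list of start keys and a binary search. This is a minimal sketch: the start keys and the find_region helper are made up for illustration, and real HBase locates regions through its meta table rather than a local list.

```python
import bisect

# Sorted region start keys; the first region starts at the empty key.
start_keys = ["", "g", "n", "t"]


def find_region(row_key):
    """Return the index of the region whose [start, next_start) range holds row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1


print(find_region("apple"), find_region("g"), find_region("zebra"))
```

A key equal to a start key belongs to the region that begins there, because each region's end key is exclusive.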

Snapshot operation

Take a snapshot:

This operation attempts to take a snapshot of the specified table. It may fail if the cluster is performing region operations such as balancing, splits, or merges at the same time.

Clone a snapshot:

This operation creates a new table with the same schema and data as the specified snapshot. The result is a fully functional table, and changes to it do not affect the original table or the snapshot.

Restore a snapshot:

This operation returns the table schema and data to the state they were in when the snapshot was taken. (Note: this operation discards any changes made after the snapshot was taken.)

Delete a snapshot:

This operation removes a snapshot from the system and frees disk space that is not shared with other tables or snapshots. It does not affect other clones or snapshots.

Export a snapshot:

This operation copies the snapshot data and metadata to another cluster. It involves only HDFS and has no contact with the HMaster or the RegionServers, so the HBase cluster can even be shut down while it runs.

Current restrictions

Snapshots rely on certain assumptions, and some tools are not yet fully integrated with this feature:

Merging regions that are referenced by a snapshot can cause data loss in the snapshot and in cloned tables.

Restoring a table that has replication enabled leaves the two clusters out of sync, because the table is not restored on the replica.

In the past, HBase data was backed up with tools such as distcp or CopyTable. These mechanisms affect online reads and writes to some degree. Snapshots provide a fast backup method that does not copy data.

Confirm that snapshot support is turned on by checking that hbase.snapshot.enabled in hbase-site.xml is set to true:

<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
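The check can also be scripted. The sketch below parses the text of an hbase-site.xml file with Python's standard library and reports whether hbase.snapshot.enabled is set to true. The snapshot_enabled helper is hypothetical, and treating a missing property as "not enabled" is an assumption of this sketch that may not match the default of your HBase version.

```python
import xml.etree.ElementTree as ET


def snapshot_enabled(hbase_site_xml):
    """Return True if hbase.snapshot.enabled is set to true in the given XML text."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.snapshot.enabled":
            return prop.findtext("value", "").strip().lower() == "true"
    return False  # property absent: treated as not enabled in this sketch


sample = """<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(snapshot_enabled(sample))
```

To check a real cluster, read the contents of its hbase-site.xml into a string and pass that to the function.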

Snapshots can be taken online or offline.

The offline method first disables the table; the HBase Master then walks the table metadata and the HFiles in HDFS and creates references to them.

From a snapshot you can clone a new table, restore the table, or export to another cluster. A table created by clone only adds metadata; its data files are still the ones referenced by the snapshot.

Snapshot command reference

1. Use the snapshot command to take a snapshot of the specified table (no files are copied)

hbase> snapshot 'tableName', 'snapshotName'

2. Use the list_snapshots command to list all snapshots. It displays the snapshot name, source table, and creation date and time

hbase> list_snapshots

3. Use the delete_snapshot command to delete a snapshot. Deleting a snapshot does not affect cloned tables or snapshots taken later.

hbase> delete_snapshot 'snapshotName'

4. Use the clone_snapshot command to create a new table (a clone) from the specified snapshot. Because no data is copied, the clone does not take up twice the storage space.

hbase> clone_snapshot 'snapshotName', 'newTableName'

5. Use the restore_snapshot command to replace the current table schema and data with the contents of the specified snapshot

hbase> restore_snapshot 'snapshotName'

The table must be disabled before it can be restored from a snapshot

hbase> disable 'myTable'

hbase> restore_snapshot 'myTableSnapshot-122112'

Tip: because replication works at the log level while snapshots work at the file-system level, after a restore the replicas will be in a different state from the master. If you need to use restore, stop replication first and redo the bootstrap afterwards. If data was lost because of incorrect client behavior and a full-table restore (which requires disabling the table) is not acceptable, you can instead clone a new table from the snapshot and then use a MapReduce job to copy the required data from the clone back to the main table.

6. Use the ExportSnapshot tool to export an existing snapshot to another cluster. The export tool does not add load to the RegionServers; it works at the HDFS level, so you need to specify the HDFS location (the hbase root directory of the other cluster). Run the operation as the hbase user, and make sure a temporary directory owned by the hbase user exists in HDFS (controlled by the hbase.tmp.dir parameter)

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotName -copy-to hdfs://hbfreeoa2:8082/hbase

Copy a snapshot named MySnapshot to a cluster called hbfreeoa2 using 16 mappers

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://hbfreeoa2:8020/hbase -mappers 16
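When exports are scripted, it can help to build the argument list in one place. The following is a minimal sketch: the export_snapshot_cmd helper is hypothetical, and it only assembles the command shown above (with the -snapshot, -copy-to, and -mappers options) without running it.

```python
def export_snapshot_cmd(snapshot, copy_to, mappers=None):
    """Build the argument list for an ExportSnapshot run; nothing is executed here."""
    argv = [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
    ]
    if mappers is not None:
        argv += ["-mappers", str(mappers)]
    return argv


cmd = export_snapshot_cmd("MySnapshot", "hdfs://hbfreeoa2:8020/hbase", mappers=16)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a host where the hbase command is installed and the job is run as the hbase user.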

Zero-copy snapshot, restore, and clone

The main difference between a snapshot and CopyTable / Export is that snapshot operations write only metadata and do not move large amounts of data. One of the main design principles of HBase is that once a file is written, it is never modified. Because files are immutable, a snapshot only needs to keep track of the files that were in use when it was taken, and to tell the system that during compaction those files must be archived rather than deleted.

The same principle applies to clone and restore. Because the files are immutable, a new table can be created simply by "linking" to the files referenced by the snapshot. Export is the only operation that needs to copy data, because the other cluster does not have the data files.
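The "archive instead of delete" rule can be shown with a toy model of compaction. This is a simplified sketch, not HBase's actual archiving and cleaning logic: the compact function, the sets, and the file names are all hypothetical.

```python
def compact(live, inputs, output, snapshot_refs, archive):
    """Replace the input files with one output file in the live set.

    Input files still referenced by any snapshot are moved to the
    archive instead of being dropped.
    """
    for f in inputs:
        live.discard(f)
        if any(f in refs for refs in snapshot_refs.values()):
            archive.add(f)
    live.add(output)


live = {"hfile-a", "hfile-b"}
snapshot_refs = {"snap1": frozenset(live)}   # snapshot taken before compaction
archive = set()

compact(live, ["hfile-a", "hfile-b"], "hfile-ab", snapshot_refs, archive)
print(sorted(live), sorted(archive))
```

After the compaction the table reads from the merged file, while the two original files remain available in the archive for anything that still references the snapshot.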

Clone the table from the snapshot

When an administrator runs a clone operation, a new table is created with the table schema stored in the snapshot and pre-split using the start / end keys from the snapshot's region information. Once the table metadata is created, the HFiles are linked in, as in the snapshot, without copying any data. Because HFiles are immutable, the clone only creates references to the source files; this avoids copying data and lets the clone be modified without affecting the source table or the snapshot. The clone operation is performed by the master.
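The independence of a clone from its source can be sketched as two separate sets of references to the same immutable file names. This is a toy model with hypothetical names (clone_snapshot here is a plain Python function, not the shell command).

```python
def clone_snapshot(snapshot_files):
    """Create a new table's file set as references to the snapshot's files."""
    return set(snapshot_files)   # new container, same file names; no data copied


snapshot_files = frozenset({"hfile-a", "hfile-b"})
source_table = set(snapshot_files)
clone_table = clone_snapshot(snapshot_files)

# A flush on the clone writes a new file that only the clone references.
clone_table.add("hfile-clone-1")
print(sorted(source_table), sorted(clone_table))
```

The source table and the snapshot keep their original file lists, while the clone accumulates its own new files on top of the shared ones.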

Restore a table from a snapshot

A restore operation is similar to a clone operation. You can think of it as deleting the table and then cloning it from the snapshot. The restore brings back the data captured in the snapshot, removes data that is not in the snapshot, and reverts the table schema to that of the snapshot. Under the hood, the restore is implemented by computing the difference between the current table state and the snapshot, removing files that are not in the snapshot and adding references to files that are in the snapshot but not in the current state. The table descriptor is likewise changed back to its state at snapshot time. The restore operation is performed by the master, and the table must be disabled while it runs.
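The difference computation behind a restore amounts to two set differences. The following is a minimal sketch with hypothetical names and file lists; it only plans which files would be removed and which would be linked back in.

```python
def plan_restore(current_files, snapshot_files):
    """Return (files to remove, files to link) to bring a table back to a snapshot."""
    to_remove = set(current_files) - set(snapshot_files)   # written after the snapshot
    to_link = set(snapshot_files) - set(current_files)     # in the snapshot, gone from the table
    return to_remove, to_link


current = {"hfile-b", "hfile-c", "hfile-d"}
snapshot = {"hfile-a", "hfile-b"}

to_remove, to_link = plan_restore(current, snapshot)
print(sorted(to_remove), sorted(to_link))
```

Files present in both sets are left untouched, which is why a restore does not need to copy data either.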
