Thursday, 2019-3-28
A Deep Analysis of the HBase Data Write Process
Before reading this article, you may want to review the detailed explanation of HBase read and write requests at https://blog.51cto.com/12445535/2356085
Brief introduction:
HBase was designed from the outset for applications that write far more than they read. With its excellent write performance, a cluster of 100 RegionServers can comfortably sustain 10 T of writes per day.
The HBase data write process can be divided into three parts:
1. The client-side write process
2. The server-side write process
3. How WAL works
Let's first review the overall HBase write path.
Summary of write request processing:
1. The client submits a write request to a RegionServer.
2. The RegionServer locates the target region.
3. The region checks whether the data is consistent with the schema.
4. If the client did not specify a version, the current system time is used as the data version.
5. The update is written to the WAL log.
6. The update is written to the MemStore.
7. The region determines whether the MemStore needs to be flushed to a StoreFile.
Part 1: The client-side write process
Client process analysis:
1. After the user submits a put request, the HBase client adds it to a local buffer; once certain conditions are met, the buffered requests are submitted asynchronously via AsyncProcess. HBase defaults to autoflush=true, which means each put request is submitted directly to the server for processing.
2. Users can set autoflush=false, so that put requests are first placed in the local buffer and are not submitted until the buffer exceeds a size threshold (2 MB by default, configurable through the configuration file). This is a group-commit mechanism, and it can greatly improve write throughput; however, because there is no protection mechanism, buffered requests are lost if the client crashes.
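A minimal sketch of the group-commit behavior described above (this is not the real HBase client; the class and field names are illustrative only): puts accumulate in a local buffer, and one batch request is submitted once the buffered size crosses the threshold.

```python
# Illustrative sketch of autoflush=true vs autoflush=false buffering.
# Not HBase source code: BufferedClient and submitted_batches are made up.
class BufferedClient:
    def __init__(self, write_buffer_bytes=2 * 1024 * 1024, autoflush=False):
        self.write_buffer_bytes = write_buffer_bytes  # hbase.client.write.buffer
        self.autoflush = autoflush
        self.buffer = []              # pending puts
        self.buffered_size = 0        # approximate payload size in bytes
        self.submitted_batches = []   # stands in for RPCs sent to the server

    def put(self, row, value):
        if self.autoflush:
            # autoflush=true: every put becomes its own server request
            self.submitted_batches.append([(row, value)])
            return
        self.buffer.append((row, value))
        self.buffered_size += len(row) + len(value)
        if self.buffered_size >= self.write_buffer_bytes:
            self.flush()

    def flush(self):
        # group commit: many puts leave the client as one batch
        if self.buffer:
            self.submitted_batches.append(self.buffer)
            self.buffer, self.buffered_size = [], 0
```

With a 10-byte threshold, three small puts produce a single batched submission, whereas autoflush=true produces one submission per put.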
// Note:
HBase in a CDH cluster uses autoflush=false by default, i.e. data is first placed in the local buffer.
HBase client write buffer:
hbase.client.write.buffer = 2m // Write buffer size in bytes. A larger buffer requires more memory on both the client and the server, because the server instantiates the passed write buffer while processing it; in exchange it reduces the number of remote procedure calls (RPCs). To estimate the server-side memory used, multiply hbase.client.write.buffer by hbase.regionserver.handler.count.
HBase RegionServer handler count:
hbase.regionserver.handler.count = 30 // Number of RPC handler instances started in each RegionServer.
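Working through the memory estimate quoted above with the two default values:

```python
# Server-side memory estimate: up to one client write buffer per handler.
write_buffer = 2 * 1024 * 1024    # hbase.client.write.buffer = 2 MB
handler_count = 30                # hbase.regionserver.handler.count = 30

server_side_buffer_memory = write_buffer * handler_count
print(server_side_buffer_memory // (1024 * 1024), "MB")  # prints: 60 MB
```

So with the defaults, buffered writes alone can account for roughly 60 MB of RegionServer memory.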
3. Before submitting to the server, HBase looks up the metadata table hbase:meta to find, by rowkey, the RegionServer each request belongs to; this location step goes through HConnection's locateRegion method. For a batch request, the rowkeys are also grouped by HRegionLocation, so that each group corresponds to one RPC request.
4. HBase constructs a remote RPC request (MultiServerCallable) for each HRegionLocation and executes the call through rpcCallerFactory.newCaller(). Leaving aside resubmission of failures and error handling, the client's commit operation ends here.
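The grouping in step 3 can be sketched as follows. This is a hypothetical illustration, not the real locateRegion logic: the region start keys are invented, and in reality the client resolves locations through hbase:meta.

```python
# Hypothetical sketch: group a batch of rowkeys by the region they fall
# into, so each group can map to one RPC. Region boundaries are made up.
from bisect import bisect_right
from collections import defaultdict

# sorted region start keys (assumed layout, for illustration only)
region_starts = ["", "g", "p"]

def locate_region(rowkey):
    # the region whose start key is the greatest one <= rowkey
    return region_starts[bisect_right(region_starts, rowkey) - 1]

def group_by_region(rowkeys):
    groups = defaultdict(list)   # region start key -> rowkeys (one RPC each)
    for rk in rowkeys:
        groups[locate_region(rk)].append(rk)
    return dict(groups)
```

For example, with the assumed boundaries, the batch ["apple", "goat", "zebra", "bee"] splits into three groups, hence three RPCs.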
Part 2: The server-side write process
Server-side process analysis:
(1) acquire the row lock and the region update shared lock -> (2) begin the write transaction -> (3) write to the MemStore cache -> (4) construct a WALEdit and append it to the HLog -> (5) release the row lock and shared lock -> (6) sync the HLog -> (7) end the write transaction -> (8) flush the MemStore
// Explanation:
(1) Acquire the row lock and the region update shared lock: HBase uses row locks to ensure that updates to the same row are mutually exclusive, keeping each update atomic: it either fully succeeds or fully fails.
(2) Begin the write transaction: obtain a write number, which is used to implement MVCC. MVCC allows lock-free reads of data, improving read performance while preserving read-write consistency.
(3) Write to the MemStore cache: each column family in HBase corresponds to a Store, which stores that family's data. Each Store has a write cache, the MemStore. HBase does not write data straight to disk; it writes to the cache first and flushes to disk once the cache reaches a certain size.
(4) Append to the HLog: HBase uses the WAL mechanism to guarantee data reliability, i.e. the log is written before the write is considered durable. Even after a crash, the original data can be restored by replaying the HLog. In this step the data is wrapped in a WALEdit object and written sequentially to the HLog; no sync operation is performed yet. Since version 0.98, a new write-thread model is used for HLog writes, which greatly improves overall update performance (see the next section).
(5) Release the row lock and the shared lock.
(6) Sync the HLog: this is where the HLog is actually synced to HDFS. The sync is performed after the row lock is released, to minimize lock-holding time and improve write performance. If the sync fails, a rollback removes the data already written to the MemStore.
(7) End the write transaction: at this point the thread's update becomes visible to other read requests and actually takes effect. For detailed analysis, see the article "Database Transaction Series: The HBase Row-Level Transaction Model".
(8) Flush the MemStore: when the write cache reaches 128 MB, a flush thread is started to flush the data to disk. The flush involves HFile-related structures, which will be described in more detail later.
// HBase MemStore flush size:
hbase.hregion.memstore.flush.size = 128m // If the MemStore exceeds this size (in bytes), it is flushed to disk. The value is checked by a background thread at the frequency specified by hbase.server.thread.wakefrequency.
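The server-side steps above can be condensed into a toy model. This is an illustration, not HBase source code: locks and the MVCC write number are omitted, and all names (Region, wal_sync_ok, flushed) are invented for the sketch. Note that, as discussed below, the implementation writes the MemStore before the WAL and rolls back on a failed sync.

```python
# Toy model of the server-side write path:
# (3) write MemStore, (4) append HLog, (6) sync (rollback on failure),
# (8) check whether the MemStore has reached the flush threshold.
FLUSH_SIZE = 128 * 1024 * 1024   # hbase.hregion.memstore.flush.size

class Region:
    def __init__(self, flush_size=FLUSH_SIZE):
        self.memstore = {}        # in-memory write cache
        self.wal = []             # append-only log (HLog stand-in)
        self.memstore_bytes = 0
        self.flush_size = flush_size
        self.flushed = False      # set when a flush would be triggered

    def put(self, row, value, wal_sync_ok=True):
        # steps (1)-(2), locks and MVCC write number, omitted here
        self.memstore[row] = value                 # (3) write MemStore
        self.wal.append((row, value))              # (4) append HLog
        self.memstore_bytes += len(row) + len(value)
        if not wal_sync_ok:                        # (6) sync HLog failed:
            del self.memstore[row]                 # roll back the MemStore
            self.wal.pop()
            self.memstore_bytes -= len(row) + len(value)
            raise IOError("WAL sync failed; write rolled back")
        # (7) transaction ends: the update becomes visible to readers
        if self.memstore_bytes >= self.flush_size:  # (8) flush check
            self.flushed = True
```

A failed sync leaves neither a MemStore entry nor a log record behind, matching the rollback described in step (6).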
// Note:
It is often said that data is written to the MemStore first and then to the WAL. Taken literally, this seems to violate the disaster-recovery purpose of the WAL, so it needs careful interpretation.
The "write the WAL first, then the MemStore" ordering is not literally reflected in the source code; the two writes can be viewed as one synchronous step.
In theory the WAL should be written first. HBase's actual implementation writes the MemStore first and then the WAL, but it guarantees, via the MVCC mechanism, that the write only becomes visible to users after both have succeeded. Moreover, if the MemStore write succeeds but the WAL write fails, the MemStore entry is rolled back.
This is safe because MVCC assigns a globally incrementing write number when each write thread begins its transaction, but the global read point is not rolled forward until the HLog update completes.
During that window, every reader thread reads data according to the MVCC read point, and no write or update pushes the read point forward until its HLog entry is durable. So even though the data is already in the MemStore, it is not yet visible to reader threads.
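The visibility rule just described can be sketched in a few lines. This is a simplified illustration of the MVCC read-point idea (the class and method names are invented): a write gets a number immediately, but readers only see entries at or below the read point, which advances only after the WAL sync finishes.

```python
# Simplified MVCC read-point sketch: writes become visible only after
# their HLog entry is durable and the read point has advanced past them.
class Mvcc:
    def __init__(self):
        self.next_write_num = 1   # globally incrementing write number
        self.read_point = 0       # readers see entries <= read_point

    def begin_write(self):
        num = self.next_write_num
        self.next_write_num += 1
        return num

    def complete_write(self, num):
        # called only after the HLog sync for this write has finished
        self.read_point = max(self.read_point, num)

    def visible(self, num):
        return num <= self.read_point
```

Between begin_write and complete_write, data may already sit in the MemStore, yet visible() still returns False for it, which is exactly the window described above.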
Part 3: Analysis of the WAL mechanism
1. WAL (Write-Ahead Logging) is an efficient logging technique, and it is key to the write performance of almost all non-in-memory databases.
2. The basic principle: before data is committed, it is first written sequentially to a log, then to a cache; the cache is written to disk once it fills up.
3. WAL improves write performance because it converts a random disk write into a sequential log write plus a memory write.
4. While improving write performance, WAL also guarantees data reliability: data is not lost under any circumstances.
5. If a crash occurs after a write completes, even if all the data in the cache is lost, the lost data can be recovered by replaying the log.
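A toy demonstration of that recovery guarantee (illustrative names, not HBase code): because every write lands in the log before it is acknowledged, replaying the log after a crash that wipes the in-memory cache reproduces it exactly.

```python
# WAL guarantee in miniature: log sequentially, cache in memory,
# and rebuild the cache from the log after a simulated crash.
def apply_writes(writes):
    wal, memstore = [], {}
    for row, value in writes:
        wal.append((row, value))   # sequential log write
        memstore[row] = value      # in-memory cache
    return wal, memstore

def recover(wal):
    # crash: memstore lost; rebuild it by replaying the log in order
    memstore = {}
    for row, value in wal:
        memstore[row] = value
    return memstore
```

Replaying in log order also preserves the last-writer-wins semantics for repeated updates to the same row.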
WAL persistence levels
In HBase you can set the persistence level of the WAL, which determines whether the WAL mechanism is enabled and how the HLog is written to disk.
The WAL persistence level is one of the following:
SKIP_WAL: write only the cache, not the HLog. Because only memory is written, this greatly improves write performance, but there is a risk of data loss. This level is not recommended in practice unless it is certain that data reliability does not matter.
ASYNC_WAL: write the data to the HLog asynchronously.
SYNC_WAL: write the data to the log file synchronously. Note that the data is only written to the filesystem; it is not necessarily on disk yet.
FSYNC_WAL: write the data to the log file synchronously and force it to disk. This is the strictest log level: it guarantees data is not lost, but performance is comparatively poor.
USE_DEFAULT: if the user does not specify a persistence level, HBase uses the SYNC_WAL level to persist data.
Users can set the WAL persistence level on the client, e.g. put.setDurability(Durability.SYNC_WAL)
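The operational difference between the four concrete levels can be summarized as which WAL steps a write performs. The level names below match HBase's Durability enum, but the dispatch function is an illustration of the table above, not HBase internals.

```python
# What each WAL persistence level does, expressed as three flags:
# append to the HLog, sync to the filesystem, fsync to the disk.
from enum import Enum

class Durability(Enum):
    SKIP_WAL = 1
    ASYNC_WAL = 2
    SYNC_WAL = 3
    FSYNC_WAL = 4

def wal_actions(level):
    """Return which WAL steps a write at this level performs."""
    if level is Durability.SKIP_WAL:
        return {"append": False, "sync": False, "fsync": False}
    if level is Durability.ASYNC_WAL:
        return {"append": True, "sync": False, "fsync": False}  # sync deferred
    if level is Durability.SYNC_WAL:
        return {"append": True, "sync": True, "fsync": False}   # filesystem only
    return {"append": True, "sync": True, "fsync": True}        # forced to disk
```

Reading the flags top to bottom shows the trade-off: each level adds one more guarantee and one more cost.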
// In CDH:
WAL provider:
hbase.wal.provider // The write-ahead log implementation used by the RegionServer (RegionServer Default Group). Options:
  Multiple HDFS WAL
  Single HDFS WAL
  HBase default setting (Single HDFS WAL)
WAL HSM storage policy:
hbase.wal.storage.policy // (RegionServer Default Group). Options:
  All replicas on SSD
  One replica on SSD, the other replicas on HDD
  None (all on HDD)
For further study of the WAL and HLog concepts, see the reference link.
// The HLog write model: an HLog write goes through three phases: first the data pair is written to a local cache, then the local cache is written to the filesystem, and finally a sync operation synchronizes it to disk.
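Those three phases map directly onto ordinary file I/O, which makes them easy to see in a small sketch (the function is illustrative; HLog's real implementation sits on HDFS, not a local file): write() fills a process-local buffer, flush() hands it to the filesystem, and os.fsync() forces it onto the disk.

```python
# The three HLog write phases expressed with plain file I/O.
import os
import tempfile

def hlog_write(path, records):
    with open(path, "a") as f:
        for rec in records:
            f.write(rec + "\n")   # phase 1: process-local buffer
        f.flush()                 # phase 2: push the buffer to the filesystem
        os.fsync(f.fileno())      # phase 3: force the data onto the disk
```

SYNC_WAL corresponds to stopping after phase 2, while FSYNC_WAL also performs phase 3.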
Reference link:
http://hbasefly.com/2016/03/23/hbase_writer/