Cluster Construction of HBase

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

1. Cluster building

1. Prerequisites:

- HBase is written in Java, so the JDK must be installed before installing HBase.

- A Hadoop platform is required to run HBase.

- The versions of HBase, Hadoop and the JDK must be mutually compatible, so be sure to check this.

JDK and Hadoop version compatibility: (tables omitted in the original)

2. Cluster planning:

3. Specific construction:

① Upload the installation package (hbase-1.2.6-bin.tar.gz)

② Decompress: tar -zxvf hbase-1.2.6-bin.tar.gz -C /application/

③ Configure environment variables:

export HBASE_HOME=/application/hbase-1.2.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$SQOOP_HOME/bin:$HBASE_HOME/bin

source /etc/profile   # refresh the configuration file
hbase version         # check whether the configuration succeeded

④ Modify the HBase configuration files:

cd /application/hbase-1.2.6/conf

hbase-env.sh:

export JAVA_HOME=/application/jdk1.8   # configure the JDK
export HBASE_MANAGES_ZK=false          # use an external ZooKeeper
# PS: HBase depends on ZooKeeper, which stores HBase's addressing entry point.
# HBase ships with a built-in standalone ZooKeeper; set this to false so that
# the cluster uses your own ZooKeeper instead.

hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://zzy/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>

regionservers:

hadoop01
hadoop02
hadoop03

backup-masters (create this file yourself; it lists hadoop02 as the backup master node):

hadoop02

⑤ Copy hadoop's hdfs-site.xml and core-site.xml into hbase-1.2.6/conf.

Because the hadoop cluster runs in HA mode, HBase needs these files to resolve the nameservice:

cp /application/hadoop-2.7.6/etc/hadoop/core-site.xml .
cp /application/hadoop-2.7.6/etc/hadoop/hdfs-site.xml .

⑥ Distribute the installation to the other nodes:

cd /application/
scp -r hbase-1.2.6 hadoop02:$PWD
scp -r hbase-1.2.6 hadoop03:$PWD

⑦ Time synchronization:

An HBase cluster has stricter time-synchronization requirements than HDFS, so be sure to synchronize the clocks before starting the cluster; the skew between nodes should not exceed 30s. Configure a scheduled time update in a cron job.
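One common way to keep the skew within bounds is a cron entry on every node (a minimal sketch; using hadoop01 as the internal time server is an assumption, substitute your site's NTP server):

```shell
# crontab fragment: sync the clock against an internal time server every 10 minutes.
# "hadoop01" as the time source is an assumption; point this at your NTP server.
*/10 * * * * /usr/sbin/ntpdate hadoop01 > /dev/null 2>&1
```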

4. Start the cluster:

① First start the ZooKeeper cluster on each node: zkServer.sh start

② Start the HDFS cluster: start-dfs.sh

③ Start HBase: start-hbase.sh (stop with stop-hbase.sh). It is best to run this on the intended master node: whichever node runs the script becomes the HMaster.

④ Check whether everything started normally: jps

Web interface: http://<HBase master node>:16010

Note: if the corresponding process of a node is not started, you can start it manually:

hbase-daemon.sh start master
hbase-daemon.sh start regionserver

2. Cluster architecture

In an HBase cluster there are multiple masters (HMaster) and multiple slaves (RegionServer). Each slave node stores multiple Regions, and each Region is a slice of an HBase table (the default maximum Region size is 10G).

Introduction to cluster roles:

1. Region:

A Region is a logical unit: HBase cuts all of a table's data into pieces by RowKey range, and each Region serves the reads and writes for one range of data. Regions are managed by RegionServers. The Region concept in HBase is similar to the data-block concept in HDFS: a Region is a slice cut out of an HBase table. The default maximum Region size (hbase.hregion.max.filesize) is 10G in current versions.

How a Region works:

When a client sends a command (delete/put) and the Region receives the request, it first appends the operation to the log and then updates the data in memory. Every operation is recorded in the operation log for data recovery: the Region initially keeps the data only in memory, so if the node went down that data would be lost, and recovery (persistence) therefore relies on the log. When the in-memory data reaches a certain size it is flushed to disk as an HFile. When many HFiles have accumulated they are merged into one StoreFile. During the merge, a put and its matching delete cancel each other out and neither record is kept, so the merged StoreFile finally contains only live put records and no delete operations.
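The put/delete cancellation during a merge can be sketched as a toy log compaction (illustrative only; real HBase compacts timestamped KeyValues inside HFiles, and the rowkeys below are made up):

```shell
#!/bin/sh
# Toy compaction: the newest operation per rowkey wins, and only rowkeys whose
# newest operation is a put survive into the merged file.
log='put r1
put r2
delete r1
put r3'
printf '%s\n' "$log" \
  | awk '{ last[$2] = $1 } END { for (r in last) if (last[r] == "put") print r }' \
  | sort
# prints r2 and r3; r1 is cancelled by its later delete
```

Here r1's put is offset by the later delete, so it never reaches the merged file.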

2. HMaster:

The HBase master node is responsible for cluster-wide state awareness, load distribution, and schema management of user tables (multiple HMasters can be configured for HA). Only the HMaster has the right to modify metadata.

In fact, the cluster can keep providing service for a while even if the HMaster goes down: clients can still read and write existing Regions (they locate Regions through ZooKeeper and the meta table), but schema changes and Region management operations are unavailable until an HMaster is back.

HMaster load balancing: when a table is first created it has only one Region; when a Region grows beyond the configured maximum size it is split in two (a 2G Region, for example, becomes two 1G Regions), and the resulting Regions may be assigned to different nodes. Since HBase is built on top of Hadoop, the Region data itself is stored and replicated on HDFS.
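That halving behaviour can be illustrated with a toy calculation (a sketch only; the 22G starting size is made up, and 10G is the default hbase.hregion.max.filesize):

```shell
#!/bin/sh
# Keep splitting a region in half until every piece is under the maximum size.
max=$((10 * 1024))    # hbase.hregion.max.filesize in MB (10G default)
size=$((22 * 1024))   # a hypothetical region that has grown to 22G
regions=1
while [ "$size" -gt "$max" ]; do
  size=$((size / 2))
  regions=$((regions * 2))
done
echo "$regions regions of $size MB each"   # prints: 4 regions of 5632 MB each
```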

HMaster online/offline awareness of nodes: if a RegionServer goes down, the HMaster reassigns the Regions that node hosted to other nodes, rebuilding them from the HDFS replicas. When the failed node recovers, the HMaster rebalances again so that each RegionServer holds roughly the same number of Regions. (The balancing considers only the number of Regions; since Regions differ in size, the load cannot be balanced by data size.)

3. RegionServer:

The RegionServer is the server in HBase that actually manages Regions and handles clients' reads and writes of table data. Each RegionServer manages many Regions, and the Regions managed on one RegionServer do not necessarily belong to the same table. The RegionServer is also responsible for Region splits, interaction with the underlying HDFS storage, and the merging (compaction) of StoreFiles.

4. ZooKeeper:

ZooKeeper coordinates the master and slave nodes of the whole HBase cluster: it stores the entry point to the metadata, handles the election among master nodes, and provides online/offline awareness between cluster nodes.

5. Client:

The Client provides the interfaces for accessing HBase and maintains a cache to speed up access, for example caching the metadata of the meta table, including the specific locations of previously queried rowkeys, so that subsequent queries can be resolved quickly.
