
Environment configuration and Application method of HBase

2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article covers the environment configuration and basic usage of HBase: what HBase is, how its tables differ from those of a traditional relational database, how to set up a cluster, and how to work with tables from the HBase shell and from Java.

I. Introduction to HBase

1.1 Introduction

HBase is an open-source implementation modeled on Google's Bigtable. It is a database system built on HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes.

It sits between NoSQL and RDBMS: data can be retrieved only by primary key (row key) or by a range of row keys, and only single-row transactions are supported (complex operations such as multi-table joins can be done with Hive). It is mainly used to store loosely structured data, both unstructured and semi-structured. Like Hadoop, HBase is designed mainly to scale out, adding computing and storage capacity by adding more inexpensive commodity servers.

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.
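To make the data model more concrete, here is a minimal sketch in plain Java. It is not the HBase client API: the class TableModelSketch and its methods are hypothetical, and the extra_info:hobby cell is made-up sample data. It only models the idea that rows are kept sorted by row key, that each row holds "family:qualifier" columns, and that data is reached by row key or by a row key range.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of an HBase table: rowKey -> "family:qualifier" -> value.
// Hypothetical names; this is not the HBase client API.
class TableModelSketch {
    // TreeMap keeps row keys in dictionary (lexicographic) order.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    // Lookup by exact row key (comparable to a shell "get").
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Lookup by row key range: startRow inclusive, stopRow exclusive.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        TableModelSketch t = new TableModelSketch();
        t.put("rk-100001", "base_info", "name", "Zhang s");
        t.put("rk-100001", "base_info", "age", "20");
        t.put("rk-100002", "extra_info", "hobby", "reading"); // made-up sample cell
        t.put("rk-100003", "base_info", "name", "angelabby");
        System.out.println(t.get("rk-100001"));
        System.out.println(t.scan("rk-100001", "rk-100003").keySet());
    }
}
```

Note that there is no way to ask this model "which rows have age 20" without walking every row, which mirrors the restriction to row key and row key range lookups described above.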

1.2 Comparison between HBase and traditional databases

Let's first look at the tables in a traditional relational database:

By comparison, an HBase table is structured quite differently from a table in a traditional relational database:

1. A table is split by row into several regions, and each region is assigned to a specific RegionServer, which manages it.

2. Within each region, the data is further divided by column family into several HStores.

3. The data in each HStore is persisted in several HFiles.

4. A region grows as data is inserted and splits when it reaches a certain threshold.

5. As regions split, a RegionServer ends up managing more and more regions.

6. The HMaster balances load according to the number of regions each RegionServer manages.

7. The data in a region has an in-memory cache, the MemStore, and data access checks the MemStore first.

8. Because memory is limited, the data in the MemStore is periodically flushed to disk as a StoreFile, and each flush generates a new StoreFile.

9. The number of StoreFiles grows over time, so the RegionServer merges them on a regular basis.
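Points 7 to 9 above can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: the class StoreSketch is hypothetical, files are modeled as in-memory sorted maps, and the flush threshold counts entries, whereas a real cluster decides by size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of one store: an in-memory MemStore plus immutable StoreFiles.
// Hypothetical names; the threshold counts entries purely for illustration.
class StoreSketch {
    private final int flushThreshold;
    private TreeMap<String, String> memstore = new TreeMap<>();
    // Oldest file first; each flush appends a new file.
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    StoreSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= flushThreshold) flush();
    }

    // Write the MemStore out as a new sorted file and start a fresh MemStore.
    void flush() {
        if (memstore.isEmpty()) return;
        storeFiles.add(memstore);
        memstore = new TreeMap<>();
    }

    // Reads check the MemStore first, then the files from newest to oldest.
    String get(String key) {
        if (memstore.containsKey(key)) return memstore.get(key);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    // Merge all files into one; values from newer files overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> f : storeFiles) merged.putAll(f);
        storeFiles.clear();
        if (!merged.isEmpty()) storeFiles.add(merged);
    }

    int storeFileCount() { return storeFiles.size(); }

    public static void main(String[] args) {
        StoreSketch s = new StoreSketch(2);
        s.put("a", "1"); s.put("b", "2"); // second put triggers a flush
        s.put("a", "3"); s.put("c", "4"); // triggers another flush
        System.out.println("files before compaction: " + s.storeFileCount());
        s.compact();
        System.out.println("files after compaction: " + s.storeFileCount() + ", a=" + s.get("a"));
    }
}
```

The point of the merge step is visible in get(): the more files there are, the more places a read may have to look, so merging them keeps reads cheap.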

The design of row keys has a great influence on the efficiency of data query.
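One small illustration of why row key design matters: row keys are ordered as strings of bytes, not as numbers, so a numeric id embedded in a key sorts in dictionary order unless it is padded to a fixed width. The sketch below uses plain Java strings to stand in for row keys; the key formats are hypothetical examples, not a recommendation from the HBase documentation.

```java
import java.util.TreeSet;

// Row keys compare in dictionary order, not numeric order.
// The "rk-" key formats below are hypothetical examples.
class RowKeyPaddingSketch {
    static String unpadded(int id) { return "rk-" + id; }

    // Zero-pad the id to six digits so dictionary order matches numeric order.
    static String padded(int id) { return String.format("rk-%06d", id); }

    public static void main(String[] args) {
        TreeSet<String> plain = new TreeSet<>();
        TreeSet<String> fixedWidth = new TreeSet<>();
        for (int id : new int[] {9, 10, 100}) {
            plain.add(unpadded(id));
            fixedWidth.add(padded(id));
        }
        System.out.println(plain);      // dictionary order differs from numeric order
        System.out.println(fixedWidth); // zero padding restores numeric order
    }
}
```

With unpadded keys, a range scan from id 9 to id 100 would not return the rows one might expect, because "rk-10" and "rk-100" sort before "rk-9".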

HBase scales well: if storage capacity runs short, simply add DataNodes or RegionServers.

HBase can serve as the underlying storage system of an online system.

The HMaster balances load and monitors how data is stored across the nodes.

Each store (column family) has an in-memory cache holding some of the hottest (recently accessed) data, which makes reads much faster.
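The idea of keeping recently accessed data in memory can be sketched with a small least-recently-used cache. This assumes an LRU eviction policy for illustration and is built on the JDK's LinkedHashMap; the class LruCacheSketch is hypothetical and is not HBase's own cache code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal least-recently-used cache built on LinkedHashMap's access order.
// A stand-in for the idea of caching hot data; not HBase's actual cache.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = order entries by most recent access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("rk-1", "a");
        cache.put("rk-2", "b");
        cache.get("rk-1");      // rk-1 becomes the most recently used entry
        cache.put("rk-3", "c"); // capacity exceeded, so rk-2 is evicted
        System.out.println(cache.keySet());
    }
}
```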

All the files are indexed, so lookups are easier.

Regions periodically merge their StoreFiles.

III. Build the HBase environment

1. First, download an HBase installation package from http://hbase.apache.org/ and extract it to the directory where you want to install it. If you have worked with HBase before, the basic installation will be familiar.

2. Find hbase-env.sh, hbase-site.xml and regionservers in the conf directory under the hbase directory, and configure them as follows. The whole process is simple.

In hbase-env.sh, the main things to set are the Java environment variable and the ZooKeeper option. Change HBASE_MANAGES_ZK from its default of true to false: ZooKeeper is still used, but HBase relies on the ZooKeeper I installed myself rather than the one that comes with HBase.

    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
    export HBASE_MANAGES_ZK=false

In hbase-site.xml, the main settings are the HDFS address of the HBase root directory and the ZooKeeper quorum. Below, ubuntu1, ubuntu2 and ubuntu3 are the ZooKeeper host names and 2181 is the port; adjust them to match your own machines.

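    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ubuntu2:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ubuntu1:2181,ubuntu2:2181,ubuntu3:2181</value>
      </property>
    </configuration>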

3. Next, modify the regionservers file, replacing the default localhost with the host names of the nodes. This file lists the slave nodes, much like the slaves file in the Hadoop cluster we configured before.

    ubuntu1
    ubuntu2
    ubuntu3

4. Then copy core-site.xml and hdfs-site.xml from Hadoop into the conf directory of HBase.

5. Finally, send the configured files to the other two nodes with scp.

One last reminder: be careful not to type stray characters into these configuration files, or HBase will report an error.
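One way to catch a stray character before starting the cluster is to run the file through an XML parser. The sketch below uses only the JDK's built-in parser; the class SiteXmlCheckSketch and its readProperties method are hypothetical helpers, and the XML is passed in as a string so the example stays self-contained. It assumes every property element has both a name and a value child.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Parses Hadoop-style configuration XML and returns its name/value pairs.
// A parse exception points to a malformed file, for example a stray character.
class SiteXmlCheckSketch {
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList list = doc.getElementsByTagName("property");
        for (int i = 0; i < list.getLength(); i++) {
            Element p = (Element) list.item(i);
            // Assumes each <property> contains a <name> and a <value> element.
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name><value>hdfs://ubuntu2:9000/hbase</value></property>"
                + "<property><name>hbase.cluster.distributed</name><value>true</value></property>"
                + "</configuration>";
        System.out.println(readProperties(xml));
    }
}
```

Printing the parsed pairs also makes it easy to spot a misspelled property name, which the parser itself would not complain about.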

1. Start all HBase processes

Start the ZooKeeper cluster first:

    ./zkServer.sh start

Start the HDFS cluster:

    start-dfs.sh

Then start HBase by running the following on the primary node:

    start-hbase.sh

2. Access the hbase management page through the browser

192.168.44.131:60010

3. To make the cluster more reliable, start multiple HMasters

    hbase-daemon.sh start master

Running jps on the primary node should now list both HRegionServer and HMaster.

The startup can also be checked on the web page at 192.168.44.131:60010, that is, your master node's IP address or host name plus port 60010.

4.2 create a table

The official examples are:

Examples:

    hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
    hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
    hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
    hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
    hbase> # Optionally pre-split the table into NUMREGIONS, using
    hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" ...

Based on these examples, let's create a user information table. The table is named user-info and has two column families, base_info and extra_info; base_info keeps three versions.

    create 'user-info', {NAME => 'base_info', VERSIONS => 3}, {NAME => 'extra_info'}

4.3 insert

The official statement is:

    hbase> put 'ns1:t1', 'r1', 'c1', 'value', ts1

Following that syntax:

    put 'user-info','rk-100001','base_info:name','Zhang s'
    put 'user-info','rk-100001','base_info:age','20'
    put 'user-info','rk-100001','base_info:address','Hunan Changsha'

The shell put command writes one cell at a time. For example, one put inserts only name; to insert age and address as well, each needs its own put.

4.4 query

1. We can query it through scan:

    scan 'user-info'

The output shows that the cells within a row are sorted by key, that is, the column names appear in dictionary order.

If I insert another row,

    put 'user-info','rk100003','base_info:name','angelabby'

HBase sorts everything when storing it: within a row, the cells (column name plus value) are ordered by the dictionary order of their keys, and the rows themselves are stored sequentially in the dictionary order of their row keys.

This ordering determines which data ends up stored contiguously.

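2. Use get to fetch data; it returns only one row at a time

    get 'user-info','rk100003'

4.5 modification

To modify a value, put to the same cell again. Because base_info keeps three versions, the earlier values are retained:

    put 'user-info','rk100003','base_info:name','yangying'
    put 'user-info','rk100003','base_info:name','baobao'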

View the values of previous versions:

    scan 'user-info', {VERSIONS => 10}
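How a version limit behaves can be sketched with a single cell modeled in plain Java. The class VersionedCellSketch is hypothetical, the timestamps are simple counters, and the fourth write ("newer") is a made-up value added only to show the oldest version being pushed out. The sketch trims old versions eagerly for simplicity, which is an assumption of this model rather than a statement about when HBase physically removes them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model of one cell that keeps at most maxVersions values, newest first.
// Hypothetical names; versions are trimmed eagerly here for simplicity.
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered with the largest (newest) timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) { this.maxVersions = maxVersions; }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    // Latest value, as a plain get or scan would show.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Up to n newest values, comparable to scanning with VERSIONS => n.
    List<String> newest(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch name = new VersionedCellSketch(3);
        name.put(1L, "angelabby");
        name.put(2L, "yangying");
        name.put(3L, "baobao");
        name.put(4L, "newer"); // made-up fourth write; pushes out "angelabby"
        System.out.println(name.latest());
        System.out.println(name.newest(10));
    }
}
```

Asking for 10 versions still returns at most three values, because the cell never holds more than its configured maximum.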

4.6 Delete

You need to disable the table before you can drop it.

    disable 'user-info'
    drop 'user-info'

5. Use HBase in Eclipse

Open Eclipse and import all the JARs in hbase/lib into your project. Then you can start writing code. Here is an example of creating a table and inserting data into HBase from Eclipse:

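    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.MasterNotRunningException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.ZooKeeperConnectionException;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    // The class name is illustrative only.
    public class HBaseCreateTableDemo {
        // Create the table (a DDL operation)
        public static void main(String[] args)
                throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
            // HBaseConfiguration.create(), unlike new Configuration(), loads hbase-site.xml
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "ubuntu1:2181,ubuntu2:2181,ubuntu3:2181");
            HBaseAdmin admin = new HBaseAdmin(conf);
            TableName name = TableName.valueOf("user-info");
            HTableDescriptor tableDescriptor = new HTableDescriptor(name);
            // Create the column family descriptor
            HColumnDescriptor base_info = new HColumnDescriptor("base_info");
            // Add a version constraint to the column family
            base_info.setMaxVersions(3);
            // Add the column family to the table descriptor
            tableDescriptor.addFamily(base_info);
            // Create the table described by tableDescriptor
            admin.createTable(tableDescriptor);
            // Close the connection
            admin.close();
        }
    }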

Finally, we can check in the HBase shell whether the table has been created by entering list.

Then insert the data:

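    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.junit.Test;

    // The class name is illustrative only.
    public class HBasePutDemo {
        // Insert data (a DML operation)
        @Test
        public void insertData() throws IOException {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "ubuntu1:2181,ubuntu2:2181,ubuntu3:2181");
            HTable hTable = new HTable(conf, "user-info");
            // One Put carries all the cells written to row rk-10001
            Put put = new Put(Bytes.toBytes("rk-10001"));
            put.add("base_info".getBytes(), "name".getBytes(), "wangming".getBytes());
            put.add("base_info".getBytes(), "age".getBytes(), "20".getBytes());
            hTable.put(put);
            hTable.close();
        }
    }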
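The insert example mixes two ways of turning strings into bytes: Bytes.toBytes for the row key and String.getBytes() for the column family, column and value. Bytes.toBytes encodes strings as UTF-8, while getBytes() with no argument uses the platform's default charset, so the two can disagree for non-ASCII text on some machines. A minimal plain-Java sketch of encoding with an explicit charset follows; the class Utf8BytesSketch and its helper methods are hypothetical and only stand in for what a utility such as Bytes does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Encode and decode strings with an explicit charset, so every client
// produces the same bytes regardless of the platform default.
class Utf8BytesSketch {
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] family = toBytes("base_info");
        byte[] value = toBytes("wangming");
        System.out.println(family.length + " bytes, value=" + fromBytes(value));
        // The age "20" is stored as the bytes of the characters '2' and '0',
        // not as a binary integer.
        System.out.println(Arrays.toString(toBytes("20")));
    }
}
```

The last line also shows why values written as strings come back as strings: what is stored is the text "20", so any numeric interpretation is up to the reader of the data.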

Finally, we can check in the HBase shell whether the data has been inserted.
