

HBase Notes (1)

2025-01-18 Update From: SLTechnology News&Howtos


[TOC]

HBase Notes (1): Row vs. Column Databases

Row database:

It can be simply understood as being similar to the data in a traditional RDBMS: what is stored is structured data. A row database is good at scanning entire rows of table data, but it is not good at queries that only touch individual fields.

Column database:

An improvement on the row database: some columns (or groups of related columns) are stored in separate files, and the other columns are stored in multiple other files. When we query, we only need to read the files holding the commonly used columns to complete the work. This reduces file IO and improves read/write efficiency (there is no longer any need, as in a row database, to scan the full table and then filter out the relevant fields).

In the big data field there is a very famous product of this kind, HBase. It differs from the traditional RDBMS and is called a column database, or a NoSQL (Not Only SQL) database. NoSQL is a general term for a class of databases such as HBase, Redis, memcached, and MongoDB. HBase can satisfy reads and writes of huge amounts of data on HDFS.

HBase overview: HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Large-scale structured storage clusters can be built on cheap PC servers using HBase. HBase uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive data in HBase, and ZooKeeper as a coordination tool.
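To make the row-versus-column trade-off above concrete, here is a minimal, self-contained Java sketch of my own (not from the original notes; all names are illustrative): the same records are laid out row-wise and column-wise, and a query that needs only one field reads far fewer bytes in the columnar layout.

    // Toy illustration of row-oriented vs column-oriented storage layouts.
    public class RowVsColumnLayout {
        public static void main(String[] args) {
            // Two records: (name, grad, art-score)
            String[][] rows = {
                    {"Tom", "5", "97"},
                    {"Jim", "4", "80"},
            };

            // Row layout: each record's fields sit together, so reading
            // only "grad" still drags every other field along.
            StringBuilder rowFile = new StringBuilder();
            for (String[] r : rows) {
                rowFile.append(String.join(",", r)).append('\n');
            }

            // Column layout: one "file" per column; a query on "grad"
            // reads only the grad file.
            StringBuilder[] columnFiles = {
                    new StringBuilder(), new StringBuilder(), new StringBuilder()};
            for (String[] r : rows) {
                for (int c = 0; c < r.length; c++) {
                    columnFiles[c].append(r[c]).append('\n');
                }
            }

            System.out.println("row-oriented file:\n" + rowFile);
            System.out.println("grad column file only:\n" + columnFiles[1]);
        }
    }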

Features:

- High reliability, high performance, column-oriented, scalable, table-oriented storage.
- Deployment: distributed cluster. HBase was originally designed for large tables in enterprises, facing millions of columns and tens of billions of records. It can store massive amounts of data in a distributed way, with strong fault tolerance and high reliability.
- HBase is a columnar NoSQL database: data is stored column by column. Database products that store by column generally have the concept of a row key; the row key marks one row of data. As a first approximation, you can think of the row key as the counterpart of a primary key in an RDBMS.
- Physically, the data stored in HBase takes the key-value form, and the key is the row key.
- At the same time, it is very convenient to scale out horizontally (scale out, as opposed to vertical scaling, scale up).
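A quick illustration (mine, not part of the original notes) of the point that row keys are byte arrays compared in byte order: the HBase client utility class org.apache.hadoop.hbase.util.Bytes ships a byte-array comparator, and under it "10" sorts before "2", which is why row-key designs often zero-pad numeric keys.

    import java.util.TreeMap;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeyOrdering {
        public static void main(String[] args) {
            // A TreeMap ordered the way HBase orders row keys: raw byte order.
            TreeMap<byte[], String> rows = new TreeMap<>(Bytes.BYTES_COMPARATOR);
            rows.put(Bytes.toBytes("2"), "row 2");
            rows.put(Bytes.toBytes("10"), "row 10");
            rows.put(Bytes.toBytes("1"), "row 1");
            // Prints 1, 10, 2 -- lexicographic byte order, not numeric order.
            rows.forEach((k, v) -> System.out.println(Bytes.toString(k) + " -> " + v));
        }
    }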

HBase installation

Make sure that Hadoop, ZooKeeper, and the JDK are installed before installing HBase.

Stand-alone version:

Extract:

    ~]$ tar -zxf /home/uplooking/soft/hbase-1.1.5-bin.tar.gz -C /home/uplooking/app

Rename:

    ~]$ mv /home/uplooking/app/hbase-1.1.5 /home/uplooking/app/hbase

Add to the environment variables:

    export HBASE_HOME=/home/uplooking/app/hbase

Configure $HBASE_HOME/conf/hbase-env.sh and hbase-site.xml.

$HBASE_HOME/conf/hbase-env.sh:

    export JAVA_HOME=/opt/jdk
    export HBASE_MANAGES_ZK=false

$HBASE_HOME/conf/hbase-site.xml (the XML markup was stripped during extraction; the property names and values below are as given in the source):

    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ns1/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>uplooking01,uplooking02,uplooking03</value>
    </property>

Start:

    sh $HBASE_HOME/bin/start-hbase.sh

Using the jps command, when HMaster, HQuorumPeer (only when using the zk that comes with hbase) and HRegionServer are all up, the hbase service has started successfully.

Stop:

    sh $HBASE_HOME/bin/stop-hbase.sh

Start single processes:

    hbase-daemon.sh start master          # HMaster
    hbase-daemon.sh start regionserver    # HRegionServer

Access:

    web: http://<master host>:16010 (the host name was lost in extraction)
    cli: bin/hbase shell

Distributed installation:

Based on the above, you only need to additionally configure conf/regionservers, adding two lines:

    uplooking02
    uplooking03

Note: if you have already configured the stand-alone version, you need to clear the directory of hbase on hdfs and the node of hbase in zk, to avoid conflicts with the cluster version:

    zk:   rmr /hbase
    hdfs: hdfs dfs -rm -R /hbase

Copy the installation from master to uplooking02 and uplooking03:

    scp -r app/hbase uplooking@uplooking02:/home/uplooking/app/
    scp -r app/hbase uplooking@uplooking03:/home/uplooking/app/

Also copy the relevant environment variables to the two slaves:

    scp ~/.bash_profile uplooking@uplooking02:/home/uplooking/
    scp ~/.bash_profile uplooking@uplooking03:/home/uplooking/

Make them effective:

    source ~/.bash_profile

Start the hbase cluster on the master machine:

    sh $HBASE_HOME/bin/start-hbase.sh

There is an HMaster process on master, and an HRegionServer on each of uplooking02 and uplooking03.
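Once the processes are up, a quick way to verify the installation from code is a minimal Java client check. This is my sketch, not part of the original notes: it assumes the cluster's hbase-site.xml is on the classpath; otherwise set hbase.zookeeper.quorum explicitly as shown.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class HBaseSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Only needed if hbase-site.xml is not on the classpath.
            conf.set("hbase.zookeeper.quorum", "uplooking01,uplooking02,uplooking03");
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                // If this prints without throwing, HMaster and ZooKeeper are reachable.
                for (TableName t : admin.listTableNames()) {
                    System.out.println("table: " + t.getNameAsString());
                }
            }
        }
    }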

HBase startup problems and solutions

The following problem occurred when starting hbase:

    Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: ns1
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:258)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1002)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:566)
        ... 10 more
    Caused by: java.net.UnknownHostException: ns1

Solution:

The first way: source the environment variable file. The second way: hand the hdfs-site.xml and core-site.xml used by hdfs over to hbase's management, i.e. copy them into $HBASE_HOME/conf so that HBase can resolve the HDFS nameservice ns1 (the exact source path depends on your Hadoop layout), for example:

    cp $HADOOP_HOME/etc/hadoop/core-site.xml $HBASE_HOME/conf/
    cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/

In addition, note again that if the stand-alone version was already installed and you then install the cluster version, you need to delete the original data first (the /hbase directory on hdfs and the /hbase node in zk, as above).

HBase architecture

Logical structure:

- Table: divides the data set; the same concept as a table in a traditional DB.
- RowKey: the unique identifier of a row of data. If you want to read/write a piece of data, you must use its row key. At the bottom of hbase it is stored as a byte array, which makes it convenient to sort by rk. Row keys are byte arrays, so any string can be used as a row key. Rows in the table are sorted according to row keys, and the data is stored sorted by the byte order of the row key. All access to the table goes through the row key (single-RowKey access, RowKey range access, or a full table scan).
- Column family (columnFamily): can simply be thought of as a collection of "columns". Column families are stored in separate files.
- Column qualifier (column qualifier), i.e. the column: column data is located through the column qualifier. Each CF can have one or more column members (ColumnQualifier); column members do not need to be given in the table definition, and new column members can be added later, on demand and dynamically.
- Timestamp (version): a cell can hold multiple versions of the data, distinguished by timestamp.
- Cell: a cell is uniquely determined by row key, column family:qualifier, and timestamp. Cell data has no type; it is all stored as raw bytes, and this is where the data itself lives.

(The original post shows a diagram of the cell structure here; the image is not reproduced.)
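As a short Java sketch of the cell model just described (mine, not from the original notes; table t1 and column family cf1 match the shell examples below, and the CF is assumed to have been created with VERSIONS >= 3): writing the same (row key, cf:qualifier) coordinate twice produces two cell versions, distinguished only by timestamp.

    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CellVersions {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("t1"))) {
                byte[] rk = Bytes.toBytes("1");
                byte[] cf = Bytes.toBytes("cf1");
                byte[] q = Bytes.toBytes("name");
                // Two writes to the same (rk, cf:q) coordinate -> two versions.
                table.put(new Put(rk).addColumn(cf, q, Bytes.toBytes("zhangsan")));
                table.put(new Put(rk).addColumn(cf, q, Bytes.toBytes("lisi")));
                Get get = new Get(rk).addColumn(cf, q);
                get.setMaxVersions(3); // return up to 3 versions of the cell
                Result result = table.get(get);
                List<Cell> cells = result.getColumnCells(cf, q);
                for (Cell c : cells) {
                    // Each version: same coordinates, different timestamp.
                    System.out.println(Bytes.toString(CellUtil.cloneValue(c))
                            + " @ ts=" + c.getTimestamp());
                }
            }
        }
    }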

Physical structure:

- HMaster -> analogous to the NameNode: the management node.
- HRegionServer -> analogous to the DataNode: the server that stores Region data.
- HRegion: can be simply understood as a slice of a table, storing part of the data in the table. When the data in a region exceeds a certain amount, it automatically splits into two regions (one becomes two). From this point of view, a Region is a horizontal division of an hbase table.
- HFile: the physical structure in which the data is stored on hdfs.
- HLog: receives the data submitted from the client (the write-ahead log).

In a cluster there are multiple HRegionServers. One HRegionServer holds:

    |-- one HLog
    |-- multiple HRegions
        |-- multiple Stores (one Store per CF)

(The original post shows a diagram of the physical structure of HBase here; the image is not reproduced.)
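As a small aside (my sketch, not in the original notes), the horizontal partitioning described above can be observed from the Java client: in the HBase 1.x API, Admin.getTableRegions returns one HRegionInfo per region, including the start and end row keys of each slice.

    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ListRegions {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // One HRegionInfo per region: each is a horizontal slice
                // of the table's row-key space.
                List<HRegionInfo> regions = admin.getTableRegions(TableName.valueOf("t1"));
                for (HRegionInfo r : regions) {
                    System.out.println(r.getRegionNameAsString()
                            + " start=" + Bytes.toStringBinary(r.getStartKey())
                            + " end=" + Bytes.toStringBinary(r.getEndKey()));
                }
            }
        }
    }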

HBase operation

CLI (Command Line Interface):

Use bin/hbase shell to enter the command terminal.

- list: view all tables under the current namespace; a pattern such as list 'ns:abc.*' views the list of all tables under the namespace ns whose names start with abc.
- create: create 't1', 'cf1' creates a table named t1 under the default namespace, with only one column family, named cf1.
- scan: view all the contents of a table: scan 't1' or scan 'ns1:t1'.
- put: add a record to the table: put 't1', '1', 'cf1:name', 'zhangsan' (here '1' is the rowkey).
- get: view one specific value: get 't1', '1', 'cf1:name'.
- describe/desc: view table attribute information: desc 't1'.
- delete: delete a record: delete 't1', '1', 'cf1:age' deletes the cell at cf1:age for rowkey 1; deleteall 't1', '2' deletes all the cells for rowkey=2.
- drop: delete a table. Note: before deleting a table, you need to confirm that its status is disabled; if not, run disable 't1' first, then drop 't1'.

Exercise:

Given the following table (reconstructed from the flattened original; Tom's course scores are taken from the put commands below):

    rk | name | grad | course:math | course:art
    ---+------+------+-------------+-----------
    1  | Tom  | 5    | 88          | 97
    2  | Jim  | 4    | 89          | 80

Create the table:

    create 'stu', 'name', 'grad', 'course'

This creates table stu with three column families: name, grad, course.

Add data:

    put 'stu', '1', 'name:', 'Tom'

(The column 'name:' can also be written directly as 'name', that is, when the column family has no sub-columns.)

    put 'stu', '1', 'grad:', '5'
    put 'stu', '1', 'course:art', '97'
    put 'stu', '1', 'course:math', '88'

Delete the art score of the row where name="Jim" (rowkey 2):

    delete 'stu', '2', 'name', 'Jim', 'course:art'   --> wrong
    delete 'stu', '2', 'course:art'                  --> correct

because each operation can only operate on a single cell: hbase's atomic operations are cell-based, and a cell is determined by rk, cf, col, ts (timestamp).

Delete the row where name="Jim", i.e. all of its cells:

    deleteall 'stu', '2'

To see how many records there are in the current table (the equivalent of SQL's select count(1) from t), use hbase's count:

    count 'stu'

Java API operation test code (reconstructed from the garbled original; a truncated loop and the lost beginning of the scan test are filled in minimally and marked in comments):

    package com.uplooking.bigdata.hbase;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    /**
     * HBase Java API learning
     */
    public class HBaseAPIOps {

        private Connection connection;
        private Admin admin;

        @Before
        public void setUp() throws Exception {
            Configuration conf = HBaseConfiguration.create();
            connection = ConnectionFactory.createConnection(conf);
            admin = connection.getAdmin();
        }

        /**
         * list 'default:t.*'
         * TABLE
         * t1
         * t2
         */
        @Test
        public void testList() throws IOException {
            TableName[] tblNames = admin.listTableNames("default:t.*");
            for (TableName tblName : tblNames) {
                System.out.println(tblName.getNamespaceAsString() + ":" + tblName.getNameAsString());
            }
        }

        @Test
        public void testCreate() throws IOException {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t3"));
            HColumnDescriptor family = new HColumnDescriptor("cf");
            desc.addFamily(family);
            admin.createTable(desc);
        }

        @Test
        public void testAddRecord() throws IOException {
            Table t3 = connection.getTable(TableName.valueOf("t3"));
            byte[] cf = "cf".getBytes();
            byte[] nameBytes = "name".getBytes();
            byte[] ageBytes = "age".getBytes();
            List<Put> puts = new ArrayList<>();
            /*
            Put put1 = new Put("1".getBytes());
            put1.addColumn(cf, nameBytes, "xiaofazeng".getBytes());
            put1.addColumn(cf, ageBytes, "13".getBytes());
            puts.add(put1);
            Put put2 = new Put("2".getBytes());
            put2.addColumn(cf, nameBytes, "xiaoshihao".getBytes());
            put2.addColumn(cf, ageBytes, "15".getBytes());
            puts.add(put2);
            */
            // The bound and body of this loop were lost in extraction;
            // a minimal reconstruction that bulk-inserts generated rows:
            for (int i = 1000; i < 1100; i++) {
                Put put = new Put(String.valueOf(i).getBytes());
                put.addColumn(cf, nameBytes, ("name-" + i).getBytes());
                put.addColumn(cf, ageBytes, String.valueOf(i % 100).getBytes());
                puts.add(put);
            }
            t3.put(puts);
            t3.close();
        }

        // The beginning of this scan test was lost in extraction;
        // reconstructed minimally around the surviving lambda body.
        @Test
        public void testScan() throws IOException {
            Table table = connection.getTable(TableName.valueOf("t3"));
            ResultScanner resultScanner = table.getScanner(new Scan());
            resultScanner.forEach(result -> {
                String name = new String(result.getValue("cf".getBytes(), "name".getBytes()));
                int age = Integer.valueOf(new String(result.getValue("cf".getBytes(), "age".getBytes())));
                String rowKey = new String(result.getRow());
                System.out.println(rowKey + "\t" + "cf:name-->" + name + ", cf:age-->" + age);
            });
            table.close();
        }

        /**
         * Conditional query: effectively the WHERE condition in SQL,
         * expressed by adding filters to the hbase scan.
         */
        @Test
        public void testQueryByCondtion() throws IOException {
            Table table = connection.getTable(TableName.valueOf("t3"));
            Scan scan = new Scan();
            Filter filter1 = new SingleColumnValueFilter("cf".getBytes(), "age".getBytes(),
                    CompareFilter.CompareOp.GREATER_OR_EQUAL, "13".getBytes());
            Filter filter2 = new SingleColumnValueFilter("cf".getBytes(), "age".getBytes(),
                    CompareFilter.CompareOp.LESS_OR_EQUAL, "18".getBytes());
            FilterList filterList = new FilterList();
            filterList.addFilter(filter1);
            filterList.addFilter(filter2);
            scan.setFilter(filterList);
            ResultScanner resultScanner = table.getScanner(scan);
            resultScanner.forEach(result -> {
                String name = new String(result.getValue("cf".getBytes(), "name".getBytes()));
                int age = Integer.valueOf(new String(result.getValue("cf".getBytes(), "age".getBytes())));
                String rowKey = new String(result.getRow());
                System.out.println(rowKey + "\t" + "cf:name-->" + name + ", cf:age-->" + age);
            });
            table.close();
        }

        @After
        public void cleanUp() throws IOException {
            admin.close();
            connection.close();
        }
    }

HBase-related maven dependencies (the pom fragment was flattened during extraction; where the text preserves them, property names are restored: ${hbase-version} = 1.1.5 and ${hive-api.version} = 1.2.1; the versions 2.1.0 and 2.6.4 also appear in the source, but their property names were lost):

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hbase-version>1.1.5</hbase-version>
        <hive-api.version>1.2.1</hive-api.version>
        <!-- 2.1.0 and 2.6.4 appear in the source; their property names were lost -->
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>${hbase-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-hbase-handler</artifactId>
            <version>${hive-api.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                    <source>1.8</source>
                    <target>1.8</target>
                    <!-- a bare "true" follows here in the source; its tag name was lost -->
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.uplooking.bigdata.hbase.HBase2HDFSOps</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
