HBase cluster deployment and testing (2017)

2025-07-16 Update From: SLTechnology News&Howtos shulou

Deploying an HBase cluster

First, we need a Hadoop cluster (at minimum a working HDFS) and a ZooKeeper cluster.

In production you would run Hadoop as an HA (high-availability) cluster. Because this is only an experiment and does not need that many machines, we do not use an HA cluster here.

The first step is to check that HDFS is working.

Start HDFS:

start-dfs.sh

Start ZooKeeper and check that it works (ZooKeeper has to be started manually on each machine):

./zkServer.sh start

Check ZooKeeper's working status:

./zkServer.sh status

We can also run

hdfs dfsadmin -report

to view the working information of the cluster.

With the preliminary work done, the next step is to set up HBase.

First upload the HBase installation package. (Note: the HBase package must match the Hadoop version already installed; it has to be a build that supports your Hadoop.)

The Hadoop I installed is from the 2.x series, so the HBase I download must also be a build for Hadoop 2.

Upload the package to a directory on the host.

Extract it:

tar -zxvf <hbase package file> -C app/

The docs directory can be deleted on a machine that only runs HBase, to save space:

rm -rf docs/

Next, go to the conf directory to modify the configuration files:

cd conf/

Three configuration files need changes: hbase-env.sh, hbase-site.xml and regionservers.

First, let's modify hbase-env.sh:

vi hbase-env.sh

Change export JAVA_HOME=/usr/local/apps/jdk1.8.0_121 so that it points to the directory where your JDK is installed.

If you are unsure of the path, run echo $JAVA_HOME on a machine where the JDK is already set up to find its installation directory.

Next, look for export HBASE_MANAGES_ZK=true (in vi, press ESC and type /ZK to jump to it).

export HBASE_MANAGES_ZK=true tells HBase whether it should manage its own ZooKeeper instance. Because many components of the Hadoop ecosystem need ZooKeeper, set it to export HBASE_MANAGES_ZK=false so that HBase does not manage ZooKeeper. That way, stopping HBase does not shut down our ZooKeeper.

Next, modify hbase-site.xml (the main configuration file)

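vi hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-server-00:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-server-00:2181,hadoop-server-01:2181,hadoop-server-02:2181</value>
  </property>
</configuration>

Then modify regionservers:

vi regionservers

Change its contents to:

hadoop-server-00
hadoop-server-01
hadoop-server-02

Save and exit.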
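When the host list changes often, the three hbase-site.xml properties above (hbase.rootdir, hbase.cluster.distributed, hbase.zookeeper.quorum) can be generated rather than typed by hand. The following is only a sketch: the helper name build_hbase_site is made up for this illustration, and the host names and ports are the ones used in this article.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(props: dict) -> str:
    """Render property names and values as Hadoop-style configuration XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


# Host names and ports are the ones from this article; adjust for your cluster.
zk_hosts = ["hadoop-server-00", "hadoop-server-01", "hadoop-server-02"]
site_xml = build_hbase_site({
    "hbase.rootdir": "hdfs://hadoop-server-00:9000/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": ",".join(f"{host}:2181" for host in zk_hosts),
})
print(site_xml)
```

The printed text can be saved as conf/hbase-site.xml. The same zk_hosts list could also be written out one host per line to produce the regionservers file.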

Note: we need to copy Hadoop's hdfs-site.xml and core-site.xml configuration files into hbase/conf.

pwd
/usr/local/apps/hbase-0.98.24-hadoop2/conf
cp /usr/local/apps/hadoop-2.6.5/etc/hadoop/hdfs-site.xml ./
cp /usr/local/apps/hadoop-2.6.5/etc/hadoop/core-site.xml ./

Next, we need to copy the HBase directory configured on machine 00 to machines 01 and 02:

[root@hadoop-server-00 apps]# scp -r hbase-0.98.24-hadoop2/ hadoop-server-01:/usr/local/apps/
[root@hadoop-server-00 apps]# scp -r hbase-0.98.24-hadoop2/ hadoop-server-02:/usr/local/apps/

Before starting HBase, you must make sure that the HDFS cluster and the ZooKeeper cluster are working properly.
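A quick pre-flight check can catch the most common mistake, which is a service that was never started. The sketch below only tests whether something is listening on the NameNode port (9000, from hbase.rootdir) and on the ZooKeeper ports (2181, from the quorum setting). It does not prove the services are healthy, and the function names are made up for this illustration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Endpoints taken from the configuration in this article.
ENDPOINTS = [
    ("hadoop-server-00", 9000),  # HDFS NameNode, from hbase.rootdir
    ("hadoop-server-00", 2181),  # ZooKeeper quorum members
    ("hadoop-server-01", 2181),
    ("hadoop-server-02", 2181),
]


def unreachable(endpoints, timeout: float = 2.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in endpoints
            if not port_open(host, port, timeout)]
```

Calling unreachable(ENDPOINTS) from a machine inside the cluster returns an empty list when every port answers. Anything it returns is worth checking with zkServer.sh status or hdfs dfsadmin -report before running start-hbase.sh.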

To start HBase, first go to the bin directory and take a look at the scripts there:

cd bin/

Launch HBase with start-hbase.sh in the bin directory:

./start-hbase.sh

(HMaster runs on whichever machine you start the script on. The machines listed in regionservers run HRegionServer, and jps on the cluster machines also shows QuorumPeerMain, which is the ZooKeeper process started earlier.)

To achieve high availability (HA), we need to start a second HMaster in the cluster. This can be done on any machine that has the HBase package installed: change to the bin directory, where you will find hbase-daemon.sh, and run

./hbase-daemon.sh start master

(so there will be two HMaster processes).

Basic use of the HBase command-line client

There is an hbase script in the bin directory of the HBase installation:

./hbase shell

This starts an interactive, shell-like client.

Once you are at the database prompt hbase(main):001:0>, you can run database commands.

MySQL commands do not work here; HBase does not recognize them. It has its own syntax.

The first time you enter the shell, typed characters cannot be deleted with Backspace. To fix this, change a setting in the terminal client's Session Options:

Click Emulation and select Linux, then under Mapped Keys tick the checkbox in front of "Backspace sends delete".

Use help to list the available commands.

Group name: ddl (data definition language)
Group name: dml (data manipulation language)
Group name: namespace (namespaces, the equivalent of databases in MySQL)
Group name: tools (tools for operation and maintenance)
Group name: replication (for replicating data)
Group name: snapshots (snapshots)
Group name: security (for security and access control)

General commands:

status: view the cluster status
version: view the database version number
whoami: see which user is operating on this database
list: view which tables exist

The coordinates of a single field value are: table name -> rowkey -> column family name -> column name -> version number (the version number may be omitted; by default the latest value is returned).

An HBase table supports only one data type: byte[].
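Because everything is stored as raw bytes, it is the client that decides how a value is encoded and decoded. A small sketch of what that means in practice follows; the choice of a 4-byte big-endian integer is just one possible encoding picked for illustration, and the helper names are made up.

```python
import struct

# Every value is stored as raw bytes; the client chooses the encoding.
name = "zhangsan".encode("utf-8")
age_as_text = "20".encode("utf-8")   # the string '20' as two bytes
age_as_int = struct.pack(">i", 20)   # the number 20 as a 4-byte big-endian integer


def decode_text(raw: bytes) -> str:
    """Interpret stored bytes as UTF-8 text."""
    return raw.decode("utf-8")


def decode_int(raw: bytes) -> int:
    """Interpret stored bytes as a 4-byte big-endian integer."""
    return struct.unpack(">i", raw)[0]


print(decode_text(name), decode_text(age_as_text), decode_int(age_as_int))
print(age_as_text, age_as_int)  # the same logical value, two different byte strings
```

The reader of a column has to know which encoding the writer used, since the table itself records no type information.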

Creating tables and inserting data with the HBase command-line client, and how tables are sorted

hbase> create 't1', {NAME => 'f1', VERSIONS => 1}, {NAME => 'f2'}, {NAME => 'f3'}

't1': the table name
{NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}: the column family names
VERSIONS: the maximum number of versions that can be kept in the f1 column family

Now let's create a table:

create 'user_info', {NAME => 'base_info', VERSIONS => 3}, {NAME => 'extra_info'}

(The extra_info column family is declared here as well because the put examples below write to it; a put into a column family that does not exist fails.)

After the table is created, you can use list to confirm that it exists (it can also be seen in the web UI).

Use the put command to insert data. Typing put on its own shows the examples provided by the system:

hbase> put 't1', 'r1', 'c1', 'value', ts1

't1': the table name
'r1': the row key
'c1': the column name (a column belongs to a column family, so the family and the column are written together as family:column)
'value': the value to store in that column
ts1: the timestamp (usually does not need to be set manually)

For example:

put 'user_info','rk0001','base_info:id','1'
put 'user_info','rk0001','base_info:name','zhangsan'
put 'user_info','rk0001','base_info:age','20'
put 'user_info','rk0001','extra_info:address','beijing'

(This inserts four fields and values in the first row: id, name and age in base_info, and address in extra_info.)

put 'user_info','rk0002','base_info:id','2'
put 'user_info','rk0002','base_info:name','lisi'
put 'user_info','rk0002','base_info:sex','male'

(two rows of data are inserted)

Next, let's check it out.

To query data, the dml group provides get, which fetches a single row, and scan, which scans the entire table.

scan 'user_info' (shows the entire table)

scan can also be given a starting row and a limit, so that a very large table does not have to be scanned in full:

scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}

't1': the table name
COLUMNS => ['c1', 'c2']: which columns to return
STARTROW => 'xyz': the row at which the scan starts
LIMIT => 10: how many rows to return

For example:

scan 'user_info', {LIMIT => 2, STARTROW => 'rk0001'}

hbase(main):030:0> scan 'mjh_shiyanshuju:mjh_hbase', {COLUMNS => ['b', 'ct'], LIMIT => 1, STARTROW => '001'}
ROW    COLUMN+CELL
 001   column=b:balance, timestamp=1521192923285, value=10000.00
 001   column=ct:email, timestamp=1521192923285, value=zhangsan@email.com
 001   column=ct:tep, timestamp=1521192923285, value=123456789
1 row(s) in 0.0600 seconds

How do you query the total number of records in an HBase table?

hbase(main):010:0> count 'vargo:vargo_hbase'
3 row(s) in 0.0300 seconds
=> 3
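The effect of STARTROW, LIMIT and COLUMNS can be pictured with a small in-memory model. This is a simplified sketch and not how HBase implements scans: the scan function and the dict-based table are made up for illustration, and a real scan skips rows that have no matching columns, which this model does not do.

```python
from bisect import bisect_left


def scan(rows: dict, startrow: str = "", limit=None, columns=None):
    """Walk rows in sorted key order, beginning at the first key >= startrow."""
    keys = sorted(rows)                     # rows are kept sorted by row key
    start = bisect_left(keys, startrow)     # position of the first key >= startrow
    selected = keys[start:] if limit is None else keys[start:start + limit]
    result = []
    for key in selected:
        cells = rows[key]
        if columns is not None:
            # Accept either a full 'family:column' name or a bare family name.
            cells = {c: v for c, v in cells.items()
                     if c in columns or c.split(":", 1)[0] in columns}
        result.append((key, cells))
    return result


# The two rows inserted earlier in this article, in arbitrary insertion order.
table = {
    "rk0002": {"base_info:id": "2", "base_info:name": "lisi", "base_info:sex": "male"},
    "rk0001": {"base_info:id": "1", "base_info:name": "zhangsan",
               "base_info:age": "20", "extra_info:address": "beijing"},
}

for key, cells in scan(table, startrow="rk0001", limit=2, columns=["base_info:name"]):
    print(key, cells)
```

Since the keys are sorted, finding the start row is a binary search and the scan stops after LIMIT rows, which is why a bounded scan stays cheap on a very large table.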

The order in which rows are displayed is not necessarily the order in which we inserted them. HBase sorts the data automatically and stores it in sorted order: not by insertion order, but by name, in lexicographic (dictionary) order.

(Within an HBase table, column families and key-values are sorted automatically by the dictionary order of the column family names and column names, and rows are sorted by the dictionary order of their row keys.)

For example, these row keys are arranged in the following order:

rk0001

rk00010

rk0002
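The ordering above is plain byte-by-byte comparison, which a few lines of Python can reproduce. The padded_key helper is made up for this illustration; fixed-width keys are a common workaround, not something this article's cluster requires.

```python
row_keys = ["rk0002", "rk00010", "rk0001"]

# Row keys are compared byte by byte, so sort the encoded bytes.
ordered = sorted(row_keys, key=lambda k: k.encode("utf-8"))
print(ordered)  # ['rk0001', 'rk00010', 'rk0002']


def padded_key(n: int, width: int = 5) -> str:
    """Build a fixed-width key so that numeric order and byte order agree."""
    return f"rk{n:0{width}d}"


ids = [10, 2, 1]
print(sorted(padded_key(n) for n in ids))  # ['rk00001', 'rk00002', 'rk00010']
```

rk00010 sorts before rk0002 because the comparison stops at the first differing byte ('1' versus '2'). With zero-padded keys of equal length, that surprise goes away.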

Use the get command to query data:

get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}

COLUMN => 'c1': which column to return
VERSIONS => 4: how many versions to return
TIMERANGE => [ts1, ts2]: the timestamp range to search

For example:

get 'user_info', 'rk0001'
get 'user_info', 'rk0001', {COLUMN => 'base_info:name', VERSIONS => 3}
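How VERSIONS and TIMERANGE interact is easier to see with a toy model of a versioned cell. The class below is an illustration only and not the HBase client API; the timestamps 1 to 4 and the values zhangsan2 to zhangsan4 are made up. It assumes the time range includes its start and excludes its end.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of cell versions: (row, column) -> list of (timestamp, value)."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        versions = self.cells[(row, column)]
        # A write with an existing timestamp replaces that version.
        versions[:] = [(t, v) for t, v in versions if t != ts]
        versions.append((ts, value))
        versions.sort(key=lambda item: item[0], reverse=True)  # newest first
        del versions[self.max_versions:]                        # keep at most max_versions

    def get(self, row: str, column: str, versions: int = 1, timerange=None):
        found = self.cells.get((row, column), [])
        if timerange is not None:
            start, end = timerange  # start inclusive, end exclusive
            found = [(ts, v) for ts, v in found if start <= ts < end]
        return found[:versions]


model = VersionedTable(max_versions=3)
for ts, value in enumerate(["zhangsan", "zhangsan2", "zhangsan3", "zhangsan4"], start=1):
    model.put("rk0001", "base_info:name", value, ts)

print(model.get("rk0001", "base_info:name"))              # latest version only
print(model.get("rk0001", "base_info:name", versions=3))  # up to three versions
```

With VERSIONS => 3 on the column family, the fourth write pushes the oldest value out, so asking for more than three versions never returns more than three.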

Managing namespaces with the HBase command-line client

Group name: namespace (namespaces, the equivalent of databases in MySQL)

We can create different tables in different namespaces:

create_namespace 'orderdb'

create 'orderdb:t_order','f1','f2'

View namespaces:

list_namespace

Delete a namespace (a namespace must be empty before it can be dropped, so its tables are disabled and dropped first):

disable 'orderdb:t_order' (disable the t_order table in orderdb)
drop 'orderdb:t_order' (delete the t_order table in orderdb)
drop_namespace 'orderdb' (delete the orderdb namespace)

hbase(main):015:0> count 'vg_device'
Current count: 1000, row: 948069935274401803
1586 row(s) in 0.3650 seconds
=> 1586
hbase(main):016:0> truncate 'vg_device' (the statement that empties the table)
Truncating 'vg_device' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.3790 seconds
hbase(main):017:0> count 'vg_device'
0 row(s) in 0.2500 seconds
=> 0
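The naming rule and the order of the delete steps can be sketched in a few lines. This is a toy model, not the HBase API: split_table_name and Catalog are made up for illustration. The only facts taken from HBase are that a table name without a prefix belongs to the namespace called default, and that a namespace has to be empty before it can be dropped.

```python
def split_table_name(qualified: str):
    """Split 'namespace:table'; unqualified names belong to 'default'."""
    namespace, sep, table = qualified.partition(":")
    if not sep:
        return "default", qualified
    return namespace, table


class Catalog:
    """Toy registry of namespaces and the tables they contain."""

    def __init__(self):
        self.namespaces = {"default": set()}

    def create_namespace(self, name: str) -> None:
        self.namespaces.setdefault(name, set())

    def create_table(self, qualified: str) -> None:
        namespace, table = split_table_name(qualified)
        self.namespaces[namespace].add(table)  # KeyError if the namespace is missing

    def drop_table(self, qualified: str) -> None:
        namespace, table = split_table_name(qualified)
        self.namespaces[namespace].remove(table)

    def drop_namespace(self, name: str) -> None:
        if self.namespaces[name]:
            raise ValueError(f"namespace {name!r} still contains tables")
        del self.namespaces[name]


catalog = Catalog()
catalog.create_namespace("orderdb")
catalog.create_table("orderdb:t_order")
catalog.drop_table("orderdb:t_order")   # tables go first
catalog.drop_namespace("orderdb")       # then the empty namespace
print(split_table_name("orderdb:t_order"), split_table_name("user_info"))
```

Calling drop_namespace before drop_table raises an error in this model, which mirrors why the shell session above drops the table first.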

Lsdf command rewrite appears interface hbase (main) > icon

Summary of basic commands for Hbase table building: https://blog.csdn.net/kky2010_110/article/details/12616137

HBase scan shell operation details: https://blog.csdn.net/vaq37942/article/details/54949428

What each HBase directory on HDFS is for: https://blog.csdn.net/jsjsjs1789/article/details/52739356

1. /hbase/.META. This is the storage path of the META table.
2. /hbase/.archive After a split or compact operation completes, HBase moves the HFiles to the .archive directory and then deletes the previous HFiles. The directory is cleaned regularly by a scheduled task on HMaster.
3. /hbase/.corrupt Stores damaged HBase log files; it is generally empty.
4. /hbase/.hbck HBase occasionally runs into metadata inconsistencies during operation and maintenance. In that case the hbck tool is used to repair them, and this directory serves as a temporary staging area during the repair.
5. /hbase/WAL HBase supports WAL (Write Ahead Log). On first startup, HBase creates a directory for each RegionServer under .logs. If the client enables WAL mode, a copy of the data is first written to .logs. When a RegionServer crashes or the directory reaches a certain size, replay mode is started, similar to MySQL's binlog.
6. /hbase/oldlogs When an HLog in the .logs folder is no longer needed, it is moved to .oldlogs, and HMaster cleans it up regularly.
7. /hbase/.snapshot If the snapshot function is enabled, a snapshot taken of a user table is stored in this directory. If you take a snapshot called sp_test of the table test, an sp_test folder is created under /hbase/.snapshot/, and all writes after the snapshot are recorded against it.
8. /hbase/.tmp When a table is created or deleted, it is first moved to the tmp directory and then processed there.
9. /hbase/hbase.id A file that stores the cluster's unique ID, which is a UUID.
10. /hbase/hbase.version Also a file. It stores the cluster's version number, which appears to be encoded and cannot be read directly; it is only displayed correctly through the web UI.

Wednesday, 2019-1-16

Implementation schemes for HBase secondary indexes: https://blog.csdn.net/wypersist/article/details/79830811

The limitation of HBase: HBase itself only offers queries by row key and full table scans. The row key is the only index, which makes multi-dimensional queries difficult.

The first-level index of HBase is the rowkey, and data can only be retrieved through it. To run combined queries on the columns of the column families, that is, multi-condition queries, we need one of HBase's secondary index schemes:

MapReduce scheme

ITHBASE (Indexed-Transactional HBase) scheme

IHBASE (Index HBase) scheme

HBase Coprocessor scheme

Solr+HBase scheme

CCIndex (complemental clustering index) scheme
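What these schemes have in common is a second structure that maps a column value back to the row keys containing it. The sketch below shows only that general idea in memory. It is not an implementation of any scheme listed above: IndexedTable is a made-up name and the row rk0003 is invented for the example.

```python
from collections import defaultdict


class IndexedTable:
    """Data rows plus an index that maps a column value back to row keys."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.rows = {}                   # row key -> {column: value}
        self.index = defaultdict(set)    # column value -> {row keys}

    def put(self, row_key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(row_key, {})
        if column == self.indexed_column:
            old = row.get(column)
            if old is not None:
                self.index[old].discard(row_key)  # keep the index in step with the data
            self.index[value].add(row_key)
        row[column] = value

    def find_by_value(self, value: str):
        """Look up row keys through the index instead of scanning every row."""
        return sorted(self.index.get(value, ()))


users = IndexedTable(indexed_column="base_info:name")
users.put("rk0001", "base_info:name", "zhangsan")
users.put("rk0002", "base_info:name", "lisi")
users.put("rk0003", "base_info:name", "zhangsan")   # hypothetical extra row
print(users.find_by_value("zhangsan"))  # ['rk0001', 'rk0003']
```

The cost shows up in put: every write to an indexed column has to update the index as well. Keeping the two consistent across a distributed cluster is the hard part that the schemes above handle in different ways.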

Introduction to HBase's distributed storage mechanism (for a detailed explanation, see the separate document)

There can be many region servers, depending on the amount of data:

regionserver01 regionserver02 regionserver03

How exactly are the tables created by the user stored in HBase?

A large table is divided by row ranges: when a run of consecutive rows reaches a certain size (for example, once those rows amount to 10 GB), they are split off into a region.

Each region is handed to a regionserver, which manages it.
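The idea of cutting a sorted table into row ranges and handing the ranges to region servers can be sketched as follows. This is a deliberately simplified model: real HBase splits an existing region when it grows too large and uses its own balancer to place regions. The function names, the 4 GB row sizes and the round-robin placement are all assumptions made for the example.

```python
def split_into_regions(row_sizes: dict, max_region_size: int):
    """Group sorted row keys into consecutive ranges, closing a range
    before adding the next row would push it past max_region_size."""
    regions, current, current_size = [], [], 0
    for key in sorted(row_sizes):
        size = row_sizes[key]
        if current and current_size + size > max_region_size:
            regions.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        regions.append(current)
    return regions


def assign_regions(regions, servers):
    """Place regions round-robin; each region is recorded as (first key, last key)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append((region[0], region[-1]))
    return assignment


GB = 1024 ** 3
row_sizes = {f"rk{n:04d}": 4 * GB for n in range(1, 8)}  # seven rows of 4 GB each (made up)
regions = split_into_regions(row_sizes, max_region_size=10 * GB)
servers = ["regionserver01", "regionserver02", "regionserver03"]
print(assign_regions(regions, servers))
```

With these numbers the table falls into four regions of consecutive row keys, and regionserver01 ends up managing two of them. Since rows are sorted by row key, each region covers one contiguous key range.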
