
Big Data Learning Series, Part 5: Hive Integration with HBase, Explained with Graphics and Text


Introduction

In the previous articles, Part 4 of the Big Data Learning Series (Hadoop+Hive environment building, stand-alone) and Part 2 (HBase environment building, stand-alone), the Hive and HBase environments were successfully built and tested. This article covers how to integrate Hive with HBase.

Communication between Hive and HBase

The integration of Hive and HBase is accomplished by using their own external API interfaces to communicate with each other; the specific work is done by the hive-hbase-handler-*.jar tool class in the lib directory of Hive. The communication principle is shown in the figure below.

Usage scenarios after integrating Hive with HBase:

(1) Load data into HBase through Hive; the data source can be a file or a table in Hive (see the sketch after this list).

(2) Through the integration, SQL query syntax such as JOIN and GROUP BY can be used over HBase data.

(3) Through the integration, we can not only run real-time queries against HBase, but also use Hive to query HBase data for complex data analysis.
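As a minimal sketch of scenario (1), assuming a plain Hive table src_student (a hypothetical name) already holds the source rows and t_student is the HBase-backed mapping table created later in this article:

hive -e "insert into table t_student select id, name from src_student;"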

I. Environment selection

1. Server selection

Local virtual machine

Operating system: Linux CentOS 7

CPU: 2 cores

Memory: 2 GB

Hard disk: 40 GB

2. Configuration selection

JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)

Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)

Hive: 2.1.1 (apache-hive-2.1.1-bin.tar.gz)

HBase: 1.2.6 (hbase-1.2.6-bin.tar.gz)

3. Download addresses

Official website address

JDK:

http://www.oracle.com/technetwork/java/javase/downloads

Hadoop:

http://www.apache.org/dyn/closer.cgi/hadoop/common

Hive:

http://mirror.bit.edu.cn/apache/hive/

HBase:

http://mirror.bit.edu.cn/apache/hbase/

Baidu cloud disk

Link: https://pan.baidu.com/s/1jIemIDC password: uycu

II. Server configuration

Before configuring Hadoop+Hive+HBase, first complete the following server configuration.

For convenience in doing these configurations, use root permissions.

1. Change the hostname

Change the hostname first in order to facilitate management.

Enter:

hostname

View the name of this machine

Then change the hostname to master

Enter:

hostnamectl set-hostname master

Note: the hostname change takes full effect only after the machine is rebooted.

2. Map IP to hostname

Modify the hosts file to map the IP to the hostname

Input

vim /etc/hosts

Add

the IP and hostname of the host:

192.168.238.128 master
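For example, the mapping can be appended and then verified from the shell (using this article's sample IP):

echo "192.168.238.128 master" >> /etc/hosts
ping -c 1 master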

3. Turn off the firewall

Turn off the firewall for easy access.

For versions below CentOS 7, enter:

Turn off the firewall

service iptables stop

For CentOS 7 and above, enter:

systemctl stop firewalld.service
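To keep the firewall off after a reboot as well (CentOS 7 and above), the service can also be disabled, a standard companion step:

systemctl disable firewalld.service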

4. Time setting

View the current time

Enter:

date

Check whether the server time is correct; if not, change it.

Command to change the time:

date -s 'MMDDhhmmYYYY.ss'
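As a concrete example (the timestamp is illustrative), GNU date also accepts a plain date-time string, which avoids remembering the positional format:

date -s '2017-12-01 10:30:00'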

5. Overall environment configuration

Overall configuration of /etc/profile:

# Java Config
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
# Scala Config
export SCALA_HOME=/opt/scala/scala-2.12.2
# Spark Config
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
# Zookeeper Config
export ZK_HOME=/opt/zookeeper/zookeeper3.4
# HBase Config
export HBASE_HOME=/opt/hbase/hbase1.2
# Hadoop Config
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Hive Config
export HIVE_HOME=/opt/hive/hive2.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:$PATH
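After saving, reload the profile so the variables take effect in the current shell:

source /etc/profile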

Note: base the configuration on what you actually have installed; anything you do not have does not need to be configured.

III. Environment configuration of Hadoop

The specific configuration of Hadoop is described in detail in Part 1 of the Big Data Learning Series: Hadoop environment building (stand-alone). So this article gives only a general overview.

Note: adapt the specific configuration to your own setup.

1. Environment variable settings

Edit the /etc/profile file:

vim /etc/profile

Configuration file:

export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

2. Configuration file changes

First change to the /home/hadoop/hadoop2.8/etc/hadoop/ directory

3.2.1 modify core-site.xml

Enter:

vim core-site.xml

Add inside the <configuration> node:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop/tmp</value>
  <description>Abase for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

3.2.2 Modify hadoop-env.sh

Enter:

vim hadoop-env.sh

Modify ${JAVA_HOME} to your own JDK path

export JAVA_HOME=${JAVA_HOME}

Modified to:

export JAVA_HOME=/home/java/jdk1.8

3.2.3 Modify hdfs-site.xml

Enter:

vim hdfs-site.xml

Add inside the <configuration> node:

<property>
  <name>dfs.name.dir</name>
  <value>/root/hadoop/dfs/name</value>
  <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/root/hadoop/dfs/data</value>
  <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
  <description>need not permissions</description>
</property>

3.2.4 Modify mapred-site.xml

If there is no mapred-site.xml file, copy the mapred-site.xml.template file and rename the copy to mapred-site.xml.

Enter:

vim mapred-site.xml

Modify the newly created mapred-site.xml file and add the configuration inside the <configuration> node:

<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/root/hadoop/var</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

3. Hadoop startup

You need to format the NameNode before starting for the first time.

Change to the /home/hadoop/hadoop2.8/bin directory

Enter:

./hadoop namenode -format
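In Hadoop 2.x the same formatting step can also be run through the hdfs launcher; either form works from the bin directory:

./hdfs namenode -format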

After the format succeeds, switch to the /home/hadoop/hadoop2.8/sbin directory

Start hdfs and yarn

Enter:

start-dfs.sh
start-yarn.sh

After the startup succeeds, enter jps to check whether everything started.
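On this single-node setup, a healthy jps listing looks roughly like the comment below (process names only; PIDs will differ):

jps
# expected processes: NameNode, DataNode, SecondaryNameNode,
# ResourceManager, NodeManager, Jps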

Enter http://ip:8088 and http://ip:50070 in the browser to see whether they can be accessed.

If both pages are accessible, the startup succeeded.

IV. Environment configuration of Hive

The specific configuration of the Hive environment is described in detail in Part 4 of the Big Data Learning Series: Hadoop+Hive environment building, graphics and text (stand-alone). This article gives only a general overview.

Modify hive-site.xml

Change to the /opt/hive/hive2.1/conf directory

Make a copy of hive-default.xml.template and rename it hive-site.xml

Then edit the hive-site.xml file

cp hive-default.xml.template hive-site.xml
vim hive-site.xml

Edit the hive-site.xml file and add:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/root/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/root/hive</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value></value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

Then change every occurrence of ${system:java.io.tmpdir} to /opt/hive/tmp (create this directory if it does not exist, and give it read and write permissions), and change every occurrence of ${system:user.name} to root, as sketched below.
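A minimal sed sketch of these two replacements, run from the conf directory (the 777 permission is one permissive choice):

mkdir -p /opt/hive/tmp
chmod -R 777 /opt/hive/tmp
sed -i 's#${system:java.io.tmpdir}#/opt/hive/tmp#g' hive-site.xml
sed -i 's#${system:user.name}#root#g' hive-site.xml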

(Screenshots in the original post show the file before the change, after the change, and a configuration diagram.)

Note: because hive-site.xml contains a great many settings, you can download it via FTP, edit it locally, and upload it back. Alternatively, keep only the properties you need and delete the rest. The master in the MySQL connection URL is the host alias and can be replaced with the IP address.

Modify hive-env.sh

If there is no hive-env.sh file, copy hive-env.sh.template and rename the copy to hive-env.sh

Add the following to this configuration file:

export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HIVE_CONF_DIR=/opt/hive/hive2.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive/hive2.1/lib

Add the data driver package

Because this setup uses MySQL as Hive's metastore database, the MySQL driver package is needed.

Upload the MySQL driver package to /opt/hive/hive2.1/lib
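For example (the jar file name and version are illustrative):

cp mysql-connector-java-5.1.44.jar /opt/hive/hive2.1/lib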

V. Environment configuration of HBase

The specific configuration of the HBase environment is described in detail in Part 2 of the Big Data Learning Series: HBase environment building (stand-alone). This article gives only a general overview.

Modify hbase-env.sh

Edit the hbase-env.sh file to add the following configuration

export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HBASE_HOME=/opt/hbase/hbase1.2
export HBASE_CLASSPATH=/opt/hadoop/hadoop2.8/etc/hadoop
export HBASE_PID_DIR=/root/hbase/pids
export HBASE_MANAGES_ZK=false

Description: adjust the paths to your own setup. HBASE_MANAGES_ZK=false means HBase's built-in ZooKeeper is not used.

Modify hbase-site.xml

Edit the hbase-site.xml file and add the following configuration

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://test1:9000/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>test1</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/root/hbase/tmp</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>false</value>
</property>

Description: hbase.rootdir is the directory shared by region servers and is where HBase persists its data. hbase.cluster.distributed sets the running mode of HBase: false is stand-alone mode, true is distributed mode. If set to false, HBase and ZooKeeper run in the same JVM.

VI. Hive and HBase integration: environment configuration and testing

1. Environment configuration

Because the integration of Hive and HBase communicates through their own external API interfaces, with the specific work done by the hive-hbase-handler-*.jar tool class in the lib directory of Hive, you only need to copy Hive's hive-hbase-handler-*.jar into hbase/lib.

Change to the hive/lib directory

Enter:

cp hive-hbase-handler-*.jar /opt/hbase/hbase1.2/lib

Note: if version conflicts arise when integrating hive with hbase, let the hbase version take precedence: overwrite the hbase-related jar packages in hive's lib with the corresponding jars from hbase.
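A quick sanity check that the handler jar is now present on both sides:

ls /opt/hive/hive2.1/lib/hive-hbase-handler-*.jar
ls /opt/hbase/hbase1.2/lib/hive-hbase-handler-*.jar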

2. Hive and HBase test

When testing, make sure that the hadoop, hbase, and hive environments have been successfully built and started successfully.

Open two Xshell command windows: one for the hive shell and one for the hbase shell.

6.2.1 Create a Hive table mapped to HBase

Create a table in hive that maps to hbase. For convenience, use the name t_student on both sides and store data in the same underlying table.

In hive, enter:

create table t_student(id int, name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,st1:name")
tblproperties("hbase.table.name" = "t_student", "hbase.mapred.output.outputtable" = "t_student");

Description: the first t_student is the name of the hive table; the second (hbase.table.name) is the table name defined in hbase; the third (hbase.mapred.output.outputtable) is the table the written data is stored in ("hbase.mapred.output.outputtable" = "t_student" can be left out, in which case data is stored in the second table).

(id int, name string) is the hive table structure. To add fields, add them in this format; to add a comment to a field, append comment 'what you want to describe' at the end of the field.

For example:

create table t_student(id int comment 'StudentId', name string comment 'StudentName')

org.apache.hadoop.hive.hbase.HBaseStorageHandler specifies the storage handler.

hbase.columns.mapping maps hive columns to the column families and columns defined in hbase.

For example, st1 is the column family and name is the column. This creates the table t_student in hive with two fields (an int id and a string name), mapped to the table t_student in hbase, where :key corresponds to the hbase rowkey and the name value corresponds to hbase's st1:name column.
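As a further sketch (table and column names here are illustrative, not from the original), a mapping can span several columns and two column families:

hive -e 'create table t_student_ext(id int, name string, age int)
stored by "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
with serdeproperties("hbase.columns.mapping" = ":key,st1:name,st2:age")
tblproperties("hbase.table.name" = "t_student_ext");'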

After the table is successfully created

View the table and table structure in hive and hbase respectively

In hive, enter:

show tables;
describe t_student;

In hbase, enter:

list
describe 't_student'

You can see that the table has been created successfully

6.2.2 data synchronization testing

After entering hbase

Add two rows of data to t_student and then scan the table

put 't_student','1001','st1:name','zhangsan'
put 't_student','1002','st1:name','lisi'
scan 't_student'
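Synchronization works in the other direction too: a row inserted on the hive side appears in the hbase table. A hedged example with illustrative values (INSERT ... VALUES requires Hive 0.14 or later):

hive -e "insert into table t_student values (1003, 'wangwu');"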

Then switch to hive

Query the table

Enter:

select * from t_student;

Then delete the table in hive

Note: the table is deleted here because the test depends on the results. If you are following along, you do not have to delete it, since the table will be used again later.

Then check to see if the tables in hive and hbase have been deleted

Enter:

drop table t_student;

Through these, you can see that the data between hive and hbase is synchronized successfully!

6.2.3 Join query test

Hive external table test

First create a t_student_info table in hbase with two column families

Then look at the table structure

Enter:

create 't_student_info','st1','st2'
describe 't_student_info'

Then create an external table in hive

Description: use the EXTERNAL keyword to create an external table

Enter:

create external table t_student_info(id int, age int, sex string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,st1:age,st2:sex")
tblproperties("hbase.table.name" = "t_student_info");

Then add data to t_student_info

put 't_student_info','1001','st1:age','18'
put 't_student_info','1001','st2:sex','man'
put 't_student_info','1002','st1:age','18'
put 't_student_info','1002','st2:sex','woman'

Then query the table in hive

Enter:

select * from t_student_info;

After the data is queried, run a join between t_student and t_student_info.

Enter:

select * from t_student t join t_student_info ti on t.id = ti.id;

Description: the join query shows that the two tables can be queried together, but it also makes obvious how slow hive is when it uses the default mapreduce engine.

Other notes:

Because my virtual machine's configuration is quite weak, even increasing the reduce memory and limiting the amount of data each reduce processes did not help; in the end I used the company's test server for these tests.

When querying a single table, hive does not invoke the execution engine, so it is relatively fast; join queries and the like do go through the engine. Because hive's default engine is mr, such queries are very slow, and configuration plays a part as well. The mr engine is deprecated starting with Hive 2.x.
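For reference, the execution engine is controlled by the hive.execution.engine property; tez or spark (each needing its own installation and configuration) are the usual faster alternatives. A hedged sketch:

hive -e "set hive.execution.engine;"
hive -e "set hive.execution.engine=tez; select count(*) from t_student;"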

This is the end of this article, thank you for reading!

Copyright notice:

Author: nothingness

Blog Park: http://www.cnblogs.com/xuwujing

CSDN: http://blog.csdn.net/qazwsxpcm

Personal blog: http://www.panchengming.com

Original content is not easy to produce; please indicate the source when reprinting. Thank you!
