Introduction
In two earlier articles in the big data learning series — Part 4, Hadoop+Hive environment building (stand-alone), and Part 2, HBase environment building (stand-alone) — the Hive and HBase environments were built and tested successfully. This article covers how to integrate Hive with HBase.
How Hive and HBase communicate
The integration of Hive and HBase works by having the two systems communicate through their external API interfaces; the concrete work is done by the hive-hbase-handler-*.jar tool class in Hive's lib directory. The communication principle is illustrated in the figure below. [Figure: Hive and HBase communicating through the hive-hbase-handler]
Usage scenarios after integrating Hive with HBase:
(1) Load data into HBase through Hive; the data source can be a file or a Hive table.
(2) Through the integration, HBase gains support for SQL query syntax such as JOIN and GROUP BY.
(3) Through the integration, we can not only query HBase data in real time, but also use Hive to query HBase data for complex data analysis.
I. Environment selection
1. Server selection
Local virtual machine
Operating system: Linux (CentOS 7)
CPU: 2 cores
Memory: 2 GB
Hard disk: 40 GB
2. Configuration selection
JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)
Hive: 2.1 (apache-hive-2.1.1-bin.tar.gz)
HBase: 1.2.6 (hbase-1.2.6-bin.tar.gz)
3. Download addresses
Official website addresses:
JDK: http://www.oracle.com/technetwork/java/javase/downloads
Hadoop: http://www.apache.org/dyn/closer.cgi/hadoop/common
Hive: http://mirror.bit.edu.cn/apache/hive/
HBase: http://mirror.bit.edu.cn/apache/hbase/
Baidu cloud disk
Link: https://pan.baidu.com/s/1jIemIDC password: uycu
II. Server configuration
Before configuring Hadoop, Hive, and HBase, do the following server configuration.
For convenience, all of these steps are performed as root.
1. Change the hostname
To make management easier, change the hostname first.
Enter:
hostname
to view the machine's current name.
Then change the hostname to master.
Enter:
hostnamectl set-hostname master
Note: the hostname change takes effect only after a restart; run reboot.
2. Map IP to hostname
Modify the hosts file to add the mapping.
Enter:
vim /etc/hosts
Add the host's IP and hostname:
192.168.238.128 master
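To quickly confirm that the mapping works, you can ping the new name (an optional check, not required by the steps that follow):
ping -c 1 master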
3. Turn off the firewall
Turn off the firewall for ease of access.
For versions below CentOS 7, enter:
service iptables stop
For CentOS 7 and above, enter:
systemctl stop firewalld.service
4. Time settings
View the current time.
Enter:
date
Check whether the server time is correct; if not, change it.
Command to change the time:
date -s 'MMDDhhmmYYYY.ss'
5. Overall environment configuration
Overall configuration of /etc/profile:
# Java Config
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
# Scala Config
export SCALA_HOME=/opt/scala/scala-2.12.2
# Spark Config
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
# Zookeeper Config
export ZK_HOME=/opt/zookeeper/zookeeper3.4
# HBase Config
export HBASE_HOME=/opt/hbase/hbase1.2
# Hadoop Config
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Hive Config
export HIVE_HOME=/opt/hive/hive2.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:$PATH
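After saving the file, reload it so the variables take effect in the current shell (a standard step with bash):
source /etc/profile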
Note: adapt the configuration to your own setup; entries for software you have not installed can simply be omitted.
III. Environment configuration of Hadoop
The specific configuration of Hadoop is described in detail in Part 1 of the big data learning series, Hadoop environment building (stand-alone), so this article only gives a general overview.
Note: adapt the specific configuration to your own setup.
1. Environment variable settings
Edit the /etc/profile file:
vim /etc/profile
Add the configuration:
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
2. Configuration file changes
First change to the /home/hadoop/hadoop2.8/etc/hadoop/ directory.
3.2.1 Modify core-site.xml
Enter:
vim core-site.xml
Add the following inside the <configuration> node:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
</property>
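Note: on Hadoop 2.x, fs.default.name still works but is deprecated in favor of fs.defaultFS; if you prefer the newer name, an equivalent entry would be:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>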
3.2.2 Modify hadoop-env.sh
Enter:
vim hadoop-env.sh
Change ${JAVA_HOME} to your own JDK path:
export JAVA_HOME=${JAVA_HOME}
becomes:
export JAVA_HOME=/home/java/jdk1.8
3.2.3 Modify hdfs-site.xml
Enter:
vim hdfs-site.xml
Add the following inside the <configuration> node:
<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>need not permissions</description>
</property>
3.2.4 Modify mapred-site.xml
If there is no mapred-site.xml file, copy mapred-site.xml.template and rename the copy to mapred-site.xml.
Enter:
vim mapred-site.xml
Modify the new mapred-site.xml file and add the configuration inside the <configuration> node:
<property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
3. Hadoop startup
You need to format the NameNode before the first start.
Change to the /home/hadoop/hadoop2.8/bin directory and enter:
./hadoop namenode -format
After formatting succeeds, switch to the /home/hadoop/hadoop2.8/sbin directory and start hdfs and yarn.
Enter:
start-dfs.sh
start-yarn.sh
After startup succeeds, enter jps to check whether the processes are running.
Open ip:8088 and ip:50070 in a browser to see whether the pages are accessible.
If they load correctly, the startup was successful.
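For reference, a healthy single-node setup typically shows the following processes in jps (process ids will differ; this listing is illustrative):
jps
# NameNode
# DataNode
# SecondaryNameNode
# ResourceManager
# NodeManager
# Jps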
IV. Environment configuration of Hive
The specific configuration of the Hive environment is described in detail in Part 4 of the big data learning series, Hadoop+Hive environment building (stand-alone), so this article only gives a general overview.
Modify hive-site.xml
Change to the /opt/hive/hive2.1/conf directory.
Make a copy of hive-default.xml.template and rename it hive-site.xml, then edit it:
cp hive-default.xml.template hive-site.xml
vim hive-site.xml
Add the following inside the <configuration> node:
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/root/hive/warehouse</value>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>/root/hive</value>
</property>
<property>
    <name>hive.metastore.uris</name>
    <value></value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
</property>
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
</property>
Then change all occurrences of ${system:java.io.tmpdir} to /opt/hive/tmp (create the directory if it does not exist, and give the folder read and write permissions), and change all occurrences of ${system:user.name} to root.
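A minimal sketch of that preparation (path as configured above; adjust the permission bits to your needs):
mkdir -p /opt/hive/tmp
chmod -R 777 /opt/hive/tmp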
Note: the hive-site.xml file contains a great many configuration items, so you can download it via FTP to edit it, or simply keep only the settings you need and delete the rest. The master in the MySQL connection address is the host's alias and can be replaced with an IP.
Modify hive-env.sh
Modify the hive-env.sh file; if it does not exist, copy hive-env.sh.template and rename the copy to hive-env.sh.
Add the following to the configuration file:
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HIVE_CONF_DIR=/opt/hive/hive2.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive/hive2.1/lib
Add the data driver package
Hive ships with an embedded Derby database by default, but this setup uses MySQL as the metastore, so the MySQL driver is needed.
Upload the MySQL driver jar to /opt/hive/hive2.1/lib.
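For example (the connector version below is an assumption; use whichever MySQL connector jar you downloaded). With Hive 2.x the metastore schema can then be initialized once with the bundled schematool:
cp mysql-connector-java-5.1.44.jar /opt/hive/hive2.1/lib
schematool -dbType mysql -initSchema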
V. Environment configuration of HBase
The specific configuration of the HBase environment is described in detail in Part 2 of the big data learning series, HBase environment building (stand-alone), so this article only gives a general overview.
Modify hbase-env.sh
Edit the hbase-env.sh file and add the following configuration:
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HBASE_HOME=/opt/hbase/hbase1.2
export HBASE_CLASSPATH=/opt/hadoop/hadoop2.8/etc/hadoop
export HBASE_PID_DIR=/root/hbase/pids
export HBASE_MANAGES_ZK=false
Explanation: adjust the paths to your own setup. HBASE_MANAGES_ZK=false means HBase's bundled ZooKeeper is not used; an external ZooKeeper is used instead.
Modify hbase-site.xml
Edit the hbase-site.xml file and add the following configuration inside the <configuration> node:
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://test1:9000/hbase</value>
    <description>The directory shared by region servers.</description>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>test1</value>
</property>
<property>
    <name>hbase.tmp.dir</name>
    <value>/root/hbase/tmp</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
</property>
Explanation: hbase.rootdir is the directory shared by the region servers, used to persist HBase data; its host and port must match the HDFS address in core-site.xml (the test1 hostname comes from the earlier HBase article; on the machine configured above it would be master). hbase.cluster.distributed sets HBase's running mode: false is stand-alone mode, true is distributed mode; if false, HBase and ZooKeeper run in the same JVM.
VI. Hive integrates HBase: environment configuration and testing
1. Environment configuration
Because the integration of Hive and HBase works by the two communicating through their external API interfaces, with the concrete work done by the hive-hbase-handler-*.jar tool class in Hive's lib directory, you only need to copy Hive's hive-hbase-handler-*.jar into hbase/lib.
Change to the hive/lib directory.
Enter:
cp hive-hbase-handler-*.jar /opt/hbase/hbase1.2/lib
Note: if you run into version problems when integrating Hive with HBase, take HBase's version as the reference and overwrite the jar packages in Hive with the corresponding jar packages from HBase.
2. Hive and HBase integration test
When testing, make sure the hadoop, hbase, and hive environments have all been built and started successfully.
Open two Xshell command windows:
in one, enter hive; in the other, enter the hbase shell.
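A minimal sketch of launching the two shells (assuming the bin directories are on PATH, as set in /etc/profile above):
hive         # window one: the Hive CLI
hbase shell  # window two: the HBase shell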
6.2.1 Create a table in hive that maps to hbase
Create a table in hive that maps to an hbase table. For convenience, the table is named t_student on both sides, and the data is stored in the same underlying table.
In hive, enter:
create table t_student(id int, name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,st1:name")
tblproperties("hbase.table.name" = "t_student", "hbase.mapred.output.outputtable" = "t_student");
Explanation: the first t_student is the name of the hive table; the second t_student is the table name defined for hbase; the third t_student is the table the data is written to ("hbase.mapred.output.outputtable" = "t_student" can be left out, in which case the data is stored in the second table).
(id int, name string) is the hive table structure. To add more fields, follow this format; to add a comment to a field, append comment 'what you want to describe' after the field.
For example:
create table t_student(id int comment 'StudentId', name string comment 'StudentName')
org.apache.hadoop.hive.hbase.HBaseStorageHandler specifies the storage handler to use.
hbase.columns.mapping defines the mapping to the column families in hbase.
For example, st1 is the column family and name is the column. The table t_student created in hive contains two fields: an int id and a string name. It maps to the table t_student in hbase: key corresponds to hbase's rowkey, and name corresponds to hbase's st1:name column.
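To make the mapping concrete, a hypothetical record (values invented for illustration) lines up like this:
-- hive row:   id = 1001, name = 'zhangsan'
-- hbase cell: rowkey = '1001', column = 'st1:name', value = 'zhangsan'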
After the table is created successfully, view the table and its structure in hive and hbase respectively.
In hive, enter:
show tables;
describe t_student;
In hbase, enter:
list
describe 't_student'
You can see that the table has been created successfully.
6.2.2 Data synchronization test
After entering hbase, add two records to t_student and then query the table:
put 't_student','1001','st1:name','zhangsan'
put 't_student','1002','st1:name','lisi'
scan 't_student'
Then switch to hive and query the table.
Enter:
select * from t_student;
Then delete the table in hive.
Enter:
drop table t_student;
Note: the table is deleted here because the result of the deletion is needed for the demonstration. If you are following along, you do not have to delete it, since the table will be used again later.
Then check whether the table has been deleted in both hive and hbase.
Through these steps, you can see that the data between hive and hbase synchronized successfully!
6.2.3 Associated query test (hive external table)
First create a t_student_info table in hbase with two column families, then view the table structure.
Enter:
create 't_student_info','st1','st2'
describe 't_student_info'
Then create an external table in hive.
Explanation: the EXTERNAL keyword is used to create an external table.
Enter:
create external table t_student_info(id int, age int, sex string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,st1:age,st2:sex")
tblproperties("hbase.table.name" = "t_student_info");
Then add data to t_student_info:
put 't_student_info','1001','st1:age','20'
put 't_student_info','1001','st2:sex','man'
put 't_student_info','1002','st1:age','21'
put 't_student_info','1002','st2:sex','woman'
Then query the table in hive.
Enter:
select * from t_student_info;
Once the data can be queried, run an associated query over t_student and t_student_info.
Enter:
select * from t_student t join t_student_info ti where t.id=ti.id;
Explanation: the associated query shows that the tables can be joined successfully. But it is also obvious how slow hive is when it uses its default mapreduce engine.
Other notes:
Because my virtual machine's configuration is too weak, the join kept failing even after increasing the reduce memory and limiting the amount of data each reduce processes, so in the end I used the company's test server for the test.
When querying a whole table, hive does not invoke the execution engine, so it is relatively fast. Joins and similar queries do invoke the engine, and because hive's default engine is mr, they are very slow; configuration also plays a part. Hive 2.x no longer recommends using mr.
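If a faster engine is actually installed and configured, it can be chosen per session through the hive.execution.engine property (tez below is only an example of an alternative; mr remains the default in this setup):
set hive.execution.engine=tez;
select * from t_student t join t_student_info ti where t.id=ti.id;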
This is the end of this article, thank you for reading!
Copyright notice:
Author: nothingness
cnblogs source: http://www.cnblogs.com/xuwujing
CSDN source: http://blog.csdn.net/qazwsxpcm
Personal blog source: http://www.panchengming.com
Original work is not easy; please indicate the source when reprinting. Thank you!