
Detailed introduction, installation and deployment of hive


Introduction:

What is hive?

1. Hive is a data warehouse tool built on top of Hadoop.

2. It can map structured data files to database tables and provides SQL-like query capability.

3. It can convert SQL statements into MapReduce tasks to run.

4. It can be used for extract-transform-load (ETL) work.

5. Hive is a SQL parsing engine: it converts SQL statements into MapReduce jobs that then run on Hadoop.

A hive table is actually a directory/folder on HDFS.

The data in a hive table is the set of files under that HDFS directory. Tables are separated by folder name; for a partitioned table, each partition value is a subfolder, and the data can be used directly in MapReduce jobs.
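As a minimal illustration (assuming the default warehouse path /user/hive/warehouse and the teacherq table created later in this article), the mapping is visible directly from HDFS:

hdfs dfs -ls /user/hive/warehouse                     # one folder per table
hdfs dfs -ls /user/hive/warehouse/teacherq            # the table's data files live here
hdfs dfs -ls /user/hive/warehouse/logs/dt=2016-12-05  # a partition is a subfolder (hypothetical partitioned table "logs")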

6. Pros and cons of hive:

Pro: it provides SQL-like statements to implement simple MapReduce statistics quickly, without developing dedicated MapReduce applications.

Con: it does not support real-time queries.

7. Hive data is divided into the real stored data and the metadata.

The real data is stored in hdfs; the metadata is stored in mysql.

Metastore: the metadata storage database.

Hive stores metadata in a database such as MySQL or Derby. The metadata in Hive includes the name of the table, the columns and partitions of the table and their properties, the attributes of the table (whether it is an external table, etc.), the directory where the table's data is located, and so on.
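Because the metadata is just relational rows, you can inspect it directly in MySQL. A quick sketch (assuming the metastore database is named hive, as configured during installation below; TBLS and PARTITIONS are standard Hive metastore tables):

mysql> use hive;
mysql> select TBL_ID, TBL_NAME, TBL_TYPE from TBLS;   -- table names and types (MANAGED_TABLE / EXTERNAL_TABLE)
mysql> select PART_NAME from PARTITIONS;              -- registered partition values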

Second, the architecture of hive:

User interfaces, including the CLI (shell), JDBC/ODBC, and the WebUI (via a browser).

Metadata storage, usually in a relational database such as mysql or derby.

Interpreter, compiler, optimizer, and executor, which take an HQL query through parsing, compilation, optimization, and generation of a query plan; the plan is stored in HDFS and then executed by calling MapReduce.

Hadoop: HDFS is used for storage and MapReduce for computation (a query such as select * from teacher does not generate a MapReduce task, only a full table scan).
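To make the last point concrete, a sketch using the teacher table (whether a statement compiles to MapReduce is easy to observe in the CLI output):

hive> select * from teacher;          -- simple fetch: no MapReduce job is launched
hive> select count(*) from teacher;   -- aggregation: compiled into a MapReduce job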

I would like to emphasize here: Hadoop, ZooKeeper, Spark, Kafka, and MySQL must already be started and running normally.

Now start the installation and deployment of hive.

Basic dependency environment:

1. JDK 1.6+
2. hadoop 2.x
3. hive 0.13-0.19
4. MySQL (mysql-connector jar)

The installation details are as follows:

# java
export JAVA_HOME=/soft/jdk1.7.0_79/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:/usr/local/hadoop/hive/bin
# hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop
# scala
export SCALA_HOME=/usr/local/hadoop/scala
# spark
export SPARK_HOME=/usr/local/hadoop/spark
# hive
export HIVE_HOME=/usr/local/hadoop/hive
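These exports typically go into /etc/profile or ~/.bashrc (an assumption; the article does not say which file it uses). After adding them, reload the profile and verify:

source /etc/profile
echo $HIVE_HOME   # should print /usr/local/hadoop/hive
which hive        # should resolve to /usr/local/hadoop/hive/bin/hive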

First, start the installation:

1. Download:

https://hive.apache.org/downloads.html

Decompress:

tar xvf apache-hive-2.1.0-bin.tar.gz -C /usr/local/hadoop/
cd /usr/local/hadoop/
mv apache-hive-2.1.0-bin hive

2. Modify the configuration

Modify the startup environment:

cd /usr/local/hadoop/hive
vim bin/hive-config.sh

# java
export JAVA_HOME=/soft/jdk1.7.0_79/
# hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop
# hive
export HIVE_HOME=/usr/local/hadoop/hive

Modify the default configuration file:

cd /usr/local/hadoop/hive
vim conf/hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xujun</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>

3. Modify tmp dir

Change the value of every configuration item whose value contains "system:java.io.tmpdir" to a fixed local directory:

/tmp/hive
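For example (a sketch; in the stock hive-site template the affected items include hive.exec.local.scratchdir and hive.downloaded.resources.dir):

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>   <!-- was ${system:java.io.tmpdir}/${system:user.name} -->
</property>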

4. Install the mysql driver

Download the driver mysql-connector-java-5.1.40.zip from the mysql official website, then:

unzip mysql-connector-java-5.1.40.zip
cp mysql-connector-java-5.1.40-bin.jar /usr/local/hadoop/hive/lib/

Second, install mysql and start it

1. Create a database

create database hive;
grant all on *.* to hive@'%' identified by 'hive';   -- note: this password must match javax.jdo.option.ConnectionPassword in hive-site.xml
flush privileges;

Third, initialize hive (initialize metadata)

cd /usr/local/hadoop/hive
bin/schematool -initSchema -dbType mysql

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://hadoop3:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
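To confirm the initialization worked (a sketch; DBS, TBLS, and COLUMNS_V2 are among the standard metastore tables the script creates), check MySQL:

mysql> use hive;
mysql> show tables;   -- should list metastore tables such as DBS, TBLS, COLUMNS_V2, PARTITIONS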

Fourth, start hive

[hadoop@hadoop1 hadoop]$ hive/bin/hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin://soft/jdk1.7.0_79//bin:/bin:/bin:/bin:/usr/local/hadoop/hive/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/hadoop/hive/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.x releases.
hive> show databases;
OK
default
Time taken: 1.184 seconds, Fetched: 1 row(s)
hive>

5. CLI settings (the following two settings apply only to the current session terminal):

1. hive> set hive.cli.print.current.db=true;   -- the prompt then shows the current database name, e.g. hive (default)>
2. hive (default)> set hive.cli.print.header=true;   -- select results are then displayed with the table's column names

3. Create tables and import data:

hive> create table teacherq (id bigint, name string) row format delimited fields terminated by '\t';
OK
hive> create table people (id int, name string);
OK
Time taken: 3.363 seconds
hive> show tables;
OK
people
teacherq
student
Time taken: 0.283 seconds, Fetched: 3 row(s)

Import data:

hive> load data local inpath '/root/stdent.txt' into table teacherq;

Note: if you started hive as an ordinary user, use a relative path to import local data:

mv stdent.txt /usr/local/hadoop/hive/
cd /usr/local/hadoop/hive

hive> load data local inpath 'stdent.txt' into table teacherq;
Loading data to table default.teacherq
OK
Time taken: 2.631 seconds
hive> select * from teacherq;
OK
1    zhangsan
2    lisi
3    wangwu
4    libai
Time taken: 1.219 seconds, Fetched: 4 row(s)
hive>
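To make these two settings permanent rather than per-session (an aside; the article itself only sets them per terminal), they can go into ~/.hiverc, which the hive CLI reads at startup:

# ~/.hiverc
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;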

4. Create a table (internal table by default)

Suitable when you create the table first and then load data into it:

create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

Loading data into a default (internal) table:

load data local inpath '/root/student.txt' into table student;

Build an external table

Suitable when the data already exists in hdfs first; then create the table over it to query, analyze, and manage the data:

create external table td_ext (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

External table load data:

load data local inpath '/root/student.txt' into table td_ext;
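Because the external table just points at its HDFS location, you can also place files straight into the directory and query them without any load statement (a sketch, reusing the sample file above; dropping the external table later removes only the metadata, not the files under /td_ext):

hdfs dfs -put /root/student.txt /td_ext
hive> select * from td_ext;   -- the new file's rows appear immediately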

Build a partition table

Method 1: create the partition table first, then load the data.

Partitioning assists querying: it narrows the query scope, speeds up data retrieval, and manages the data according to certain specifications and conditions.

create table td_part (id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

Load data into the partition table:

load data local inpath '/root/data.am' into table beat partition (nation='USA');

hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1          glm          22.0         china
2          slsl         21.0         china
3          sdsd         20.0         china
NULL       www          19.0         china
Time taken: 0.22 seconds, Fetched: 4 row(s)
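This is where partitioning pays off: a query that filters on the partition column reads only the matching subfolder instead of scanning the whole table:

hive (itcast)> select * from beat where nation='china';   -- scans only /beat/nation=china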

Method 2: first create the directory in hdfs, put the data in, and finally update the hive metadata.

1. Create a partition directory:

hive (itcast)> dfs -mkdir /beat/nation=japan;
hive (itcast)> dfs -ls /beat;
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2016-12-05 16:07 /beat/nation=china
drwxr-xr-x   - hadoop supergroup          0 2016-12-05 16:16 /beat/nation=japan

2. Put data into the partition directory:

hive (itcast)> dfs -put d.c /beat/nation=japan;

If you query the table at this point, the new data has not been loaded yet:

hive (itcast)> dfs -ls /beat/nation=japan;
Found 1 items
-rw-r--r--   3 hadoop supergroup         20 2016-12-05 16:16 /beat/nation=japan/d.c

hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1          glm          22.0         china
2          slsl         21.0         china
3          sdsd         20.0         china
NULL       www          19.0         china
Time taken: 0.198 seconds, Fetched: 4 row(s)

3. Manually modify the hive table structure to add the partition information:

hive (itcast)> alter table beat add partition (nation='japan') location '/beat/nation=japan';

OK

Time taken: 0.089 seconds

hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1          glm          22.0         china
2          slsl         21.0         china
3          sdsd         20.0         china
NULL       www          19.0         china
7          ab1          11.0         japan
8          rb2          3234.0       japan
Time taken: 0.228 seconds, Fetched: 6 row(s)

At this point, the data is loaded.
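As an aside (not part of the original walkthrough): instead of registering each directory by hand with alter table ... add partition, hive can scan the table location and register any partition directories it finds:

hive (itcast)> msck repair table beat;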

Delete a partition

Users can use ALTER TABLE ... DROP PARTITION to delete a partition. The partition's metadata and data are deleted together.

Example:

ALTER TABLE beat DROP PARTITION (nation='japan');
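To verify, list the remaining partitions. One caveat (a general hive behavior, not stated above): for a managed table DROP PARTITION also removes the HDFS data, while for an external table only the metadata entry goes away.

hive (itcast)> show partitions beat;   -- nation=japan should no longer be listed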

Special case:

1. A data column of the table cannot also be used as the partition column; creating such a table is not allowed by default:

hive (itcast)> create table sms (id bigint, content string, area string) partitioned by (area string) row format delimited fields terminated by '\t';
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

Solution:

Create a redundantly named column, e.g. use area_pat as the partition column name to distinguish it, or modify the hive source code:

hive (itcast)> create table sms (id bigint, content string, area string) partitioned by (area_pat string) row format delimited fields terminated by '\t';
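Queries then filter on the partition column name rather than the data column (a sketch; sms.txt and the partition value 'beijing' are hypothetical):

hive (itcast)> load data local inpath 'sms.txt' into table sms partition (area_pat='beijing');
hive (itcast)> select id, content from sms where area_pat='beijing';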
