Introduction:
First, what is hive?
1. Hive is a data warehouse tool built on top of Hadoop.
2. It can map structured data files to database tables and provides SQL-like query capabilities.
3. It converts SQL statements into MapReduce jobs for execution.
4. It can be used for data extraction, transformation, and loading (ETL).
5. Hive is a SQL parsing engine: it converts SQL statements into MapReduce jobs and runs them on Hadoop.
A Hive table is actually a directory/folder in HDFS, and the data of a Hive table is the set of files in that directory; folders are separated by table name. For a partitioned table, each partition value is a subfolder, so the data can be used directly by MapReduce jobs. A sketch of this layout follows.
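For illustration only (the warehouse path is an assumption, and the table and partition names match the beat example used later in this article), a partitioned table's HDFS layout looks like this:

hive> dfs -ls /user/hive/warehouse/beat;
/user/hive/warehouse/beat/nation=china
/user/hive/warehouse/beat/nation=japan

Each partition subfolder holds plain data files that MapReduce jobs can read directly.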
6. Pros and cons of Hive:
Pros: it provides SQL-like statements to quickly implement simple MapReduce statistics, without the need to develop dedicated MapReduce applications.
Cons: real-time queries are not supported.
7. Hive data is divided into actual stored data and metadata.
The actual data is stored in HDFS, and the metadata is stored in MySQL.
Metastore: the metadata storage database.
Hive stores metadata in databases such as MySQL or Derby.
The metadata in Hive includes the table name, the table's columns and partitions and their properties, the table's attributes (whether it is an external table, etc.), the directory where the table's data is located, and so on. A sketch of how to inspect this metadata follows.
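Since the metastore is an ordinary relational database, this metadata can be inspected directly. A sketch, run in MySQL (it assumes the metastore database is the "hive" database created later in this article; TBLS and DBS are tables of the standard Hive 2.x metastore schema):

use hive;
-- list every Hive table with its database and type (MANAGED_TABLE / EXTERNAL_TABLE)
select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
from TBLS t join DBS d on t.DB_ID = d.DB_ID;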
Second, the architecture of hive:
User interface: includes the CLI (shell), JDBC/ODBC, and the WebUI (via browser).
Metadata storage: usually a relational database such as MySQL or Derby.
Interpreter, compiler, optimizer, and executor: they take an HQL query through parsing, compilation, optimization, and query-plan generation; the plan is stored in HDFS and then executed by MapReduce.
Hadoop: HDFS is used for storage and MapReduce for computation (a query such as select * from teacher does not generate a MapReduce job; it is only a full table scan). The sketch below shows which statements actually trigger jobs.
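A small sketch of this behavior (the teacher table comes from the example later in this article; the exact cutoff depends on the hive.fetch.task.conversion setting):

hive> select * from teacher;         -- no MapReduce job: Hive just reads the files in the table's HDFS directory
hive> select count(*) from teacher;  -- compiled into a MapReduce job and run on Hadoop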
One thing to emphasize here: Hadoop, ZooKeeper, Spark, Kafka, and MySQL are assumed to have already been started normally.
Start the installation and deployment of hive
Basic dependencies:
1. JDK 1.6
2. Hadoop 2.x
3. Hive 0.13-0.19
4. MySQL (with the mysql-connector JAR)
The installation details are as follows:
Environment variables (for example in /etc/profile):

# java
export JAVA_HOME=/soft/jdk1.7.0_79/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:/usr/local/hadoop/hive/bin
# hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop
# scala
export SCALA_HOME=/usr/local/hadoop/scala
# spark
export SPARK_HOME=/usr/local/hadoop/spark
# hive
export HIVE_HOME=/usr/local/hadoop/hive
First, start the installation:
1. Download:
https://hive.apache.org/downloads.html
Decompress:

tar xvf apache-hive-2.1.0-bin.tar.gz -C /usr/local/hadoop/
cd /usr/local/hadoop/
mv apache-hive-2.1.0-bin hive
2. Modify the configuration
Modify the startup environment:

cd /usr/local/hadoop/hive
vim bin/hive-config.sh

# java
export JAVA_HOME=/soft/jdk1.7.0_79/
# hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop
# hive
export HIVE_HOME=/usr/local/hadoop/hive
Modify the default configuration file:

cd /usr/local/hadoop/hive
vim conf/hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xujun</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
3. Modify the tmp dir
Change the value of every configuration item in hive-site.xml whose value contains "${system:java.io.tmpdir}" to the fixed path below (see the sketch after this step):
/tmp/hive
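As an illustration (the property names below come from the stock hive-default template; which of them appear in your hive-site.xml depends on how it was generated), the affected items typically look like this after the change:

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/hive/resources</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive</value>
</property>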
4. Install the MySQL driver
Go to the MySQL official website and download the driver mysql-connector-java-5.1.40.zip, then:

unzip mysql-connector-java-5.1.40.zip
cp mysql-connector-java-5.1.40-bin.jar /usr/local/hadoop/hive/lib/
Install MySQL and start it.
1. Create a database:

create database hive;
grant all on *.* to hive@'%' identified by 'hive';
flush privileges;

Note: the password granted here must match the javax.jdo.option.ConnectionPassword configured in hive-site.xml above.
Third, initialize hive (initialize metadata)
cd /usr/local/hadoop/hive
bin/schematool -initSchema -dbType mysql

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:mysql://hadoop3:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver: com.mysql.jdbc.Driver
Metastore connection User:   hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
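To verify the initialization, you can check the metastore database in MySQL (a sketch; DBS, TBLS, and VERSION are tables of the standard metastore schema):

mysql> use hive;
mysql> show tables;                          -- expect metastore tables such as DBS, TBLS, COLUMNS_V2, PARTITIONS, VERSION
mysql> select SCHEMA_VERSION from VERSION;   -- should report the 2.1.0 schema that schematool just wrote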
IV. Start
[hadoop@hadoop1 hadoop]$ hive/bin/hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/soft/jdk1.7.0_79/bin:/usr/local/hadoop/hive/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive/lib/.../StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in ...
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Tez, Spark) or using Hive 1.x releases.
hive> show databases;
OK
default
Time taken: 1.184 seconds, Fetched: 1 row(s)
hive>

5. The following two settings take effect only for the current session terminal:
1. hive> set hive.cli.print.current.db=true;
After this setting, the prompt displays the current database name, e.g. hive (default)>
2. hive (default)> set hive.cli.print.header=true;
After this setting, select results are displayed together with the table's field names.
3. Create tables and import data:

hive> create table teacherq (id bigint, name string) row format delimited fields terminated by '\t';
OK
hive> create table people (id int, name string);
OK
Time taken: 3.363 seconds
hive> show tables;
OK
people
teacherq
student
Time taken: 0.283 seconds, Fetched: 3 row(s)

Import data:

hive> load data local inpath '/root/stdent.txt' into table teacherq;

Note: if you start hive as an ordinary user, use a relative path to import local data:

mv stdent.txt /usr/local/hadoop/hive/
cd /usr/local/hadoop/hive

hive> load data local inpath 'stdent.txt' into table teacherq;
Loading data to table default.teacherq
OK
Time taken: 2.631 seconds
hive> select * from teacherq;
OK
1    zhangsan
2    lisi
3    wangwu
4    libai
Time taken: 1.219 seconds, Fetched: 4 row(s)
hive>
4. Create a table (internal table by default)
Suitable when you create the table first and then load the data:

create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

Loading data into a default (internal) table:

load data local inpath '/root/student.txt' into table student;
Create an external table
Suitable when the data already exists in HDFS, and tables are then created over it to query, analyze, and manage the data:

create external table td_ext (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

Loading data into an external table (see the sketch below for how internal and external tables differ when dropped):

load data local inpath '/root/student.txt' into table student;
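The practical difference between the two table types shows up when they are dropped. A sketch, using the tables created above:

-- managed (internal) table: the metadata AND the data files under
-- /user/hive/warehouse/trade_detail are deleted
drop table trade_detail;
-- external table: only the metadata is deleted; the files under /td_ext
-- remain untouched in HDFS
drop table td_ext;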
Create a partitioned table
Method 1: first create the partitioned table, then load the data.
Partitions assist queries: they narrow the query scope, speed up data retrieval, and let the data be managed according to certain specifications and conditions.

create table td_part (id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

Loading data into a partitioned table:

load data local inpath '/root/data.am' into table beauty partition (nation="USA");
hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1    glm    22.0    china
2    slsl    21.0    china
3    sdsd    20.0    china
NULL    www    19.0    china
Time taken: 0.22 seconds, Fetched: 4 row(s)
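Because every partition is its own subfolder, filtering on the partition column lets Hive scan only the matching folder instead of the whole table. A sketch, continuing the session above:

hive (itcast)> select * from beat where nation='china';   -- reads only the files under /beat/nation=china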
Method 2: first create the directory in HDFS, pour in the data, and finally update Hive's metadata.
1. Create the partition directory:

hive (itcast)> dfs -mkdir /beat/nation=japan;
hive (itcast)> dfs -ls /beat;
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2016-12-05 16:07 /beat/nation=china
drwxr-xr-x   - hadoop supergroup          0 2016-12-05 16:16 /beat/nation=japan

2. Load data into the partition directory:

hive (itcast)> dfs -put d.c /beat/nation=japan;
Querying at this point shows that the data has not been loaded yet:

hive (itcast)> dfs -ls /beat/nation=japan;
Found 1 items
-rw-r--r--   3 hadoop supergroup         20 2016-12-05 16:16 /beat/nation=japan/d.c
hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1    glm    22.0    china
2    slsl    21.0    china
3    sdsd    20.0    china
NULL    www    19.0    china
Time taken: 0.198 seconds, Fetched: 4 row(s)
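The file is sitting in HDFS, but the metastore does not yet know that the new directory is a partition of the table, so queries ignore it. A quick check (sketch):

hive (itcast)> show partitions beat;
OK
nation=china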
3. Manually modify the hive table structure to add the partition information:

hive (itcast)> alter table beat add partition (nation='japan') location "/beat/nation=japan";
OK
Time taken: 0.089 seconds
hive (itcast)> select * from beat;
OK
beat.id    beat.name    beat.size    beat.nation
1    glm    22.0    china
2    slsl    21.0    china
3    sdsd    20.0    china
NULL    www    19.0    china
7    ab    111.0    japan
8    rb    23234.0    japan
Time taken: 0.228 seconds, Fetched: 6 row(s)
At this point, the data is loaded.
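You can confirm that the metastore now knows both partitions; as a side note, Hive's msck repair table command scans HDFS and registers all missing partitions in one step. A sketch:

hive (itcast)> show partitions beat;
OK
nation=china
nation=japan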
Delete a partition
Users can use ALTER TABLE ... DROP PARTITION to delete a partition. The metadata and the data of the partition are deleted together.
Example:

ALTER TABLE beat DROP PARTITION (nation='japan');
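After the drop, the partition disappears from the metadata; for a managed table its HDFS directory is removed as well, while for an external table only the metadata entry goes away. A sketch:

hive (itcast)> show partitions beat;   -- nation=japan is no longer listed
hive (itcast)> dfs -ls /beat;          -- for a managed table, /beat/nation=japan is gone too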
Special case:
1. A field of the table needs to be used as the partition column. By default this is not allowed:

hive (itcast)> create table sms (id bigint, content string, area string) partitioned by (area string) row format delimited fields terminated by '\t';
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

Solution: create a redundant field, e.g. use area_pat to distinguish it (or modify the source code):

hive (itcast)> create table sms (id bigint, content string, area string) partitioned by (area_pat string) row format delimited fields terminated by '\t';
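With the renamed partition column, the partition behaves like an ordinary column in loads and queries. A sketch (the file name sms.txt and the value 'beijing' are made up for illustration):

hive (itcast)> load data local inpath 'sms.txt' into table sms partition (area_pat='beijing');
hive (itcast)> select id, content, area_pat from sms where area_pat='beijing';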