# Hive can be built on any node; this experiment is done on the master node.
Download link (hadoop+hive): http://pan.baidu.com/s/1i4LCmAp password: 302x
## Do not copy these steps verbatim; enter the parameters and paths that match your actual environment.
1. Prerequisites for Hive
A. A working Hadoop cluster, already set up.
B. A Hive release downloaded to match the Hadoop version.
2. Install Hive
A. Extract the downloaded package to /usr/local/ and rename it hive.
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /usr/local
cd /usr/local
mv apache-hive-1.2.1-bin hive
B. Set the environment variables
vim /etc/profile
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_LIB=$HIVE_HOME/lib
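To make the variables take effect in the current shell, a quick check along these lines helps:
source /etc/profile
echo $HIVE_HOME   # expect /usr/local/hive
which hive        # expect /usr/local/hive/bin/hive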
At this point you can start Hive:
[root@mycat ~]# hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.096 seconds, Fetched: 1 row(s)
By default, Hive stores its metadata in an embedded Derby database, which allows only one session connection and is suitable only for simple tests. To support multiple users and sessions, a standalone metastore database is needed. We use MySQL as the metastore database; Hive has good built-in support for MySQL.
Second, use MySQL to store metadata
1. Start MySQL 5.6 (the installation procedure is omitted here).
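The hive-site.xml settings below assume a MySQL database named hive and a hive user with password mysql; a minimal sketch of creating them (adjust names, host, and password to your environment):
mysql -u root -p
mysql> CREATE DATABASE hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'mysql';
mysql> FLUSH PRIVILEGES;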
2. Configure the Hive files. The /usr/local/hive/conf/ directory contains .template files; copy hive-env.sh.template to hive-env.sh.
[root@mycat conf]# cp hive-env.sh.template hive-env.sh
[root@mycat conf]# vim hive-env.sh
## these settings duplicate /etc/profile; if the environment variables are already configured there, they can be skipped here.
1. Environment variables
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
2. Create hive-site.xml by copying it from the template file:
cp /usr/local/hive/conf/hive-default.xml.template ./hive-site.xml
Locate each property by its name tag and modify the corresponding value. A sample configuration file can be downloaded from:
http://down.51cto.com/data/2260702
## Do not copy the file verbatim; fill in the parameters and paths for your actual environment, substituting your own values where marked.
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/hive/warehouse</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.1.108:3306/hive?characterEncoding=UTF-8</value>
</property>
## a database (here named hive) must be created in MySQL to store the metadata
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mysql</value>
</property>
## the username and password for connecting to the database; the account must be granted access
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hive</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hive/tmp</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/home/hive</value>
</property>
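To confirm a property has taken effect, the hive CLI can echo it back with set, for example:
hive -e "set hive.metastore.warehouse.dir;"
# expected output: hive.metastore.warehouse.dir=/home/hive/warehouse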
Configure the log output (hive-log4j.properties):
hive.log.dir=/home/hive
hive.log.file=hive.log
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
3. Create the directory where Hive stores data, matching the configuration:
mkdir -p /home/hive/tmp
4. Configure the JDBC connector
1. Download the package and decompress it.
2. Copy mysql-connector-java-5.1.6-bin.jar into Hive's lib directory.
If dropping a table produces the error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException
you need to replace the mysql-connector-java package with a compatible version.
Download link: http://pan.baidu.com/s/1qXIGeSG password: iykt
cp mysql-connector-java-5.1.6-bin.jar /usr/local/hive/lib/
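With the connector in place, the schematool utility bundled with Hive 1.2 can initialize and verify the MySQL metastore schema; an optional sketch:
schematool -dbType mysql -initSchema   # creates the metastore tables in MySQL
schematool -dbType mysql -info         # verifies the connection and schema version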
If the HDFS web page
http://192.168.1.114:50070/explorer.html#/home reports the following, a configuration change is needed:
Permission denied: user=dr.who, access=READ_EXECUTE, inode="/home": root:supergroup:drwx-wx-wx
Add the following to /usr/local/hadoop/etc/hadoop/hdfs-site.xml (vim), then restart:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
## the NameNode must be restarted for this setting to take effect
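A sketch of the restart, assuming the standard scripts under the Hadoop sbin directory:
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh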
Test a Hive operation; the results should be visible both in MySQL and on the Hadoop web page.
Log in to hive.
Third, create an internal table
Internal table characteristics: when data is loaded into an internal table, local data is copied into the directory specified by the table's LOCATION, while data already on HDFS is moved (mv) to that LOCATION. When an internal table is dropped, the data under its LOCATION is deleted as well.
create table neibu_table (id int);
show tables;   # to view the tables
Hive's default location in HDFS is /user/hive/warehouse; it is determined by the hive.metastore.warehouse.dir property in hive-site.xml and can be modified. Here the student directory is created under /home/hive/warehouse/testdb.db.
You can view it through the browser
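The same directory can also be checked from the command line:
hadoop fs -ls /home/hive/warehouse   # each table appears as a subdirectory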
Let's start with the difference between internal tables and external tables in Hive (key points):
1) When creating a table: creating an internal table moves the data to the path pointed to by the data warehouse; creating an external table only records the path where the data resides and makes no change to the data's location.
2) When deleting a table: dropping an internal table deletes the metadata and the data together, while dropping an external table deletes only the metadata, not the data. External tables are therefore relatively safer, allow more flexible data organization, and make it convenient to share source data.
Also note that a traditional database validates table data on write (schema on write), whereas Hive does not check whether data conforms to the schema at load time. Hive follows schema on read: it checks and parses specific data fields against the schema only when reading.
The advantage of schema on read is that loading data is very fast, because the data does not need to be read and parsed, only copied or moved.
The advantage of schema on write is better query performance, because columns can be indexed and compressed after parsing, but loading takes more time.
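A small sketch of schema on read in action (the sor_demo table name and /tmp/sor.txt file are illustrative):
printf '1\nabc\n3\n' > /tmp/sor.txt
hive -e "create table sor_demo (id int);
load data local inpath '/tmp/sor.txt' into table sor_demo;
select * from sor_demo;"
# the load succeeds without validation; 'abc' comes back as NULL at read time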
1. Internal table operations:
Methods for loading data into the table:
## note: create the aa data file first
hive> LOAD DATA LOCAL INPATH '/home/aa' INTO TABLE neibu_table;
Loading data to table default.neibu_table
Table default.neibu_table stats: [numFiles=1, totalSize=10]
OK
Time taken: 0.639 seconds
select * from neibu_table;
A select * without conditions does not launch MapReduce, so it runs faster; the last row shows NULL because the file contains a blank line.
Note: an internal table copies the data into the table directory; dropping the internal table deletes the data along with the metadata.
The second method of loading data into the table:
Create a bb.txt file containing a column of numbers and note its path, then execute:
hadoop fs -put bb.txt /home/hive/warehouse/neibu_table
# or
hdfs dfs -put bb.txt /home/hive/warehouse/neibu_table
Add a database: create database hive20161120;
2. Partition tables: log files are stored in partitions by hour or by day, so business analysis for a specific time period does not require scanning all the data.
Create table a20161120; when there are multiple columns, specify the delimiter \t.
Create a partition table:
CREATE TABLE fenqu (id int) PARTITIONED BY (d int);
LOAD DATA LOCAL INPATH 'bb.txt' INTO TABLE fenqu PARTITION (d=1);
LOAD DATA LOCAL INPATH 'bb2.txt' INTO TABLE fenqu PARTITION (d=2);
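Once the partitions are loaded, a query that filters on the partition column only reads that partition's directory, for example:
hive -e "show partitions fenqu;"          # lists d=1 and d=2
hive -e "select * from fenqu where d=1;"  # scans only the d=1 partition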
3. Bucket tables
(Used in table joins; rows are assigned by a modulus over the number of buckets, so different data lands in different buckets.)
Create a bucketed table:
create table student4 (id int) clustered by (id) into 4 buckets;
Bucketing must be enabled first:
set hive.enforce.bucketing = true;
Data is loaded with insert rather than load; insert loads the data through MapReduce.
insert into table student4 select id from student3;
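One payoff of bucketing is cheap sampling: TABLESAMPLE can read a single bucket instead of the whole table (student4 as created above):
hive -e "select * from student4 tablesample(bucket 1 out of 4 on id);"
# reads about a quarter of the data: the bucket holding ids with id % 4 == 0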
Tables added in Hive have their metadata recorded in MySQL, with all related records stored under the database defined in the configuration file.
e.g. select * from TBLS;
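From the MySQL side, assuming the hive/mysql credentials configured earlier:
mysql -u hive -pmysql hive -e "select TBL_NAME, TBL_TYPE from TBLS;"
# TBL_TYPE distinguishes MANAGED_TABLE from EXTERNAL_TABLE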
4. External tables:
Importing data works the same as for internal tables, so it is not repeated here.
hive> drop table outer_table;
OK
Time taken: 0.081 seconds
hive> show tables;
OK
Time taken: 0.023 seconds
hive> create external table outer_table (id int)
    >
OK
Time taken: 0.037 seconds
hive> select * from outer_table
    >
OK
1
2
3
4
5
6
7
8
9
10
Time taken: 0.044 seconds, Fetched: 10 row(s)
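Dropping the external table removes only its metadata; the files on HDFS survive. A sketch, with the location path as an assumption since it is elided in the transcript above:
hive -e "drop table outer_table;"
hadoop fs -ls /external/outer_table   # hypothetical location; the data files are still listed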