# Hive can be built on any node; this experiment is done on the master node.
Download link (hadoop+hive): http://pan.baidu.com/s/1i4LCmAp password: 302x
## Do not copy these steps verbatim; enter the parameters and paths that match your actual environment.
1. Prerequisites for Hive
A. A working Hadoop cluster, already set up.
B. A Hive release downloaded to match the Hadoop version.
2. Install Hive
A. Extract the downloaded package to /usr/local/ and rename it hive.
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /usr/local
cd /usr/local
mv apache-hive-1.2.1-bin hive
B. Set the environment variables
vim /etc/profile
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_LIB=$HIVE_HOME/lib
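To make the variables take effect in the current shell, a quick check along these lines helps:
source /etc/profile
echo $HIVE_HOME   # expect /usr/local/hive
which hive        # expect /usr/local/hive/bin/hive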
At this point you can start Hive:
[root@mycat ~]# hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.096 seconds, Fetched: 1 row(s)
By default, Hive stores its metadata in an embedded Derby database, which allows only one session connection and is suitable only for simple tests. To support multiple users and sessions, a standalone metastore database is needed. We use MySQL as the metastore database; Hive has good built-in support for MySQL.
Second, use MySQL to store metadata
1. Start MySQL 5.6 (the installation procedure is omitted here).
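The hive-site.xml settings below assume a MySQL database named hive and a hive user with password mysql; a minimal sketch of creating them (adjust names, host, and password to your environment):
mysql -u root -p
mysql> CREATE DATABASE hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'mysql';
mysql> FLUSH PRIVILEGES;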
2. Configure the Hive files. The /usr/local/hive/conf/ directory contains .template files; copy hive-env.sh.template to hive-env.sh.
[root@mycat conf]# cp hive-env.sh.template hive-env.sh
[root@mycat conf]# vim hive-env.sh
## these settings duplicate /etc/profile; if the environment variables are already configured there, they can be skipped here.
1. Environment variables
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
2. Create hive-site.xml by copying it from the template file:
cp /usr/local/hive/conf/hive-default.xml.template ./hive-site.xml
Locate each property by its name tag and modify the corresponding value. A sample configuration file can be downloaded from:
http://down.51cto.com/data/2260702
## Do not copy the file verbatim; fill in the parameters and paths for your actual environment, substituting your own values where marked.
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/hive/warehouse</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.1.108:3306/hive?characterEncoding=UTF-8</value>
</property>
## a database (here named hive) must be created in MySQL to store the metadata
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mysql</value>
</property>
## the username and password for connecting to the database; the account must be granted access
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hive</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hive/tmp</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/home/hive</value>
</property>
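To confirm a property has taken effect, the hive CLI can echo it back with set, for example:
hive -e "set hive.metastore.warehouse.dir;"
# expected output: hive.metastore.warehouse.dir=/home/hive/warehouse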
Configure the log output (hive-log4j.properties):
hive.log.dir=/home/hive
hive.log.file=hive.log
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
3. Create the directory where Hive stores data, matching the configuration:
mkdir -p /home/hive/tmp
4. Configure the JDBC connector
1. Download the package and decompress it.
2. Copy mysql-connector-java-5.1.6-bin.jar into Hive's lib directory.
If dropping a table produces the error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException
you need to replace the mysql-connector-java package with a compatible version.
Download link: http://pan.baidu.com/s/1qXIGeSG password: iykt
cp mysql-connector-java-5.1.6-bin.jar /usr/local/hive/lib/
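With the connector in place, the schematool utility bundled with Hive 1.2 can initialize and verify the MySQL metastore schema; an optional sketch:
schematool -dbType mysql -initSchema   # creates the metastore tables in MySQL
schematool -dbType mysql -info         # verifies the connection and schema version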
If the HDFS web page
http://192.168.1.114:50070/explorer.html#/home reports the following, a configuration change is needed:
Permission denied: user=dr.who, access=READ_EXECUTE, inode="/home": root:supergroup:drwx-wx-wx
Add the following to /usr/local/hadoop/etc/hadoop/hdfs-site.xml (vim), then restart:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
## the NameNode must be restarted for this setting to take effect
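A sketch of the restart, assuming the standard scripts under the Hadoop sbin directory:
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh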
Test a Hive operation; the results should be visible both in MySQL and on the Hadoop web page.
Log in to hive.
Third, create an internal table
Internal table characteristics: when data is loaded into an internal table, local data is copied into the directory specified by the table's LOCATION, while data already on HDFS is moved (mv) to that LOCATION. When an internal table is dropped, the data under its LOCATION is deleted as well.
create table neibu_table (id int);
show tables;   # to view the tables
Hive's default location in HDFS is /user/hive/warehouse; it is determined by the hive.metastore.warehouse.dir property in hive-site.xml and can be modified. Here the student directory is created under /home/hive/warehouse/testdb.db.
You can view it through the browser
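The same directory can also be checked from the command line:
hadoop fs -ls /home/hive/warehouse   # each table appears as a subdirectory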
Let's start with the difference between internal tables and external tables in Hive (key points):
1) When creating a table: creating an internal table moves the data to the path pointed to by the data warehouse; creating an external table only records the path where the data resides and makes no change to the data's location.
2) When deleting a table: dropping an internal table deletes the metadata and the data together, while dropping an external table deletes only the metadata, not the data. External tables are therefore relatively safer, allow more flexible data organization, and make it convenient to share source data.
Also note that a traditional database validates table data on write (schema on write), whereas Hive does not check whether data conforms to the schema at load time. Hive follows schema on read: it checks and parses specific data fields against the schema only when reading.
The advantage of schema on read is that loading data is very fast, because the data does not need to be read and parsed, only copied or moved.
The advantage of schema on write is better query performance, because columns can be indexed and compressed after parsing, but loading takes more time.
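A small sketch of schema on read in action (the sor_demo table name and /tmp/sor.txt file are illustrative):
printf '1\nabc\n3\n' > /tmp/sor.txt
hive -e "create table sor_demo (id int);
load data local inpath '/tmp/sor.txt' into table sor_demo;
select * from sor_demo;"
# the load succeeds without validation; 'abc' comes back as NULL at read time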
1. Internal table operations:
Methods for loading data into the table:
## note: create the aa data file first
hive> LOAD DATA LOCAL INPATH '/home/aa' INTO TABLE neibu_table;
Loading data to table default.neibu_table
Table default.neibu_table stats: [numFiles=1, totalSize=10]
OK
Time taken: 0.639 seconds
select * from neibu_table;
A select * without conditions does not launch MapReduce, so it runs faster; the last row shows NULL because the file contains a blank line.
Note: an internal table copies the data into the table directory; dropping the internal table deletes the data along with the metadata.
The second method of loading data into the table:
Create a bb.txt file containing a column of numbers and note its path, then execute:
hadoop fs -put bb.txt /home/hive/warehouse/neibu_table
# or
hdfs dfs -put bb.txt /home/hive/warehouse/neibu_table
Add a database: create database hive20161120;
2. Partition tables: log files are stored in partitions by hour or by day, so business analysis for a specific time period does not require scanning all the data.
Create table a20161120; when there are multiple columns, specify the delimiter \t.
Create a partition table:
CREATE TABLE fenqu (id int) PARTITIONED BY (d int);
LOAD DATA LOCAL INPATH 'bb.txt' INTO TABLE fenqu PARTITION (d=1);
LOAD DATA LOCAL INPATH 'bb2.txt' INTO TABLE fenqu PARTITION (d=2);
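Once the partitions are loaded, a query that filters on the partition column only reads that partition's directory, for example:
hive -e "show partitions fenqu;"          # lists d=1 and d=2
hive -e "select * from fenqu where d=1;"  # scans only the d=1 partition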
3. Bucket tables
(Used in table joins; rows are assigned by a modulus over the number of buckets, so different data lands in different buckets.)
Create a bucketed table:
create table student4 (id int) clustered by (id) into 4 buckets;
Bucketing must be enabled first:
set hive.enforce.bucketing = true;
Data is loaded with insert rather than load; insert loads the data through MapReduce.
insert into table student4 select id from student3;
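One payoff of bucketing is cheap sampling: TABLESAMPLE can read a single bucket instead of the whole table (student4 as created above):
hive -e "select * from student4 tablesample(bucket 1 out of 4 on id);"
# reads about a quarter of the data: the bucket holding ids with id % 4 == 0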
Tables added in Hive have their metadata recorded in MySQL, with all related records stored under the database defined in the configuration file.
e.g. select * from TBLS;
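From the MySQL side, assuming the hive/mysql credentials configured earlier:
mysql -u hive -pmysql hive -e "select TBL_NAME, TBL_TYPE from TBLS;"
# TBL_TYPE distinguishes MANAGED_TABLE from EXTERNAL_TABLE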
4. External tables:
Importing data works the same as for internal tables, so it is not repeated here.
hive> drop table outer_table;
OK
Time taken: 0.081 seconds
hive> show tables;
OK
Time taken: 0.023 seconds
hive> create external table outer_table (id int)
    >
OK
Time taken: 0.037 seconds
hive> select * from outer_table
    >
OK
1
2
3
4
5
6
7
8
9
10
Time taken: 0.044 seconds, Fetched: 10 row(s)
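Dropping the external table removes only its metadata; the files on HDFS survive. A sketch, with the location path as an assumption since it is elided in the transcript above:
hive -e "drop table outer_table;"
hadoop fs -ls /external/outer_table   # hypothetical location; the data files are still listed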