The basic principle of Hive and how to build the environment

2025-07-08 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the basic principles of Hive and how to set up a working environment. It is aimed at readers who are new to Hive.

Today I mostly worked with Hive. I read the book in the morning and found it confusing at first, but gradually realized that Hive is actually quite simple. As I understand it, Hive is closely related to databases, which makes it much easier for me because I am already fairly familiar with SQL syntax. Hive uses HQL, but most of it is nearly the same as SQL. Let's start with a basic introduction to Hive:

I. The basic principles of Hive

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides simple SQL query functionality, and translates SQL statements into MapReduce jobs for execution. Its main advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
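To make the SQL-to-MapReduce idea concrete, here is a minimal Python sketch of how a query like SELECT dealid, count(*) ... GROUP BY dealid could be evaluated in map, shuffle, and reduce phases. It only illustrates the general technique and is not the plan Hive actually generates; the sample rows and function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample rows of an order table: (orderid, uid, dealid).
ROWS = [
    (1, "u1", "d1"),
    (2, "u2", "d1"),
    (3, "u1", "d2"),
]

def map_phase(rows):
    """Emit (key, 1) for every row, keyed by the GROUP BY column."""
    for _orderid, _uid, dealid in rows:
        yield dealid, 1

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key, which gives count(*) per group."""
    return {key: sum(values) for key, values in groups.items()}

result = reduce_phase(shuffle_phase(map_phase(ROWS)))
print(result)
```

In a real cluster the map and reduce functions run on different nodes and the shuffle moves data over the network; here all three phases simply run in one process.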

Hive stores its metadata in a relational database (RDBMS) such as MySQL or Derby. It offers three modes for connecting to the metastore: single-user mode, multi-user mode, and remote service mode (that is, embedded mode, local mode, and remote mode).
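As a rough guide, the three modes differ mainly in which metastore properties are set in hive-site.xml. The Python sketch below lists typical settings for each mode. The property names are standard Hive ones, but the values (host names, database name) are illustrative assumptions, not the author's configuration.

```python
# Typical metastore settings per mode. All values are illustrative assumptions.
METASTORE_MODES = {
    # Embedded (single-user): Derby runs inside the Hive process.
    "embedded": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:derby:;databaseName=metastore_db;create=true",
        "javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.EmbeddedDriver",
    },
    # Local (multi-user): Hive connects directly to an external database.
    "local": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    },
    # Remote: Hive clients talk to a separate metastore service over Thrift.
    # "metastore-host" is a hypothetical host name.
    "remote": {
        "hive.metastore.uris": "thrift://metastore-host:9083",
    },
}

def uses_thrift_service(mode):
    """Return True when the mode relies on a standalone metastore service."""
    return "hive.metastore.uris" in METASTORE_MODES[mode]

for mode in METASTORE_MODES:
    print(mode, "-> thrift service:", uses_thrift_service(mode))
```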

1.1 Hive architecture:

Hive's architecture is mainly divided into the following parts: user interfaces, the Thrift server, metadata storage, the parser, and Hadoop.

1.2 Hive data models

The storage of Hive is based on the Hadoop file system, and it has no special data storage format. It mainly includes four types of data models:

Table (Table)

Partition (Partition)

Bucket (Bucket)

External table (External Table)
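Because these models are stored as plain directories and files on the Hadoop file system, a table's layout can be pictured as a path. The Python sketch below builds such a path for a partitioned, bucketed table. It assumes the conventional layout (one directory per partition value, one file per bucket) and the classic rule for integer keys (hash of the key modulo the number of buckets). The warehouse path, table name, and helper names are hypothetical.

```python
def bucket_for(key, num_buckets):
    """Bucket index for an integer key: (hash & Integer.MAX_VALUE) % buckets.

    Assumes the classic scheme in which an int hashes to its own value.
    """
    return (key & 0x7FFFFFFF) % num_buckets

def data_path(warehouse, table, partition, bucket):
    """Build the file path for one bucket file inside one partition."""
    part_dirs = "/".join(f"{col}={val}" for col, val in partition.items())
    return f"{warehouse}/{table}/{part_dirs}/{bucket:06d}_0"

# Hypothetical table "orders", partitioned by dt and split into 4 buckets by uid.
path = data_path(
    "/user/hive/warehouse", "orders", {"dt": "2016-05-31"}, bucket_for(42, 4)
)
print(path)
```

An external table follows the same idea, except that its data lives at a location the user specifies and is not deleted when the table is dropped.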

1.3 Key points of the Hive execution process

Operator (Operator) is the minimum processing unit of Hive

Each operator represents an HDFS operation or a MapReduce job

The compiler converts Hive SQL into a set of operators

Hive performs MapReduce tasks through ExecMapper and ExecReducer

There are two modes when executing MapReduce: local mode and distributed mode
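The idea of a query compiled into a chain of operators can be sketched in a few lines of Python. The classes below are toy stand-ins written for this example and are not Hive's Java operator classes. They show how a statement such as SELECT name FROM student WHERE age > 18 becomes a scan, a filter, and a projection that pass rows to each other.

```python
class TableScan:
    """Toy operator: produces every row of an in-memory table."""
    def __init__(self, rows):
        self.rows = rows

    def run(self):
        yield from self.rows

class Filter:
    """Toy operator: keeps only rows for which the predicate is true."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def run(self):
        for row in self.child.run():
            if self.predicate(row):
                yield row

class Select:
    """Toy operator: projects each row onto the requested columns."""
    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def run(self):
        for row in self.child.run():
            yield {col: row[col] for col in self.columns}

# Hypothetical rows of the student table.
STUDENTS = [
    {"name": "Ann", "sex": "F", "age": 20},
    {"name": "Bob", "sex": "M", "age": 17},
]

# Operator tree for: SELECT name FROM student WHERE age > 18
plan = Select(Filter(TableScan(STUDENTS), lambda r: r["age"] > 18), ["name"])
output = list(plan.run())
print(output)
```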

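Common Hive operators include TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, ReduceSinkOperator, FileSinkOperator, and LimitOperator, among others.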

1.4 HQL operation of Hive

The basic operation of hive is similar to that of sql, for example:

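SELECT u.name, o.orderid FROM `order` o JOIN `user` u ON o.uid = u.uid;

SELECT dealid, count(DISTINCT uid), count(DISTINCT `date`) FROM `order` GROUP BY dealid;

(order, user, and date are reserved words in recent Hive versions, so they are quoted with backticks here.)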
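The two queries above can be tried without a Hive installation by using SQLite from Python as a stand-in. This is only a sketch: SQLite is not Hive, the table contents are made-up sample data, and the identifiers are double-quoted because SQLite uses that quoting style rather than backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (uid INTEGER, name TEXT);
CREATE TABLE "order" (orderid INTEGER, uid INTEGER, dealid TEXT, "date" TEXT);
INSERT INTO "user" VALUES (1, 'ann'), (2, 'bob');
INSERT INTO "order" VALUES
    (10, 1, 'd1', '2016-05-30'),
    (11, 2, 'd1', '2016-05-31'),
    (12, 1, 'd1', '2016-05-31');
""")

# Join: the user name for every order (ORDER BY added for a stable result).
joined = conn.execute(
    'SELECT u.name, o.orderid FROM "order" o JOIN "user" u ON o.uid = u.uid '
    'ORDER BY o.orderid'
).fetchall()

# Aggregation: distinct buyers and distinct days per deal.
stats = conn.execute(
    'SELECT dealid, count(DISTINCT uid), count(DISTINCT "date") '
    'FROM "order" GROUP BY dealid'
).fetchall()

print(joined)
print(stats)
```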

Simple Hive statement:

CREATE TABLE student
(
  name STRING,
  sex STRING,
  age INT
);

II. Basic configuration of Hive

1. Download Hive from the Apache website: http://hive.apache.org/downloads.html. The latest version at the time of writing was 2.0.1, which is the one I downloaded.

2. Download the MySQL JDBC driver. The current version at the time was 5.1.38. I have bundled the two required archives and will post a download link later.

3. Extract Hive to the directory you want. I placed it in /home/admin1/download/hive-2.0.1. Put the MySQL driver in Hive's lib directory, and then configure the following files in hive-2.0.1/conf:

Create a new file hive-env.sh

Change the directories below to match where your own Hive and Hadoop are installed.

export HIVE_HOME=/home/admin1/download/hive-2.0.1
export PATH=$PATH:$HIVE_HOME/bin
HADOOP_HOME=/home/admin1/download/hadoop-2.5.2
export HIVE_CONF_DIR=/home/admin1/download/hive-2.0.1/conf
export HIVE_AUX_JARS_PATH=/home/admin1/download/hive-2.0.1/lib

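You also need to create a new hive-site.xml:

Here I configure the metastore with a MySQL account and password. Other databases can be configured along the same lines.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>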
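If you prefer to generate this file rather than type it by hand, the following Python sketch renders the same four properties with the standard library. The helper name build_hive_site is hypothetical. The user name and password are assumed to be the MySQL account hive/hive created later in this article, so replace them with your own values.

```python
import xml.etree.ElementTree as ET

# Assumed values: the MySQL account hive/hive and a local MySQL server.
PROPERTIES = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "hive",
}

def build_hive_site(properties):
    """Render a dict of settings as the XML text of a hive-site.xml file."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

xml_text = build_hive_site(PROPERTIES)
print(xml_text)
```

The resulting text can be written to conf/hive-site.xml. The description elements are optional and are left out of this sketch.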

To start Hive, run the following in /home/admin1/download/hive-2.0.1:

bin/hive

If it fails to initialize, run:

bin/schematool -dbType mysql -initSchema
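When the setup is scripted, the same initialization step can be wrapped in a small Python helper. This is a minimal sketch: the function names are hypothetical, the install path is the one used in this article, and init_schema only works on a machine where Hive is actually installed at that path.

```python
import os
import subprocess

def schematool_command(hive_home, db_type="mysql"):
    """Build the schematool invocation that initializes the metastore schema."""
    schematool = os.path.join(hive_home, "bin", "schematool")
    return [schematool, "-dbType", db_type, "-initSchema"]

def init_schema(hive_home, db_type="mysql"):
    """Run schematool; raises CalledProcessError if it exits with an error."""
    subprocess.run(schematool_command(hive_home, db_type), check=True)

cmd = schematool_command("/home/admin1/download/hive-2.0.1")
print(" ".join(cmd))
```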

In Hive 2.0 and above you need to run initSchema first, otherwise Hive reports an error on startup. I was stuck on this problem for several hours before finding that the fix was this simple.

Finally, to install MySQL on Ubuntu you can use the Ubuntu Software Center: search for mysql and install the MySQL server, the client, and Workbench. I will not go through that here. Once it is installed, create a new user from the console:

mysql -u root

CREATE USER 'hive' IDENTIFIED BY 'hive';

CREATE DATABASE hive;

GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';

FLUSH PRIVILEGES;

Then you can log in with the hive account:

mysql -u hive -p

Enter the password hive to log in, then put the same login details into hive-site.xml.

After that you can happily use Hive, create tables, and so on. Remember to start the Hadoop services first with sbin/start-all.sh.

Summary: I ran into two main problems today. 1. bin/hive kept reporting an error on startup, and it worked fine after the schema was initialized. 2. Sublime Text on Linux could not accept Chinese input. I could not download the gtk package that the usual fix needs, so I could not compile sublime_imfix.c. I then found a precompiled library on GitHub, imported it, and after a series of complicated steps finally solved the problem. Find the right method, find the right tool.
