2025-04-14 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report --
This article explains four ways to import data into Hive. The methods are simple, fast, and practical; readers who are interested may wish to follow along.
Several common data import methods for Hive
Here are four:
(1) Import data from the local file system into a Hive table
(2) Import data from HDFS into a Hive table
(3) Query data from another table and insert it into a Hive table
(4) Query records from other tables and insert them while creating a new table
1. Import data from the local file system into a Hive table
First, create a table in Hive, as follows:
hive> create table wyp
    > (id int, name string,
    > age int, tel string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 2.832 seconds
This table is very simple, with only four fields, so I will not explain their meanings. There is a file /home/wyp/wyp.txt in the local file system, with the following contents:
[wyp@master ~]$ cat wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
The columns in wyp.txt are separated by '\t'. You can import the data in this file into the wyp table with the following statement:
hive> load data local inpath 'wyp.txt' into table wyp;
Copying data from file:/home/wyp/wyp.txt
Copying file: file:/home/wyp/wyp.txt
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 1, num_rows: 0, total_size: 67]
OK
Time taken: 5.967 seconds
This imports the contents of wyp.txt into the wyp table, which you can verify by listing the table's data directory, as the following command shows:
hive> dfs -ls /user/hive/warehouse/wyp ;
Found 1 items
-rw-r--r--   3 wyp supergroup      67 2014-02-19 18:23 /user/hive/warehouse/wyp/wyp.txt
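A related variant not shown above: load data ... overwrite into table replaces the files already in the table's directory instead of adding to them. A minimal sketch using the same file:

```sql
-- OVERWRITE first removes the existing files under the table's data
-- directory, so afterwards the table contains only the rows from wyp.txt.
load data local inpath 'wyp.txt' overwrite into table wyp;
```

Without overwrite, repeating a load simply adds another file alongside the existing ones, which is why loading the same file twice duplicates its rows.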
It is important to note that, unlike the relational databases we are familiar with, Hive (before version 0.14) does not support giving a set of record literals directly in an insert statement; that is, it does not support statements of the form INSERT INTO ... VALUES ....
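As an aside beyond the original text: Hive 0.14 and later do accept a limited INSERT ... VALUES form. A hedged sketch, assuming a Hive 0.14+ session and the wyp table defined above (the sample row is hypothetical):

```sql
-- Requires Hive 0.14+; older versions reject this with a parse error.
-- The values must match the wyp table's column order: id, name, age, tel.
INSERT INTO TABLE wyp
VALUES (4, 'ls', 28, '13512345678');
```

On older versions, the load data statements shown in this article remain the way to get literal rows into a table.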
2. Import data from HDFS into a Hive table
When importing data from the local file system into a Hive table, the data is actually first copied to a temporary directory in HDFS (typically the uploading user's HDFS home directory, such as /home/wyp/), and then moved (note: moved, not copied!) from that temporary directory into the data directory of the corresponding Hive table. Given that, Hive naturally also supports moving data directly from a directory on HDFS into the table's data directory. Suppose there is a file /home/wyp/add.txt on HDFS, with the following contents:
[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
The above is the data to be inserted. This file is stored in the HDFS directory /home/wyp (unlike in method 1, where the file was stored on the local file system). We can import its contents into the Hive table with the following command:
hive> load data inpath '/home/wyp/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 2, num_rows: 0, total_size: 215]
OK
Time taken: 0.47 seconds
hive> select * from wyp;
OK
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.096 seconds, Fetched: 7 row(s)
From the execution results above, we can see that the data was indeed imported into the wyp table. Note that there is no local keyword in load data inpath '/home/wyp/add.txt' into table wyp; — this is the difference from method 1.
3. Query data from another table and insert it into a Hive table
Suppose there is a test table in Hive, created with the following statement:
hive> create table test (
    > id int, name string,
    > tel string)
    > partitioned by
    > (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.261 seconds
Its creation statement is broadly similar to that of the wyp table, except that the test table uses age as a partition column. A brief explanation of partitions:
Partitions: in Hive, each partition of a table corresponds to a subdirectory under the table's directory, and all of a partition's data is stored in that directory. For example, if the wyp table had two partition columns, dt and city, the directory for the partition dt=20131218, city=BJ would be /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data belonging to this partition would be stored there.
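To make the layout concrete, here is a small sketch: SHOW PARTITIONS is standard HiveQL, and the directory path shown in the comments assumes the default warehouse location used elsewhere in this article.

```sql
-- List the partitions Hive knows about for the test table.
SHOW PARTITIONS test;
-- Each partition maps to a subdirectory of the table directory, e.g.
--   /user/hive/warehouse/test/age=25
-- which can be inspected from within the Hive shell:
dfs -ls /user/hive/warehouse/test ;
```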
The following statement inserts the query results from the wyp table into the test table:
hive> insert into table test
    > partition (age='25')
    > select id, name, tel
    > from wyp;
# A bunch of MapReduce task information is output here, omitted
Total MapReduce CPU Time Spent: 1 seconds 310 msec
OK
Time taken: 19.125 seconds
hive> select * from test;
OK
5 wyp1 131212121212 25
6 wyp2 134535353535 25
7 wyp3 132453535353 25
8 wyp4 154243434355 25
1 wyp 13188888888888 25
2 test 13888888888888 25
3 zs 899314121 25
Time taken: 0.126 seconds, Fetched: 7 row(s)
A few notes:
As mentioned earlier, the traditional insert into table values (field 1, field 2, ...) form is not supported by classic Hive.
From the output above, we can see that the rows queried from the wyp table were successfully inserted into the test table. If the target table (test) had no partition column, the partition (age='25') clause could simply be omitted. We can also let Hive determine the partition dynamically from a column in the select statement:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
    > partition (age)
    > select id, name,
    > tel, age
    > from wyp;
# A bunch of MapReduce task information is output here, omitted
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
Time taken: 17.712 seconds
hive> select * from test;
OK
5 wyp1 131212121212 23
6 wyp2 134535353535 24
7 wyp3 132453535353 25
1 wyp 13188888888888 25
8 wyp4 154243434355 26
2 test 13888888888888 30
3 zs 899314121 34
Time taken: 0.399 seconds, Fetched: 7 row(s)
This method is called dynamic partition insert. Hive's dynamic partitioning runs in strict mode by default, so you need to set hive.exec.dynamic.partition.mode to nonstrict before using it. Hive also supports insert overwrite: as the keyword suggests, after the statement runs, the existing data under the corresponding data directory is overwritten, whereas insert into appends. Note the difference between the two. An example:
hive> insert overwrite table test
    > PARTITION (age)
    > select id, name, tel, age
    > from wyp;
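As an aside on the dynamic partition inserts shown above: depending on the Hive version, the feature itself may also need to be enabled, not just relaxed from strict mode. A hedged configuration sketch (these property names are standard Hive settings; the numeric limits are illustrative values you may wish to raise):

```sql
-- Enable dynamic partitioning and allow all partition columns to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Optional safety limits on how many partitions one statement may create.
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;
```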
What's more, Hive also supports multi-table inserts. What does that mean? In Hive, we can turn the insert statement around and put the from clause first; the effect is the same as putting it last. For example:
hive> show create table test3;
OK
CREATE TABLE test3(
  id int,
  name string)
Time taken: 0.277 seconds, Fetched: 18 row(s)
hive> from wyp
    > insert into table test
    > partition (age)
    > select id, name, tel, age
    > insert into table test3
    > select id, name
    > where age > 25;
hive> select * from test3;
OK
8 wyp4
2 test
3 zs
Time taken: 4.308 seconds, Fetched: 3 row(s)
You can use multiple insert clauses in the same query; the advantage is that the source table only needs to be scanned once to produce multiple disjoint outputs. This is cool!
4. Query records from other tables and insert them while creating a table
In practice, a query may return too many rows to display usefully on the console. In that case, it is very convenient to store Hive's query output directly in a new table. This pattern is called CTAS (create table ... as select):
hive> create table test4
    > as
    > select id, name, tel
    > from wyp;
hive> select * from test4;
OK
5 wyp1 131212121212
6 wyp2 134535353535
7 wyp3 132453535353
8 wyp4 154243434355
1 wyp 13188888888888
2 test 13888888888888
3 zs 899314121
Time taken: 0.089 seconds, Fetched: 7 row(s)
The data is inserted into the test4 table. The CTAS operation is atomic, so if the select query fails for some reason, the new table is not created at all.
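CTAS can also fix the storage layout of the new table at creation time. A hedged sketch (the STORED AS clause is standard HiveQL; the table name test5 and the SEQUENCEFILE format are illustrative choices, not from the original article):

```sql
-- Create a copy of the query result in a different file format.
create table test5
stored as SEQUENCEFILE
as
select id, name, tel
from wyp;
```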
At this point, you should have a deeper understanding of the four ways to import data into Hive. You might as well try them out in practice.