After studying Hive for quite a while, I found that there is not yet a complete book about Hive in China, and the material scattered around the Internet is very messy, so I decided to write a series of articles called "Those things about Hive" and share them with you. I will keep organizing Hive material over the coming period; if you are interested in Hive, please follow this blog.
Today's topic is a summary of Hive's common data import methods, which I group into four:
(1) Import data from the local file system into a Hive table;
(2) Import data from HDFS into a Hive table;
(3) Query data from other tables and import it into a Hive table;
(4) When creating a table, query records from other tables and insert them into the newly created table.
I will walk through each import method with a real session, because plain prose alone is dull and abstract to learn from. All right, let's get started!
1. Import data from the local file system into the Hive table
First, create a table in Hive, as follows:
hive> create table wyp
    > (id int, name string,
    > age int, tel string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 2.832 seconds
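As a quick sanity check (my own addition, not part of the original session; output omitted), Hive's desc command prints the column names and types of the table we just created:

hive> desc wyp;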
This table is very simple, with only four fields, so I will not explain their meanings one by one. There is a /home/wyp/wyp.txt file on the local file system, with the following content:
[wyp@master]$ cat wyp.txt
1       wyp     25      13188888888888
2       test    30      13888888888888
3       zs      34      899314121
The columns in wyp.txt are separated by '\t' (tab). You can import the data in this file into the wyp table with the following statement:
hive> load data local inpath 'wyp.txt' into table wyp;
Copying data from file:/home/wyp/wyp.txt
Copying file: file:/home/wyp/wyp.txt
Loading data to table default.wyp
Table default.wyp stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 67]
OK
Time taken: 5.967 seconds
This imports the contents of wyp.txt into the wyp table, which you can verify by listing the table's data directory, as shown in the following command:
hive> dfs -ls /user/hive/warehouse/wyp ;
Found 1 items
-rw-r--r--   3 wyp supergroup      67 2014-02-19 18:23 /user/hive/warehouse/wyp/wyp.txt
The data is indeed imported into the wyp table.
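A related option worth knowing (a minimal sketch of my own, not part of the original session): adding the OVERWRITE keyword makes the load replace whatever data is already in the table instead of appending to it:

hive> load data local inpath 'wyp.txt' overwrite into table wyp;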
Note that, unlike the relational databases we are familiar with, Hive at the time of writing did not support giving a literal set of records directly in an insert statement; that is, it did not support statements of the form INSERT INTO ... VALUES .... (Later Hive releases, 0.14 and up, added this syntax.)
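If your cluster runs Hive 0.14 or later, a minimal sketch of that newer syntax looks like this (the row values here are made up for illustration):

hive> insert into table wyp
    > values (4, 'ls', 28, '132111111111');

Under the hood Hive executes this as a regular insert job that writes a small file into the table's data directory, so it is meant for the occasional row, not for bulk loading.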
2. Import data from HDFS into the Hive table
When importing data from the local file system, the file is copied into the Hive table's data directory. Hive also supports moving data directly from a directory on HDFS into a table's data directory. Suppose there is a file /home/wyp/add.txt on HDFS, with the following content:
[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt
5       wyp1    23      131212121212
6       wyp2    24      134535353535
7       wyp3    25      132453535353
8       wyp4    26      154243434355
This is the content we want to insert. The file is stored in the HDFS directory /home/wyp (unlike the file in method 1, which was stored on the local file system). We can import its contents into the Hive table with the following command:
hive> load data inpath '/home/wyp/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 215]
OK
Time taken: 0.47 seconds

hive> select * from wyp;
OK
5       wyp1    23      131212121212
6       wyp2    24      134535353535
7       wyp3    25      132453535353
8       wyp4    26      154243434355
1       wyp     25      13188888888888
2       test    30      13888888888888
3       zs      34      899314121
Time taken: 0.096 seconds, Fetched: 7 row(s)
From the output above, we can see that the data has indeed been imported into the wyp table! Note that there is no local keyword in load data inpath '/home/wyp/add.txt' into table wyp; — this is the difference from method 1.
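One behavioral detail worth noting (my own note, not from the original session): load data inpath moves the source file on HDFS into the table's data directory rather than copying it, so add.txt disappears from /home/wyp afterwards. A listing such as the following will no longer show it:

hive> dfs -ls /home/wyp/ ;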
3. Query data from other tables and import it into the Hive table
Suppose there is a test table in Hive, and its table creation statement is as follows:
hive> create table test(
    > id int, name string
    > ,tel string)
    > partitioned by
    > (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.261 seconds
Its structure is broadly similar to that of the wyp table, except that test uses age as a partition column. The following statement inserts the query results from the wyp table into the test table:
hive> insert into table test
    > partition (age='25')
    > select id, name, tel
    > from wyp;
#####  A bunch of MapReduce task information is output here, omitted  #####
Total MapReduce CPU Time Spent: 1 seconds 310 msec
OK
Time taken: 19.125 seconds

hive> select * from test;
OK
5       wyp1    131212121212    25
6       wyp2    134535353535    25
7       wyp3    132453535353    25
8       wyp4    154243434355    25
1       wyp     13188888888888  25
2       test    13888888888888  25
3       zs      899314121       25
Time taken: 0.126 seconds, Fetched: 7 row(s)
The output above shows that the query results from the wyp table were successfully inserted into the test table! If the target table (test) has no partition column, simply omit the partition (age='25') clause. We can also let the partition value come from the select statement itself, i.e. specify the partition dynamically:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
    > partition (age)
    > select id, name,
    > tel, age
    > from wyp;
#####  A bunch of MapReduce task information is output here, omitted  #####
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
Time taken: 17.712 seconds

hive> select * from test;
OK
5       wyp1    131212121212    23
6       wyp2    134535353535    24
7       wyp3    132453535353    25
1       wyp     13188888888888  25
8       wyp4    154243434355    26
2       test    13888888888888  30
3       zs      899314121       34
Time taken: 0.399 seconds, Fetched: 7 row(s)
This technique is called dynamic partition insert; it is disabled by default in Hive, so you need to set hive.exec.dynamic.partition.mode to nonstrict before using it. Hive also supports insert overwrite: as the name suggests, after this statement executes, the data under the corresponding data directory is overwritten, whereas insert into appends. Note the difference between the two. An example:
hive> insert overwrite table test
    > PARTITION (age)
    > select id, name, tel, age
    > from wyp;
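A quick way to observe the difference (a sketch of my own, assuming the session above): run the overwrite statement twice and the row count of test stays the same, because each run replaces the partitions it writes to; run the insert into version twice and the count doubles, because rows are appended. The count can be checked with:

hive> select count(1) from test;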
What's more, Hive also supports multi-table inserts. What does that mean? In Hive, we can turn the insert statement around and put the from clause first; the execution result is the same. For example:
hive> show create table test3;
OK
CREATE  TABLE test3(
  id int,
  name string)
Time taken: 0.277 seconds, Fetched: 18 row(s)

hive> from wyp
    > insert into table test
    > partition (age)
    > select id, name, tel, age
    > insert into table test3
    > select id, name
    > where age > 25;

hive> select * from test3;
OK
8       wyp4
2       test
3       zs
Time taken: 4.308 seconds, Fetched: 3 row(s)
You can use multiple insert clauses in the same query; the advantage is that the source table only needs to be scanned once to produce multiple disjoint outputs. That is pretty cool!
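Multi-table insert is not limited to table outputs; one branch can also write to a directory on HDFS. A minimal sketch (the path /tmp/wyp_out is made up for illustration):

hive> from wyp
    > insert overwrite directory '/tmp/wyp_out'
    > select id, name
    > insert into table test3
    > select id, name
    > where age > 25;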
4. Query records from other tables and insert them into a newly created table
In practice, a query may produce too much output to display conveniently on the console, and in such cases it is very handy to store Hive's query output directly in a new table. This is known as CTAS (create table ... as select):
hive> create table test4
    > as
    > select id, name, tel
    > from wyp;

hive> select * from test4;
OK
5       wyp1    131212121212
6       wyp2    134535353535
7       wyp3    132453535353
8       wyp4    154243434355
1       wyp     13188888888888
2       test    13888888888888
3       zs      899314121
Time taken: 0.089 seconds, Fetched: 7 row(s)
The data has been inserted into the test4 table. A CTAS operation is atomic: if the select query fails for some reason, the new table is not created at all!
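CTAS can also change how the result is stored on the fly, which is handy for converting text data into a more compact format. A minimal sketch, using a hypothetical table name test5:

hive> create table test5
    > stored as SEQUENCEFILE
    > as
    > select id, name, tel
    > from wyp;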
All right, it's late, that's all for today, wash up and sleep!