2025-01-18 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report--
This article explains in detail how to load data into Hive tables. The editor finds it very practical and shares it here for your reference; I hope you will get something out of it.
Since Hive does not support row-level insert, update, and delete operations, the only way to load data into a table is a "bulk" data-load operation, or to simply write the files into the correct directory by some other means.
The first method, LOAD, was introduced earlier; it loads one file or directory per command:
LOAD DATA LOCAL INPATH '/data/' OVERWRITE INTO TABLE ods_login PARTITION (dt='2020-03-01');
The second way is to insert data into the table through query statements.
The INSERT statement lets the user insert the result of a query into the target table. Still using the table ods_login from the previous chapter as an example: assume ods_login has already been populated via LOAD, and the next warehouse layer is dwd. We then need to write the data from ods_login into dwd_login, like this (omitting the data-cleaning logic):
INSERT OVERWRITE TABLE dwd_login PARTITION (dt='20200302')
SELECT * FROM ods_login WHERE dt='20200302';
The OVERWRITE keyword is used here, so the previous contents of the partition (or, for a non-partitioned table, the previous contents of the whole table) are replaced. If you omit the OVERWRITE keyword or replace it with the INTO keyword, Hive appends the data without overwriting the pre-existing content.
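For example, to append a second batch into the same partition instead of replacing it, the statement above can be written with INTO (a sketch, reusing the same dwd_login and ods_login tables from this chapter):

```sql
-- Appends to the partition; pre-existing rows in dt='20200302' are kept
INSERT INTO TABLE dwd_login PARTITION (dt='20200302')
SELECT * FROM ods_login WHERE dt='20200302';
```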
Create a table and load data in a single query statement
Users can also create a table and load the query results into the table in a single statement:
CREATE TABLE dwd_login AS
SELECT name, salary, address FROM ods_login WHERE date='20200302';
The new table contains only the three fields name, salary, and address; its schema is generated from the SELECT statement.
A common way to use this feature is to select some of the required datasets from a large, wide table.
This feature cannot be used with external tables. Recall that with the ALTER TABLE statement an external table can "reference" a partition: the data itself is not "loaded"; instead, a path to the data is recorded in the metadata.
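A minimal sketch of that external-table pattern (the table name ext_login and the HDFS path are illustrative, not from the text):

```sql
-- The data stays where it already is; only the metadata records its location
ALTER TABLE ext_login ADD IF NOT EXISTS PARTITION (dt='20200302')
LOCATION 'hdfs:///data/login/20200302';
```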
However, if the table ods_login is very large and different subsets of its data need to be processed differently and written to different tables, things are not so simple. Fortunately, Hive provides another INSERT syntax that scans the input data only once and then splits it in a variety of ways, as follows:
FROM ods_login
INSERT OVERWRITE TABLE dwd_login PARTITION (dt='20200302')
  SELECT * WHERE event='login' AND dt='20200302'
INSERT OVERWRITE TABLE dwd_login PARTITION (dt='20200303')
  SELECT * WHERE event='login' AND dt='20200303'
INSERT OVERWRITE TABLE dwd_login PARTITION (dt='20200304')
  SELECT * WHERE event='login' AND dt='20200304';
Each record read from the ods_login table is evaluated against each SELECT … WHERE … clause. These clauses are evaluated independently; this is not an IF...THEN...ELSE... structure.
In fact, with this structure, a record in the source table can be written to multiple partitions of the target table, or to none at all.
If a record satisfies a given SELECT … WHERE … clause, it is written to the specified table and partition. Simply put, each INSERT clause can target a different table if necessary, and those target tables can be either partitioned or non-partitioned.
As a result, some of the input data may be written to multiple output locations, while other records may be dropped entirely. And of course, INSERT OVERWRITE and INSERT INTO clauses can be mixed here.
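As a sketch of mixing the two keywords in a single scan (the side table dwd_login_bad is a hypothetical name, not from the text):

```sql
FROM ods_login
-- Replace the day's partition with clean login events
INSERT OVERWRITE TABLE dwd_login PARTITION (dt='20200302')
  SELECT * WHERE event='login' AND dt='20200302'
-- Append malformed records to a side table for inspection
INSERT INTO TABLE dwd_login_bad
  SELECT * WHERE event IS NULL AND dt='20200302';
```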
This is the end of the article on "how to achieve data loading in Hive". I hope the above content has been helpful and has taught you something new. If you think the article is good, please share it so more people can see it.