How Hive creates internal tables




This article explains how to create an internal table in Hive, how it differs from an external table, and how to load data into it with LOAD DATA.

Last time I covered external tables. Remove the EXTERNAL keyword from the CREATE TABLE statement and you get an internal table (also called a managed table). Why is it called internal? Because Hive controls, more or less, the life cycle of its data.
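A minimal sketch of an internal table in HiveQL. Only the table name ods_login and the partition column dt come from this article; the columns, the tab delimiter, and the text storage format are assumptions for illustration.

```sql
-- Internal (managed) table: no EXTERNAL keyword and no LOCATION clause,
-- so Hive owns both the metadata and the data directory.
-- Column names and the delimiter are hypothetical.
CREATE TABLE IF NOT EXISTS ods_login (
  user_id    BIGINT,
  login_time STRING,
  ip         STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```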

By default, Hive stores the data of these tables in a subdirectory of the directory set by the configuration property hive.metastore.warehouse.dir (by default /user/hive/warehouse).
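To see where a managed table's files actually live, something like the following can be run in the Hive CLI or Beeline. It assumes the ods_login table already exists; the exact layout of the DESCRIBE FORMATTED output varies between Hive versions.

```sql
-- Print the current value of the warehouse root (SET <key>; echoes the value).
SET hive.metastore.warehouse.dir;

-- For a managed table, "Location:" should point to <warehouse>/ods_login
-- (or <warehouse>/<db>.db/ods_login in a non-default database),
-- and "Table Type:" should read MANAGED_TABLE.
DESCRIBE FORMATTED ods_login;
```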

When you drop an internal table, Hive also deletes its data. This makes internal tables a poor fit for data shared with other tools. Suppose some data is already being used by other programs and we also want to run Hive queries on it, but without giving Hive ownership of that data: we can create an external table that points to the data without owning it. What if we want the data in an internal table instead? Use LOAD DATA:
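The difference shows up when the tables are dropped. In this sketch, ods_login_tmp is a hypothetical internal table and ods_login_shared a hypothetical external table over data other programs use; running it really does delete the internal table's data.

```sql
-- Internal table: the metastore entry AND the files under the table path are removed
-- (the files go to the HDFS trash if trash is enabled and PURGE is not given).
DROP TABLE IF EXISTS ods_login_tmp;

-- External table: only the metastore entry is removed; the files stay in place,
-- so the other programs can keep reading them.
DROP TABLE IF EXISTS ods_login_shared;
```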

LOAD DATA LOCAL INPATH '/data/' OVERWRITE INTO TABLE ods_login PARTITION (dt='2020-03-01');

OVERWRITE: replace existing data

If the OVERWRITE keyword is specified, any data already in the target directory is deleted first. Without it, the new files are simply added to the target directory and the existing data is kept. If a file with the same name as a loaded file already exists in the target directory, older Hive releases overwrite the old file; newer releases instead keep both by renaming the incoming file (adding a suffix such as _copy_1).
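A sketch of the two modes against the same ods_login partition as the statement above. The directory /data/extra/ is hypothetical.

```sql
-- Append: files from /data/extra/ are added next to whatever dt=2020-03-01 already holds.
LOAD DATA LOCAL INPATH '/data/extra/' INTO TABLE ods_login PARTITION (dt='2020-03-01');

-- Replace: the existing files in dt=2020-03-01 are deleted first,
-- then the files from /data/ are copied in.
LOAD DATA LOCAL INPATH '/data/' OVERWRITE INTO TABLE ods_login PARTITION (dt='2020-03-01');
```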

PARTITION: load into a partition

If the partition directory does not exist, the command creates it and then copies (or moves) the data into it. If the target table is not partitioned, omit the PARTITION clause.
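For illustration, assuming ods_login is partitioned by dt and a hypothetical ods_user table is not partitioned (the local paths are hypothetical too):

```sql
-- Partitioned target: the dt=2020-03-02 directory is created if it does not exist yet.
LOAD DATA LOCAL INPATH '/data/2020-03-02/' INTO TABLE ods_login PARTITION (dt='2020-03-02');

-- The new partition should now be listed as dt=2020-03-02.
SHOW PARTITIONS ods_login;

-- Non-partitioned target: no PARTITION clause.
LOAD DATA LOCAL INPATH '/data/users/' INTO TABLE ods_user;
```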

Typically, the specified path is a directory rather than a single file, and Hive copies every file in it to the table path. This lets users split data across multiple files, or change how those files are named, without modifying the Hive script. Either way, the files land in the target table path with their original names.
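One way to check that file names are preserved, from the Hive CLI. Suppose /data/ holds files named part-0.txt and part-1.txt (hypothetical names); the listed path assumes the default warehouse location and the default database, so adjust it to the Location reported by DESCRIBE FORMATTED.

```sql
-- Load the whole directory into one partition.
LOAD DATA LOCAL INPATH '/data/' INTO TABLE ods_login PARTITION (dt='2020-03-01');

-- The partition directory should now contain part-0.txt and part-1.txt under the same names.
dfs -ls /user/hive/warehouse/ods_login/dt=2020-03-01;
```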

If the LOCAL keyword is used, the path refers to the local file system, and the data is copied to the target location. If LOCAL is omitted, the path refers to HDFS (more precisely, the default file system), and the data is moved from that path to the target location.
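The two forms side by side; both paths are hypothetical. With HiveServer2 (for example via Beeline), a LOCAL path is resolved on the HiveServer2 host rather than on the client machine.

```sql
-- LOCAL: a path on the local file system; the files are copied up to HDFS.
LOAD DATA LOCAL INPATH '/data/2020-03-02/' INTO TABLE ods_login PARTITION (dt='2020-03-02');

-- No LOCAL: a path on the default file system (normally HDFS); the files are moved.
LOAD DATA INPATH '/staging/login/2020-03-02' INTO TABLE ods_login PARTITION (dt='2020-03-02');
```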

As I mentioned earlier, Hive uses schema on read. Its advantage is that loading data is very fast: Hive does not read or parse the data at load time, it only copies or moves files. MySQL's schema on write costs more time at load, but it improves query performance, because the data is parsed up front and its columns can then be indexed and compressed.
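A quick way to see schema on read in action, assuming ods_login declares user_id as BIGINT and stores plain text. LOAD DATA accepts a file even if some rows contain a non-numeric user_id (a hypothetical bad row); the problem only surfaces at query time, where the default text SerDe returns such values as NULL.

```sql
-- Rows whose user_id could not be parsed as BIGINT come back with user_id = NULL.
SELECT user_id, login_time
FROM ods_login
WHERE dt = '2020-03-01'
  AND user_id IS NULL;
```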

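It is important to note that:

If the loaded file is on HDFS, it is moved into the table path.

If the loaded file is local, it is copied into the table path on HDFS.

Because nothing is parsed, a plain LOAD DATA does not launch a MapReduce job per file; it only copies or moves files and updates the metastore (for example, adding the new partition). From Hive 3.0, some LOAD variants that cannot be handled by moving files are rewritten internally into INSERT ... SELECT, which does run as a query job.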
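To watch the move happen for an HDFS source, the staging directory can be listed before and after the load from the Hive CLI. The staging path is hypothetical, and the warehouse path assumes the default location and database.

```sql
-- Before: the staging directory holds the files to be loaded.
dfs -ls /staging/login/2020-03-03;

LOAD DATA INPATH '/staging/login/2020-03-03' INTO TABLE ods_login PARTITION (dt='2020-03-03');

-- After: the files were moved, not copied, so the staging directory should no longer
-- list them, while the partition directory now does.
dfs -ls /staging/login/2020-03-03;
dfs -ls /user/hive/warehouse/ods_login/dt=2020-03-03;
```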

Internal and external tables alike have a directory holding their data files, hereinafter called the table path, which can be set with the LOCATION clause when the table is created. If it is not specified, Hive assigns one automatically: a directory named after the table under the Hive warehouse directory.
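A sketch of an internal table with an explicit table path; the directory /data/custom/ods_login and the table name ods_login_custom are hypothetical. Without the LOCATION clause, Hive would use a directory named ods_login_custom under the warehouse directory.

```sql
-- Still an internal table (no EXTERNAL keyword): dropping it deletes /data/custom/ods_login.
CREATE TABLE IF NOT EXISTS ods_login_custom (
  user_id    BIGINT,
  login_time STRING,
  ip         STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/custom/ods_login';
```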

When you create an external table, you usually specify its data directory. At query time, Hive treats every file already in that directory as data for the table. Hive may still create a directory named after the table under the Hive warehouse directory, but that directory stays empty; the table's data lives only in the specified location.
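A sketch of an external table over a directory that already contains data files; the path /data/shared/login, the table name, and the columns are assumptions. No LOAD is needed: the first query reads whatever files are present.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS ods_login_shared (
  user_id    BIGINT,
  login_time STRING,
  ip         STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
LOCATION '/data/shared/login';

-- Counts rows across every file currently in /data/shared/login.
SELECT COUNT(*) FROM ods_login_shared;

-- "Table Type:" should read EXTERNAL_TABLE and "Location:" should be /data/shared/login.
DESCRIBE FORMATTED ods_login_shared;
```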


