

Example Analysis of Hive Data Storage

2025-03-31 Update From: SLTechnology News&Howtos (shulou)


Shulou (Shulou.com) 06/01 Report

This article shares an example analysis of Hive data storage. Many readers may not know much about the topic, so it is offered here as a reference in the hope that you will learn something useful from it.

First, Hive does not have a special data storage format of its own, nor does it index the data. Users are free to organize tables in Hive as they like: as long as they tell Hive the column delimiter and the row delimiter of the data when creating the table, Hive can parse the data.
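As a rough illustration of why the delimiters are all Hive needs, the following Python sketch splits raw text into rows and columns. It assumes Hive's default text-format delimiters (`\x01`, i.e. Ctrl-A, between columns and `\n` between rows); the helper `parse_rows` and the sample data are hypothetical, and this is not Hive's actual SerDe implementation.

```python
def parse_rows(raw: str, field_delim: str = "\x01", row_delim: str = "\n") -> list[list[str]]:
    """Split raw text into rows, then each row into column values.

    Hypothetical helper: mirrors the idea that Hive only needs the row
    and column delimiters declared when the table is created.
    """
    rows = []
    for line in raw.split(row_delim):
        if line == "":  # skip the empty piece after a trailing row delimiter
            continue
        rows.append(line.split(field_delim))
    return rows


# A table declared with ',' as its column delimiter.
sample = "alice,20090801,US\nbob,20090801,CA\n"
print(parse_rows(sample, field_delim=","))
# [['alice', '20090801', 'US'], ['bob', '20090801', 'CA']]
```

The same function with its defaults reads data written with the Ctrl-A column delimiter.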

Second, all data in Hive is stored in HDFS, and Hive's data model consists of the following: Table, External Table, Partition, and Bucket.

1) Table: each table is a directory in HDFS.

2) Partition: each partition of a table is a subdirectory under the table's directory.

3) Bucket: if the table is partitioned, a bucket is a unit within a partition; if the table has no partitions, a bucket is a unit directly under the table. A bucket usually takes the form of a file.

A Table in Hive is conceptually similar to a table in a database, and each table has its own directory in which Hive stores its data. For example, the table pvs has the HDFS path /wh/pvs, where wh is the data warehouse directory specified by ${hive.metastore.warehouse.dir} in hive-site.xml. The data of every table (except External Tables) is stored under this directory.
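A minimal sketch of how a managed table's directory is derived, assuming the warehouse directory is `/wh` as in the example above. `table_dir` is a hypothetical helper, not a Hive API; in a real deployment the base path comes from `hive.metastore.warehouse.dir`, whose default value is `/user/hive/warehouse`.

```python
import posixpath


def table_dir(warehouse_dir: str, table: str) -> str:
    """Return the HDFS directory of a managed (non-external) table.

    Hypothetical helper: each managed table is one directory directly
    under the warehouse directory.
    """
    # posixpath is used because HDFS paths always use '/' separators.
    return posixpath.join(warehouse_dir, table)


print(table_dir("/wh", "pvs"))  # /wh/pvs
```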

A Partition corresponds to a dense index on the partition columns in a database, but partitions are organized very differently in Hive. In Hive, each partition of a table corresponds to a directory under the table's directory, and all of that partition's data is stored in that directory. For example, if the pvs table has two partition columns, ds and ctry, then the data for ds=20090801, ctry=US is stored in the HDFS subdirectory /wh/pvs/ds=20090801/ctry=US, and the data for ds=20090801, ctry=CA in /wh/pvs/ds=20090801/ctry=CA. Whether a table is partitioned, and how partitions are added, are both expressed in HiveQL. Because partitions are stored as directories, Hive can efficiently answer queries that filter on partition columns by reading only the matching directories.
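The partition layout and the resulting pruning can be sketched in Python as follows. `partition_dir` and `prune` are hypothetical helpers: the first nests one `column=value` directory per partition column in declaration order, and the second keeps only the partitions that satisfy equality predicates, which is the idea behind Hive reading only the matching directories. The sketch ignores the escaping Hive applies to special characters in partition values.

```python
import posixpath

Partition = list[tuple[str, str]]  # ordered (column, value) pairs


def partition_dir(table_dir: str, partition: Partition) -> str:
    """Nest one 'column=value' directory per partition column."""
    parts = [f"{col}={val}" for col, val in partition]
    return posixpath.join(table_dir, *parts)


def prune(partitions: list[Partition], **predicate: str) -> list[Partition]:
    """Keep partitions whose values match every equality predicate."""
    return [
        p for p in partitions
        if all(dict(p).get(col) == val for col, val in predicate.items())
    ]


partitions = [
    [("ds", "20090801"), ("ctry", "US")],
    [("ds", "20090801"), ("ctry", "CA")],
    [("ds", "20090802"), ("ctry", "US")],
]

# Only the directories for ctry=US would need to be read.
for p in prune(partitions, ctry="US"):
    print(partition_dir("/wh/pvs", p))
# /wh/pvs/ds=20090801/ctry=US
# /wh/pvs/ds=20090802/ctry=US
```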

Buckets compute a hash of a specified column and split the data by hash value so that it can be processed in parallel; each bucket corresponds to one file. For example, to divide the data into 32 buckets on the user column, Hive first hashes each value of user and takes the result modulo 32. The bucket with hash value 0 is stored in /wh/pvs/ds=20090801/ctry=US/part-00000, and the bucket with hash value 20 in /wh/pvs/ds=20090801/ctry=US/part-00020. Buckets are the final (lowest) level of Hive's storage layout. When creating a table, the user can specify the bucketing column and the number of buckets.
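The bucketing step can be sketched as "hash the column value, then take it modulo the number of buckets". This Python version is an illustration under stated assumptions: `zlib.crc32` stands in for Hive's own hash function (so real bucket numbers will differ), the helpers `bucket_for` and `bucket_file` are hypothetical, and the `part-NNNNN` file names follow the example above rather than any particular Hive version's naming.

```python
import zlib

NUM_BUCKETS = 32


def bucket_for(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a column value to a bucket index in [0, num_buckets).

    zlib.crc32 is a stand-in chosen only because it is deterministic;
    Hive uses its own hash function, so real bucket numbers differ.
    """
    h = zlib.crc32(value.encode("utf-8"))  # unsigned 32-bit int in Python 3
    return h % num_buckets


def bucket_file(partition_dir: str, bucket: int) -> str:
    """Path of the file holding one bucket (part-NNNNN naming as in the text)."""
    return f"{partition_dir}/part-{bucket:05d}"


# Group a few user values into their bucket files within one partition.
users = ["alice", "bob", "carol", "dave"]
files: dict[str, list[str]] = {}
for user in users:
    path = bucket_file("/wh/pvs/ds=20090801/ctry=US", bucket_for(user))
    files.setdefault(path, []).append(user)

for path, members in sorted(files.items()):
    print(path, members)
```

Because the same value always hashes to the same bucket, all rows for a given user land in one file, which is what lets bucketed data be processed in parallel one file at a time.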

An External Table points to data that already exists in HDFS, and it can also have partitions. Its metadata is organized the same way as that of a regular table, but the actual data is stored quite differently.

For a regular table, creating the table and loading the data are two steps (although both can be done in the same statement). When data is loaded, the actual data files are moved into the data warehouse directory, and all subsequent access to the data happens directly in that directory. When the table is dropped, its data and metadata are deleted together.

An External Table has only one step: creating the table and loading the data happen at the same time (CREATE EXTERNAL TABLE ... LOCATION). The actual data stays in the HDFS path given after LOCATION and is not moved into the data warehouse directory. When an External Table is dropped, only its metadata is deleted; the data in HDFS is left in place.
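The different drop behavior of the two table types can be modeled with a toy in-memory catalog. Everything here is hypothetical: `Catalog` and its methods are not a Hive API, and Python dicts stand in for both the metastore and HDFS. Loading moves the source files into the table's directory, dropping a managed table removes its data, and dropping an external table removes only its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy model of the metastore plus HDFS (hypothetical, not a Hive API)."""

    warehouse_dir: str = "/wh"
    # HDFS stand-in: directory path -> rows stored there
    hdfs: dict[str, list[str]] = field(default_factory=dict)
    # Metastore stand-in: table name -> (location, is_external)
    tables: dict[str, tuple[str, bool]] = field(default_factory=dict)

    def create_managed(self, name: str) -> None:
        # A managed table lives under the warehouse directory.
        self.tables[name] = (f"{self.warehouse_dir}/{name}", False)

    def create_external(self, name: str, location: str) -> None:
        # CREATE EXTERNAL TABLE ... LOCATION: only metadata is recorded;
        # the data already at `location` is not moved.
        self.tables[name] = (location, True)

    def load(self, name: str, src_path: str) -> None:
        # Loading moves (not copies) the source data into the table directory.
        location, _ = self.tables[name]
        self.hdfs.setdefault(location, []).extend(self.hdfs.pop(src_path))

    def drop(self, name: str) -> None:
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            self.hdfs.pop(location, None)  # managed table: data is removed too


cat = Catalog()
cat.hdfs["/tmp/staging"] = ["row1", "row2"]
cat.create_managed("pvs")
cat.load("pvs", "/tmp/staging")           # data now lives in /wh/pvs
cat.hdfs["/data/logs"] = ["row3"]
cat.create_external("logs", "/data/logs")  # data stays in /data/logs
cat.drop("pvs")
cat.drop("logs")
print(sorted(cat.hdfs))  # ['/data/logs']
```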

The above is the full content of the article "Example Analysis of Hive Data Storage". Thank you for reading; I hope it gives you a working understanding of how Hive stores data.


© 2024 shulou.com SLNews company. All rights reserved.
