What do partitions and buckets mean in Hive

Updated 2025-01-19. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01

This article explains what partitions and buckets mean in Hive. The approach is simple and practical, so let's walk through it.

Partitions and buckets in Hive

Hive organizes tables into partitions. Partitioning is a coarse-grained way of dividing a table according to the value of a partition column, such as a date. Queries that touch only a slice of the data run faster on a partitioned table, because Hive reads only the partitions that match.
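As a minimal sketch of such a slice query, assume a table named logs with columns ts and line, partitioned by a string column dt. The column names and the date value are illustrative assumptions, not taken from the original article.

```sql
-- Hypothetical table: logs(ts BIGINT, line STRING), partitioned by dt (STRING).
-- Filtering on the partition column lets Hive read only the matching
-- partition directory instead of scanning the whole table.
SELECT ts, line
FROM logs
WHERE dt = '20140418';
```

Without the filter on dt, the same query would have to read every partition of the table.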

Tables and partitions can be further divided into buckets, which impose additional structure on the data so that some queries can be processed more efficiently. For example, bucketing by user ID makes it quick to evaluate a user-based query on a random sample of the full set of users.

Consider a log file in which each record carries a timestamp. We usually partition by date, so that all records from the same day are stored in the same partition.

Partitions are defined when the table is created, using the PARTITIONED BY clause, which takes a list of column definitions.

Buckets are declared with the CLUSTERED BY clause, which divides the table into a specified number of parts. The bucket a row belongs to is determined by hashing the chosen column and taking the result modulo the number of buckets. In this example the table is bucketed on ts into 4 buckets.
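The original table definition is not preserved in this copy of the article, so the following is only a sketch of what it may have looked like. The table name logs and the bucketing on ts into 4 buckets come from the text; the data columns (ts, line) and the partition columns (dt, country) are assumptions suggested by the file name 20140418GB.txt.

```sql
-- Sketch of a table that is both partitioned and bucketed.
-- Partition columns are declared only in PARTITIONED BY, not in the
-- regular column list; each partition becomes its own directory in HDFS.
CREATE TABLE logs (
  ts   BIGINT,   -- assumed: event timestamp
  line STRING    -- assumed: raw log line
)
PARTITIONED BY (dt STRING, country STRING)
CLUSTERED BY (ts) INTO 4 BUCKETS;
```

PARTITIONED BY has to come before CLUSTERED BY in the statement, and the bucketing column must be one of the regular columns.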

When we load data into a partitioned table, the partition values must be specified explicitly. For example, suppose we have a file 20140418GB.txt in a local directory.

We load this file into the table logs.
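A load of this kind can be sketched as follows. The path /path/to/ is a placeholder for the directory that the original does not name, and the partition columns dt and country with the values 20140418 and GB are assumptions inferred from the file name.

```sql
-- Load a local file into one explicit partition of logs.
-- '/path/to/' is a placeholder; replace it with the real directory.
LOAD DATA LOCAL INPATH '/path/to/20140418GB.txt'
INTO TABLE logs
PARTITION (dt = '20140418', country = 'GB');
```

LOAD DATA does not parse or rewrite the file. It only copies it into the directory of the named partition.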

Now let's look at the resulting HDFS structure and the data in it, browsing it from Eclipse.

Strangely, the only thing we can see inside the table directory is the partition directories.
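The same layout can be inspected without Eclipse. The sketch below assumes the Hive CLI and the default warehouse location /user/hive/warehouse, which may differ on your cluster.

```sql
-- List the partitions Hive knows about for the table; entries have the
-- form dt=<value>/country=<value> under the assumed partition columns.
SHOW PARTITIONS logs;

-- List the table directory in HDFS from inside the Hive CLI.
-- The path assumes the default warehouse location.
dfs -ls /user/hive/warehouse/logs;
```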

There are no bucket files here. Next, we query the data by bucket.
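The original query is not preserved. One way to query by bucket is TABLESAMPLE, sketched here with the assumed columns ts and line; the choice of bucket 1 is arbitrary.

```sql
-- Return only the rows whose ts hashes into bucket 1 of 4.
SELECT ts, line
FROM logs TABLESAMPLE (BUCKET 1 OUT OF 4 ON ts);
```

The column after ON should match the CLUSTERED BY column if Hive is to take advantage of existing bucket files.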

The result contains the matching records from all three loaded files.

I have also tried bucketing on its own. In that experiment the bucket files were visible, but when partitioning and bucketing are combined, only the partition directories show up.

It is tempting to conclude that partitioning takes precedence and that the buckets still exist but are simply not displayed. That reading is probably wrong: in the Hive versions of this period, LOAD DATA only moves the file into the partition directory and does not redistribute its rows, so no bucket files are written. Bucket files appear only when the table is populated by a query (INSERT ... SELECT) with bucketing enforced.

The bucket-based query above still works, because Hive can compute each row's bucket at query time by hashing ts. It just has to scan the whole partition to do so.
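To get physical bucket files inside a partition, the data has to be written by a query rather than by LOAD DATA. The following is a sketch under the same assumed schema; the table name logs_bucketed is hypothetical.

```sql
-- Hypothetical target table with the same layout as logs.
CREATE TABLE logs_bucketed (
  ts   BIGINT,
  line STRING
)
PARTITIONED BY (dt STRING, country STRING)
CLUSTERED BY (ts) INTO 4 BUCKETS;

-- Needed before Hive 2.0; from 2.0 on bucketing is always enforced
-- and this property no longer exists.
SET hive.enforce.bucketing = true;

-- Rewrite one partition through a query so that rows are distributed
-- into bucket files by the hash of ts.
INSERT OVERWRITE TABLE logs_bucketed
PARTITION (dt = '20140418', country = 'GB')
SELECT ts, line
FROM logs
WHERE dt = '20140418' AND country = 'GB';
```

After an insert like this, the partition directory should contain one file per bucket rather than the single loaded file.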

At this point you should have a clearer understanding of what partitions and buckets mean in Hive. The best way to consolidate it is to try it out in practice.
