
How to Perform DML Data Operations and Create Partition Tables and Bucket Tables in Hive

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to perform DML data operations in Hive and how to work with partition tables and bucket tables. The content is easy to follow, and I hope it helps clear up any questions you have.

1. DML data manipulation

1.1 Data import

(1) Load data with load data

    load data [local] inpath 'path' [overwrite] into table table_name [partition (partcol1=val1, ...)];

- [local]: without local, the path is on HDFS; with local, it is a path on the local filesystem.
- [overwrite]: with overwrite, a second import replaces the data from the first; without it, the new data is appended.
- [partition (partcol1=val1, ...)]: specifies the target partition (covered later).

Tip: set hive.exec.mode.local.auto=true; lets Hive run MR jobs in local mode. Hive only runs a job locally when it satisfies certain conditions (such as a small enough input); a job that does not satisfy them still runs on the cluster.

(2) Insert data into a table with a query statement (insert)

- Insert new data directly into the table:

    insert into table table_name values (value1, value2, ...);

- Insert the results of a query into the table. The query result must match the table's columns in both number and type. insert overwrite replaces the existing data; insert into table appends to it.

    insert overwrite table table_name select_statement;

(3) Create a table from a query statement and load its data (as select)

    create table if not exists table_name as select_statement;

(4) Specify the data path with location when creating the table (the files already in that HDFS directory become the table's data)

    create table if not exists student3 (id int, name string)
    row format delimited fields terminated by '\t'
    location '/input';

(5) Import data with import (only data exported with export can be imported)

    import table db_name.table_name from 'HDFS export path';

Note: the target table must not already exist, otherwise the import reports an error.

1.2 Data export

(1) Export with insert

    insert overwrite [local] directory 'path'
    row format delimited fields terminated by '\t'
    select_statement;

- row format delimited fields terminated by '\t': sets the field delimiter of the exported files.
- local: with local, the export path is on the local filesystem.
Without local, the export path is on HDFS. insert overwrite replaces anything already in the target directory. Examples:

    insert overwrite local directory '/opt/module/hive/datas2'
    row format delimited fields terminated by '\t'
    select * from db4.student3;

    insert overwrite directory '/output'
    row format delimited fields terminated by '\t'
    select * from db4.student3;

(2) Export with Hadoop commands (copy the table's data files to a local path)

    hadoop fs -get 'path of the table data' 'local path'
    hdfs dfs -get 'path of the table data' 'local path'

Or, from inside the Hive client:

    dfs -get 'path of the table data' 'local path';

(3) Export with the Hive shell command (redirect the query output to a local file)

    bin/hive -e 'select * from table_name;' > local_file_path

(4) Export to HDFS with export

    export table db_name.table_name to 'HDFS path';

(5) Export with Sqoop will be covered later.

2. Partition tables and bucket tables

2.1 Partition tables

(1) Create a partition table (partitioned by specifies the partition column)

    create table table_name (deptno int, dname string, loc string)
    partitioned by (col_name col_type)
    row format delimited fields terminated by '\t';

Example:

    create table dept_partition (deptno int, dname string, loc string)
    partitioned by (day string)
    row format delimited fields terminated by '\t';

(2) Partition table operations

- Add partitions (multiple partitions are separated by spaces):

    alter table table_name add partition (part_col='value') partition (part_col='value');

- View partitions:

    show partitions table_name;

- Delete partitions (multiple partitions are separated by commas):

    alter table table_name drop partition (part_col='value'), partition (part_col='value');

- Add data to a partition:

    load data [local] inpath 'path' [overwrite] into table table_name partition (part_col='value');

(3) Create a two-level (secondary) partition table

    create table table_name (deptno int, dname string, loc string)
    partitioned by (col_name1 col_type1, col_name2 col_type2, ...);
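Partition tables such as dept_partition and dept_partition2 keep each partition in a subdirectory named col=value under the table's directory, nested one level per partition column. The following Python sketch models that layout. It assumes simple values that need no escaping (real Hive escapes special characters in partition values), and the table location in the demo is hypothetical. The unregistered_partitions helper is only a rough model of what msck repair table does: it reports col=value directories that exist on disk but are not registered in the metastore.

```python
# Simplified model of Hive's partition directory layout.
# ASSUMPTION: values need no escaping (real Hive escapes characters like '/' and '=').
from posixpath import join
from typing import Iterable, List, Set, Tuple

# Ordered (column, value) pairs, outermost partition level first.
Spec = Tuple[Tuple[str, str], ...]


def partition_path(table_location: str, spec: Spec) -> str:
    """Directory holding one partition, e.g. <location>/day=20200401/hour=12."""
    return join(table_location, *(f"{col}={val}" for col, val in spec))


def parse_partition_path(table_location: str, path: str) -> Spec:
    """Inverse of partition_path: recover the (column, value) pairs from a directory."""
    prefix = table_location.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {table_location!r}")
    pairs = []
    for name in path[len(prefix):].strip("/").split("/"):
        col, sep, val = name.partition("=")
        if not sep:
            raise ValueError(f"{name!r} is not a col=value directory")
        pairs.append((col, val))
    return tuple(pairs)


def unregistered_partitions(table_location: str, dirs: Iterable[str],
                            registered: Set[Spec]) -> List[Spec]:
    """Leaf partition directories that exist but are unknown to the metastore
    (roughly what `msck repair table` would add)."""
    found = (parse_partition_path(table_location, d) for d in dirs)
    return sorted(spec for spec in found if spec not in registered)


if __name__ == "__main__":
    loc = "/user/hive/warehouse/db4.db/dept_partition2"  # hypothetical location
    p12 = partition_path(loc, (("day", "20200401"), ("hour", "12")))
    p13 = partition_path(loc, (("day", "20200401"), ("hour", "13")))
    print(p12)  # → /user/hive/warehouse/db4.db/dept_partition2/day=20200401/hour=12
    # Only hour=12 is registered, so hour=13 is reported as missing.
    print(unregistered_partitions(loc, [p12, p13], {(("day", "20200401"), ("hour", "12"))}))
```

Because the partition values are encoded in the directory names, a directory created by hand with the right col=value names can later be associated with the table, which is what the repair and add partition methods rely on.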
Example of a two-level partition table:

    create table dept_partition2 (deptno int, dname string, loc string)
    partitioned by (day string, hour string)
    row format delimited fields terminated by '\t';

Load data into the two-level partition table (with load data, a partition that does not exist yet is created automatically):

    load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table dept_partition2 partition (day='20200401', hour='12');
    load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table dept_partition2 partition (day='20200401', hour='13');

(4) Associating data with partitions

When data files are uploaded directly into a partition directory on HDFS (for example with hadoop fs -put) instead of through load data, Hive does not know about the partition until it is associated. There are three ways to do this:

- Method 1: upload the data, then run the repair command:

    msck repair table table_name;

- Method 2: upload the data, then add the partition:

    alter table table_name add partition (part_col='value');

- Method 3: create the directory, then load data into the partition (the partition is created automatically):

    load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table dept_partition2 partition (day='20200401', hour='13');

2.2 Bucket tables

(1) Create a bucket table

    create table table_name (id int, name string)
    clustered by (id)
    into num_buckets buckets
    row format delimited fields terminated by '\t';

- clustered by (id): id is the bucketing column; rows are distributed to buckets according to its value.
- into num_buckets buckets: the number of buckets.

Example:

    create table stu_buck (id int, name string)
    clustered by (id)
    into 4 buckets
    row format delimited fields terminated by '\t';

Notes:
1. In newer versions of Hive, load data into a bucket table runs an MR job, so it is best to load the data from a path on HDFS.
2. The number of buckets should equal the number of ReduceTasks.
3. Bucketing principle: the bucket a row goes into is the hash code of the bucketing column's value modulo the number of buckets.
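Note 3 describes how rows are assigned to buckets. The following Python sketch models that rule for a single INT bucketing column, assuming the classic scheme in which an int hashes to itself and the sign bit is masked off before the modulo (the Java form is (hashCode & Integer.MAX_VALUE) % numBuckets). Newer Hive versions may hash differently (for example with Murmur3), so treat the bucket numbers as illustrative rather than what a given cluster will produce.

```python
# Simplified model of bucket assignment: bucket = hash(bucket column) % num_buckets.
# ASSUMPTIONS: one INT bucketing column; classic scheme where an int hashes to
# itself and the sign bit is masked off. Newer Hive versions may use another hash.
from typing import Dict, Iterable, List


def to_java_int(value: int) -> int:
    """Wrap a Python int into the signed 32-bit range, like a Java int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def bucket_for_int(key: int, num_buckets: int) -> int:
    """0-based bucket index for an INT key."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    hash_code = to_java_int(key)  # hash code of an int is the value itself
    return (hash_code & 0x7FFFFFFF) % num_buckets  # mask sign bit, then modulo


def assign_buckets(keys: Iterable[int], num_buckets: int) -> Dict[int, List[int]]:
    """Group keys by the bucket they map to."""
    buckets: Dict[int, List[int]] = {i: [] for i in range(num_buckets)}
    for key in keys:
        buckets[bucket_for_int(key, num_buckets)].append(key)
    return buckets


if __name__ == "__main__":
    # Hypothetical ids for a table like stu_buck (into 4 buckets).
    print(assign_buckets(range(1001, 1009), 4))
    # → {0: [1004, 1008], 1: [1001, 1005], 2: [1002, 1006], 3: [1003, 1007]}
```

Each bucket is written as its own file, and with one ReduceTask per bucket each reducer writes exactly one bucket, which is one reason note 2 pairs the bucket count with the number of ReduceTasks.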

© 2024 shulou.com SLNews company. All rights reserved.
