Hive creates tables and inserts data demo 04/20 Update SLTechnology News&Howtos

Hive creates tables and inserts data demo

2025-04-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

Create table student (id int comment "student id", name string comment "student name", age int comment "student age")

Comment "Student Information Table"

Row format delimited fields terminated by ","

Create external table student_ext (id int comment "student id", name string comment "student name", age int comment "student age")

Comment "Student Information Table"

Row format delimited fields terminated by ","

Location "/ user/hive/student_ext"

Create external table student_ptn (id int comment "student id", name string comment "student name", age int comment "student age")

Comment "Student Information Table"

Partitioned by (city string)

Row format delimited fields terminated by ","

Location "/ user/hive/student_ptn"

Set hive.exec.dynamici.partition=true; # enables dynamic partitioning. Default is false.

Set hive.exec.dynamic.partition.mode=nonstrict; # enables all partitions to be dynamic, otherwise there must be static partitions to use.

Set hive.exec.dynamic.partition=true; (can be seen through this statement: set hive.exec.dynamic.partition;)

Set hive.exec.dynamic.partition.mode=nonstrict

SET hive.exec.max.dynamic.partitions=100000; (if the number of automatic partitions is greater than this parameter, an error will be reported)

SET hive.exec.max.dynamic.partitions.pernode=100000

Insert into table student_ptn partition (city) select 6, "yangdong", 29, "beijing"

Insert into table student_ptn partition (city) select 2, "limei", 22, "chongqing"

Insert into table student_ptn partition (city) select 3, "wangxing", 25, "beijing"

Insert into table student_ptn partition (city) select 4, "chenming", 22, "beijing"

Insert into table student_ptn partition (city) select 5, "xiali", 26, "chongqing"

Create external table student_bck (id int comment "student id", name string comment "student name", age int comment "student age")

Comment "Student Information Table"

Clustered by (id) sorted by (id asc) into 2 buckets

Row format delimited fields terminated by ","

Location "/ user/hive/student_bck"

Insert into table student_bck

Select * from student

Create table cdt (

Id int

Name string

Work_location array

Piaofang map

Address struct)

Row format delimited

Fields terminated by "\ t"

Collection items terminated by ","

Map keys terminated by ":"

Lines terminated by "\ n"

Load json strings into table json

{"movie": "1193", "rate": "5", "timeStamp": "978300760", "uid": "1"}

{"movie": "661", "rate": "3", "timeStamp": "978302109", "uid": "1"}

{"movie": "914", "rate": "3", "timeStamp": "978301968", "uid": "1"}

{"movie": "3408", "rate": "4", "timeStamp": "978300275", "uid": "1"}

{"movie": "2355", "rate": "5", "timeStamp": "978824291", "uid": "1"}

{"movie": "1197", "rate": "3", "timeStamp": "978302268", "uid": "1"}

{"movie": "1287", "rate": "5", "timeStamp": "978302039", "uid": "1"}

{"movie": "2804", "rate": "5", "timeStamp": "978300719", "uid": "1"}

{"movie": "594", "rate": "4", "timeStamp": "978302268", "uid": "1"}

CREATE TABLE json (

Data string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hmaster:9000/user/hive/warehouse/plusorg.db/json'

TBLPROPERTIES (

'transient_lastDdlTime'='1542008332')

Load data local inpath "/ root/json.txt" into table json

Get_json_object (data,'$.movie') built-in function to parse a column of data

Select get_json_object (data,'$.movie') as movie from json

Json_tuple (jsonStr, K1, K2,...) Built-in function to parse json string class data

The parameter is a set of keys K1J K2. And the JSON string, the tuple of the return value. This method is more efficient than get_json_object because you can enter multiple keys in a single call

Select

B.b_movie

B.b_rate

B.b_timeStamp

B.b_uid

From json a

Lateral view json_tuple (a. DataGravity, recording movieology, playbook, ratewriting, playbook, timestampery, playbook, etc.) b as baking moviebook, bouncing ratebook, baking timeStampmenbenduid.

Create table rate (movie int, rate int, unixtime int, userid int) row format delimited fields

Terminated by'\ t'

Insert into table rate select

Get_json_object (data,'$.movie') as moive

Get_json_object (data,'$.rate') as rate

Get_json_object (data,'$.timeStamp') as unixtime

Get_json_object (data,'$.uid') as userid

From json

Select from_unixtime (unixtime,'yyyy/MM/dd HH:mm:ss') from rate

Create table lastjsontable (movie int, rate int, utime date, userid int) row format delimited

Fields terminated by'\ t'

Add a Python script, and hive can access it. The path is to store the absolute path on the current Unix server.

Add file / home/pythoncode/WeekdayMapper.py

Insert into table lastjsontable

Select

Transform (movie,rate,unixtime,userid) # input values (base table)

Using 'python WeekdayMapper.py' # using script cleaning

As (movie,rate,utime,userid) # output values (child table)

From rate; # base table

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.