Example Analysis of Hive Native and Compound Data Types

Updated 2025-02-24. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31.

This article walks through examples of Hive's native and compound data types: how to declare each type in a table, generate a matching data file, load it, and query it.

Native types

Native types include TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY (available from Hive 0.8.0), and TIMESTAMP (available from Hive 0.8.0). Data of these types is easy to load: declare a column delimiter on the table, then write each row to a file with its columns separated by that delimiter.

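Suppose there is a user login table.

CREATE TABLE login (
  uid BIGINT,
  ip STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Here the uid and ip fields of the login table are separated by the delimiter ','. The table is partitioned by dt because the load and query statements below address the partition dt='20130101'.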

Generate data matching the Hive table

# printf "%s,%s\n" 3105007001 192.168.1.1 >> login.txt
# printf "%s,%s\n" 3105007002 192.168.1.2 >> login.txt

The content of login.txt:

# cat login.txt
3105007001,192.168.1.1
3105007002,192.168.1.2
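
Because Hive splits each text line purely on the declared delimiter, a malformed line tends to surface later as NULL columns rather than as a load error. The following is a minimal sketch of a pre-load check, assuming the two-column uid,ip layout above; check_login_rows is a hypothetical helper name, not part of Hive.

```shell
# Hypothetical helper: read rows from stdin and succeed only if every line has
# exactly two comma-separated fields and the first one (uid) is all digits.
check_login_rows() {
  awk -F',' '
    NF != 2 || $1 !~ /^[0-9]+$/ { bad = 1; printf "line %d: %s\n", NR, $0 }
    END { exit bad }
  '
}

# Validate the two sample rows without touching any file.
# printf reuses its format for each extra pair of arguments, so this emits two lines.
printf '%s,%s\n' 3105007001 192.168.1.1 3105007002 192.168.1.2 | check_login_rows \
  && echo "login rows look loadable"
```

To check the real file, one would run check_login_rows < login.txt before issuing the LOAD DATA statement.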

Load data into the Hive table

LOAD DATA LOCAL INPATH '/home/hadoop/login.txt' OVERWRITE INTO TABLE login PARTITION (dt='20130101');

View data

select uid, ip from login where dt='20130101';
3105007001 192.168.1.1
3105007002 192.168.1.2

Array

Suppose the login table is

CREATE TABLE login_array (
  ip STRING,
  uid array<BIGINT>
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;

In this login table each ip can have several users. The ip and uid fields are separated by ',', and the elements of the uid array are separated by '|'.

Generate data matching the Hive table

# printf "%s,%s|%s|%s\n" 192.168.1.1 3105007010 3105007011 3105007012 >> login_array.txt
# printf "%s,%s|%s|%s\n" 192.168.1.2 3105007020 3105007021 3105007022 >> login_array.txt

The content of login_array.txt:

# cat login_array.txt
192.168.1.1,3105007010|3105007011|3105007012
192.168.1.2,3105007020|3105007021|3105007022
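
To make the two delimiter levels concrete, here is a rough awk sketch that divides one row the way the table definition describes: first on ',' into fields, then on '|' into array elements. It only imitates the splitting and is not Hive's own parser; split_array_row is a hypothetical name.

```shell
# Sketch only: imitate how the declared delimiters divide one login_array row.
# ',' plays the role of FIELDS TERMINATED BY, '|' of COLLECTION ITEMS TERMINATED BY.
split_array_row() {
  printf '%s\n' "$1" | awk -F',' '{
    n = split($2, uid, "[|]")          # "[|]" matches a literal pipe character
    printf "ip=%s size=%d", $1, n
    # awk arrays start at 1; print zero-based indexes to match Hive subscripts
    for (i = 1; i <= n; i++) printf " uid[%d]=%s", i - 1, uid[i]
    printf "\n"
  }'
}

split_array_row '192.168.1.1,3105007010|3105007011|3105007012'
# prints: ip=192.168.1.1 size=3 uid[0]=3105007010 uid[1]=3105007011 uid[2]=3105007012
```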

Load data into the Hive table

LOAD DATA LOCAL INPATH '/home/hadoop/login_array.txt' OVERWRITE INTO TABLE login_array PARTITION (dt='20130101');

View data

select ip, uid from login_array where dt='20130101';
192.168.1.1 [3105007010,3105007011,3105007012]
192.168.1.2 [3105007020,3105007021,3105007022]

Use array

select ip, uid[0] from login_array where dt='20130101'; -- use a subscript to access an array element
select ip, size(uid) from login_array where dt='20130101'; -- get the array length
select ip from login_array where dt='20130101' and array_contains(uid, 3105007011L); -- search the array

For more operations, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions.
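
When a Hive session is not at hand, the expected result of the array_contains filter can be cross-checked against the raw data file. The sketch below is a local stand-in under that assumption, not a Hive feature; ips_with_uid is a hypothetical name. It compares whole elements, which is why a plain grep for the uid would be a weaker check.

```shell
# Hypothetical local stand-in for:
#   select ip from login_array where array_contains(uid, <wanted>)
# Reads rows from stdin and prints the ip of each row whose array holds <wanted>.
ips_with_uid() {
  awk -F',' -v wanted="$1" '{
    n = split($2, uid, "[|]")
    # compare whole elements, so a partial uid never matches
    for (i = 1; i <= n; i++) if (uid[i] == wanted) { print $1; break }
  }'
}

printf '%s\n' \
  '192.168.1.1,3105007010|3105007011|3105007012' \
  '192.168.1.2,3105007020|3105007021|3105007022' | ips_with_uid 3105007011
# prints: 192.168.1.1
```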

Map

Suppose the login table is

CREATE TABLE login_map (
  ip STRING,
  uid STRING,
  gameinfo map<STRING,INT>
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

Each row of this login table carries a user's game information, and a user can have several games. In the gameinfo map, the key is the game's name and the value is the user's score in that game. Each key is separated from its value by ':', and map entries are separated from one another by '|'.

Generate data matching the Hive table

# printf "%s,%s,%s:%s|%s:%s|%s:%s\n" 192.168.1.1 3105007010 wow 10 cf 1 qqgame 2 >> login_map.txt
# printf "%s,%s,%s:%s|%s:%s|%s:%s\n" 192.168.1.2 3105007012 wow 20 cf 21 qqgame 22 >> login_map.txt

The content of login_map.txt:

# cat login_map.txt
192.168.1.1,3105007010,wow:10|cf:1|qqgame:2
192.168.1.2,3105007012,wow:20|cf:21|qqgame:22
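
A map column adds a third delimiter level. As an illustration only (this imitates the splitting and is not Hive's parser; split_map_row is a hypothetical name), the row is divided on ',' into fields, the third field on '|' into entries, and each entry on ':' into key and value.

```shell
# Sketch only: imitate the three delimiter levels of one login_map row.
split_map_row() {
  printf '%s\n' "$1" | awk -F',' '{
    n = split($3, item, "[|]")         # COLLECTION ITEMS TERMINATED BY
    printf "ip=%s uid=%s", $1, $2
    for (i = 1; i <= n; i++) {
      split(item[i], kv, ":")          # MAP KEYS TERMINATED BY
      printf " %s=>%s", kv[1], kv[2]
    }
    printf "\n"
  }'
}

split_map_row '192.168.1.1,3105007010,wow:10|cf:1|qqgame:2'
# prints: ip=192.168.1.1 uid=3105007010 wow=>10 cf=>1 qqgame=>2
```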

Load data into the Hive table

LOAD DATA LOCAL INPATH '/home/hadoop/login_map.txt' OVERWRITE INTO TABLE login_map PARTITION (dt='20130101');

View data

select ip, uid, gameinfo from login_map where dt='20130101';
192.168.1.1 3105007010 {"wow":10,"cf":1,"qqgame":2}
192.168.1.2 3105007012 {"wow":20,"cf":21,"qqgame":22}

Use map

select ip, uid, gameinfo['wow'] from login_map where dt='20130101'; -- use a key as the subscript to access a map value
select ip, uid, size(gameinfo) from login_map where dt='20130101'; -- get the number of map entries
select ip, uid from login_map where dt='20130101' and array_contains(map_keys(gameinfo), 'wow'); -- check the map's keys to find users who play wow

For more operations, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions.
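
Subscripting a map with a key it does not contain yields NULL in Hive rather than an error. The sketch below imitates that lookup against the raw data file so expected results can be worked out by hand; it is a local stand-in, and map_value is a hypothetical name.

```shell
# Hypothetical local stand-in for: select ip, uid, gameinfo['<key>'] from login_map
# Reads rows from stdin; prints NULL when the key is absent, as Hive does.
map_value() {
  awk -F',' -v key="$1" '{
    value = "NULL"
    n = split($3, item, "[|]")
    for (i = 1; i <= n; i++) {
      split(item[i], kv, ":")
      if (kv[1] == key) value = kv[2]
    }
    print $1, $2, value
  }'
}

printf '%s\n' '192.168.1.1,3105007010,wow:10|cf:1|qqgame:2' | map_value wow
# prints: 192.168.1.1 3105007010 10
```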

Struct

Suppose the login table is

CREATE TABLE login_struct (
  ip STRING,
  user struct<uid:BIGINT,name:STRING>
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

user is a struct with two members, the user's uid and the user's name, in that order.

Generate data matching the Hive table

# printf "%s,%s|%s\n" 192.168.1.1 3105007010 blue >> login_struct.txt
# printf "%s,%s|%s\n" 192.168.1.2 3105007012 ggjucheng >> login_struct.txt

The content of login_struct.txt:

# cat login_struct.txt
192.168.1.1,3105007010|blue
192.168.1.2,3105007012|ggjucheng
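
Unlike a map, a struct stores no member names in the data file: its values are matched to the declared members by position, using the collection item delimiter. A minimal sketch of that positional mapping, assuming the uid-then-name order described above (split_struct_row is a hypothetical name, and this is not Hive's parser):

```shell
# Sketch only: map the '|'-separated values of the struct column to its
# members by position (first value -> uid, second value -> name).
split_struct_row() {
  printf '%s\n' "$1" | awk -F',' '{
    split($2, member, "[|]")
    printf "ip=%s user.uid=%s user.name=%s\n", $1, member[1], member[2]
  }'
}

split_struct_row '192.168.1.1,3105007010|blue'
# prints: ip=192.168.1.1 user.uid=3105007010 user.name=blue
```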

Load data into the Hive table

LOAD DATA LOCAL INPATH '/home/hadoop/login_struct.txt' OVERWRITE INTO TABLE login_struct PARTITION (dt='20130101');

View data

select ip, user from login_struct where dt='20130101';
192.168.1.1 {"uid":3105007010,"name":"blue"}
192.168.1.2 {"uid":3105007012,"name":"ggjucheng"}

Use struct

select ip, user.uid, user.name from login_struct where dt='20130101';

Union

Rarely used, so it is not covered here.
