2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how Hive stores and queries JSON data. It is fairly detailed and should serve as a useful reference; interested readers are encouraged to read through it.
The data is stored as JSON, one JSON object per line, like this:

{"field1": "data1", "field2": 100, "field3": "more data1", "field4": 123.001}
{"field1": "data2", "field2": 200, "field3": "more data2", "field4": 123.002}
{"field1": "data3", "field2": 123.002, "field3": "more data3", "field4": 123.003}
{"field1": "data4", "field2": 400, "field3": "more data4", "field4": 123.004}

Each object must sit on a single line; the JSON cannot be pretty-printed across multiple lines.
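The one-object-per-line requirement can be checked before loading. A minimal Python sketch (the helper name and sample lines are illustrative, not part of Hive):

```python
import json

# Sample rows in the format the JsonSerDe expects:
# each line is a complete, standalone JSON object.
lines = [
    '{"field1": "data1", "field2": 100, "field3": "more data1", "field4": 123.001}',
    '{"field1": "data2", "field2": 200, "field3": "more data2", "field4": 123.002}',
]

def valid_json_lines(lines):
    """Return True only if every line parses as a JSON object."""
    try:
        return all(isinstance(json.loads(line), dict) for line in lines)
    except ValueError:
        return False

print(valid_json_lines(lines))                      # True
print(valid_json_lines(['{"a": 1}', 'not json']))   # False
```

Running such a check before loading avoids the silent-failure behavior described below, where bad rows only surface at query time.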
Download the version of hive-hcatalog-core.jar that matches your Hive release, then register it with Hive:

ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;
Create a JSON table:

CREATE TABLE json_table (a string, b bigint)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
Prepare data
{"a": "k", "b": 1}
{"a": "l", "b": 2}
Load the data:

LOAD DATA LOCAL INPATH '/home/hadoop/json.txt' INTO TABLE json_table;
If the loaded data does not meet the format requirements (for example, it is not valid JSON), no error is raised at load time; the problem only surfaces when the table is queried.
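This lazy behavior can be mimicked in a short Python sketch (not Hive itself, just an analogy): "loading" stores raw lines unvalidated, and a bad row only surfaces when it is parsed.

```python
import json

rows = ['{"a": "k", "b": 1}', 'this is not json', '{"a": "l", "b": 2}']

# "Loading" is just storing the raw lines -- nothing is validated yet.
stored = list(rows)

# "Querying" parses each line; only now does the bad row show up.
parsed, errors = [], []
for line in stored:
    try:
        parsed.append(json.loads(line))
    except ValueError:
        errors.append(line)

print(len(parsed))  # 2
print(errors)       # ['this is not json']
```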
Complex JSON can also be handled. When querying, a map column can be used in a WHERE clause as column["xxx"] = yyy; arrays and structs can likewise be used as described in the Hive documentation.
Now for a harder case: suppose a file on HDFS is not in the desired JSON shape. How can it still be read through a JsonSerDe? For example:
{
  "es": "1459442280603 es 0 gfhgfh,1511240411010000754,\n",
  "hd": {
    "a": "90014A0507091BC4",
    "b": "19",
    "c": "74:04:2b:da:00:97"
  }
}
The JSON as a whole is valid, but the es field is a plain string. I want to turn es into an array of JSON objects, yet I cannot change the structure of the data on HDFS: many MapReduce programs already depend on it, and changing them all would be a huge effort.
The obvious answer is to customize a JsonSerDe by modifying part of its source code.
The project at https://github.com/rcongiu/Hive-JSON-Serde on GitHub is very good: download it, modify the code, and recompile. The code I modified is the deserialize method of org.openx.data.jsonserde.JsonSerDe. As the name suggests, this method parses the data read from HDFS; its parameter is a Writable.
Take the es string, re-parse it, build a new JSON object from it, and put it back into the top-level JSON object. We can then declare this es1 column when creating the Hive table.
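A rough Python analog of what the modified deserialize step does: re-parse the raw es string into a nested list and attach it to the top-level record as es1. The delimiters assumed here (newline between records, comma between fields) and the helper name add_es1 are illustrative; the real format depends on the data producer.

```python
# Hypothetical sketch of the custom deserialize logic:
# split the raw "es" string into records and fields, then attach
# the result to the record under a new key "es1".
def add_es1(record):
    rows = [r for r in record["es"].split("\n") if r.strip()]
    record["es1"] = [r.split(",") for r in rows]
    return record

rec = {"es": "a,b,c\nx,y,z\n", "hd": {"k": "v"}}
print(add_es1(rec)["es1"])  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

The original "es" string is left untouched, so existing consumers of the raw data keep working while Hive sees the extra es1 column.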
It is worth noting that if es is parsed to [[xx,yy,zz], [xx1,yy1,zz1]], the Hive table is defined as follows:

CREATE EXTERNAL TABLE jsontest (
  es string,
  es1 array<struct<name1:string, name2:string, name3:string>>,
  hd map<string,string>
);
Whereas what I did at first was to parse es into [{"name1": xx, "name2": yy, "name3": zz}, {"name1": xx1, "name2": yy1, "name3": zz1}], with the Hive table defined as:

CREATE EXTERNAL TABLE jsontest (
  es string,
  es1 array<map<string,string>>,
  hd map<string,string>
);
This kept failing, and it took me a while to see the real problem. The better choice is struct: the field names can then be specified in the Hive table definition rather than hard-coded in the SerDe.
Package and compile:

mvn -Dcdh.version=1.3.1 -Dmaven.test.skip=true package
This approach does parse the irregular string into regular values and maps them onto Hive columns, but it has a drawback: es1 is an array whose size is not fixed. If I want to test a struct inside es1 in a WHERE clause, I cannot know in advance which array element to check, so the method above falls short.
New method:
events1 is still an array, but its elements are string instead of struct:

CREATE EXTERNAL TABLE test.nginx_logs2 (
  events string,
  events1 array<string>,
  header map<string,string>
)
PARTITIONED BY (datepart string, app_token string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

In the modified SerDe source, each element now only needs to be assembled into a regular JSON string.
Then use Hive's explode to expand the events1 array, and get_json_object to pull an attribute out of each JSON string, as in the following.
SELECT event_name,
       COUNT(DISTINCT user_id) AS num
FROM (SELECT header["user_id"] AS user_id, get_json_object(event, '$.name') AS event_name
      FROM test.nginx_logs2 LATERAL VIEW explode(events1) events1 AS event
      WHERE get_json_object(event, '$.name') = 'xxx'
        AND get_json_object(event, '$.type') = '0') f
GROUP BY event_name;
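The explode-plus-get_json_object pipeline above can be sketched in Python to show the data flow (the row data, field names, and values here are hypothetical):

```python
import json

# Each row: a header map plus events1, an array of JSON strings,
# mirroring the test.nginx_logs2 schema above.
rows = [
    {"header": {"user_id": "u1"},
     "events1": ['{"name": "xxx", "type": "0"}', '{"name": "yyy", "type": "1"}']},
    {"header": {"user_id": "u2"},
     "events1": ['{"name": "xxx", "type": "0"}']},
]

users_by_event = {}
for row in rows:
    for raw in row["events1"]:        # LATERAL VIEW explode(events1)
        event = json.loads(raw)       # get_json_object equivalent
        if event["name"] == "xxx" and event["type"] == "0":
            users_by_event.setdefault(event["name"], set()).add(row["header"]["user_id"])

# COUNT(DISTINCT user_id) ... GROUP BY event_name
counts = {name: len(users) for name, users in users_by_event.items()}
print(counts)  # {'xxx': 2}
```

Because each exploded element is a plain JSON string, any attribute can be filtered on without knowing the element's position in the array, which is exactly what the struct-based approach could not do.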