In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly explains "how to master SQL grammar Explode and Lateral View". The content of the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "how to master SQL grammar Explode and Lateral View".
Explode and lateral view
Why do you put these two together? because they are often used together.
Explode and lateral view should not appear in the relational database itself, because its appearance is to operate the data that does not meet the first paradigm (each attribute can no longer be separated), and has violated the design principle of the database (whether it is a business system or a data warehouse system). However, after the popularization of big data's technology, a lot of data similar to pv,uv is stored in non-relational databases in the business system. The probability of using json to store data is relatively large. If it is directly imported into the data warehouse system based on hive, you need to parse this kind of data through the ETL process. Explode and lateral view show their skills in this scenario.
Explode usage
Before we introduce how to handle it, let's take a look at Hive's built-in explode function. The official explanation is: explode () takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW. It means that explode () takes an array or map type of data as input, and then outputs the elements in array or map as each line. It can be used with LATERAL VIEW. Just looking at the text description is not very intuitive, let's take a look at a few examples.
Hive (default) > select explode (array); OK A B C Time taken: 4.188 seconds, Fetched: 3 row (s) hive (default) > select explode (map ('ajar, 1,' baked, 2, 'centering, 3)); OK key value a 1 b 2 c 3
The explode function takes an array or data of type map, and you usually need to use the split function to generate an array.
Explode fit parsing Json array
Here is the data:
{"info": [{"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "2345Explorer_embedupdate", "plugin": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}]}
Now we need to extract the AppName and pepper, and then store them in a row, one by one. First of all, let's try it according to the function of Json processing we learned in the previous section.
Select get_json_object ('{"info": [{"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "2345Explorer_embedupdate", "plugin": "- 1"}, {"AppName": "SogouExplorer_embedupdate") "pepper": "- 1"}]}', "$.info [*] .AppName")
As shown in the picture
Image-20201231111231311
But we noticed that although it has been extracted here, the return value is a string, ah, why do I know it is a string, but it looks like an array, because I have tried it with the explode function, so how to deal with it next? at this time, we need to cooperate with split processing. In order to facilitate the operation, I directly use the results of the above to operate.
["SogouExplorer_embedupdate", "2345Explorer_embedupdate", "SogouExplorer_embedupdate"]
But let's try to deal with the above string, first we need to split, but before that, we need to remove the square brackets on both sides, otherwise our data will contain these two symbols.
Select regexp_replace ('["SogouExplorer_embedupdate", "2345Explorer_embedupdate", "SogouExplorer_embedupdate"]', "[\ [\]]",'')
And then we can have split and explode.
Select explode (regexp_replace ('["SogouExplorer_embedupdate", "2345Explorer_embedupdate", "SogouExplorer_embedupdate"]', "[\ [\]]",'),')
Image-20201231112616809
To parse the json array, we essentially replace the square brackets with regexp_replace, and then split the data using the split function to split the explode into multiple rows. Is there a problem with the above writing? the function can be completed, but here only the AppName field is mentioned, and there is one field that has not been extracted. If you want to extract it, you have to repeat the above steps. Next, we try to introduce json_tuple to simplify our operation. We first explode it into multiple lines of simple json strings, and then use json_tuple to process it.
Select explode (split (regexp_replace (get_json_object ('{"info": [{"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate") "pepper": "- 1"}, {"AppName": "2345Explorer_embedupdate", "plugin": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}}', "$.info"),'[[\\]]','),'(},\\ {)','} #\\') '#'))
Regexp_replace is called twice here, the first is to remove the square brackets on both sides, and the second is to distinguish the comma in jons from the comma that divides json, because we split according to the delimiter between the contents of the array, so we can think of it as replacing the delimiter of the array string with a comma with a # sign, and then according to # split
Image-20201231122203730
Then you can call the json_tuple function.
Select json_tuple (data,'AppName','pepper') from (select explode (regexp_replace (regexp_replace ('{"info": [{"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}) {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}, {"AppName": "2345Explorer_embedupdate", "plugin": "- 1"}, {"AppName": "SogouExplorer_embedupdate", "pepper": "- 1"}}', "$.info"),'[\ [\]]' ''),'(},\\ {),'} #\\ {),'#') as data) json_table
As shown in the picture
Image-20201231122505355
So we parse out the fields we need.
Lateral view
Before we begin, let's talk about the use of LATERAL VIEW udtf (expression) tableAlias AS columnAlias. You can translate lateral view into a side view.
We have such a sample data (
Andy Lau actor, director, producer Bruce Lee actor, director, producer, behind the scenes, martial arts director Jet Li actor, martial arts director Liu Yifei actor
Here we want to convert it to the following format
Andy Lau actor Andy Lau producer Bruce Lee actor Bruce Lee producer Bruce Lee behind the scenes Bruce Lee martial arts director create table ods.ods_actor_data (username string, userrole string) ROW FORMAT DELIMITED FIELDS TERMINATED BY'\ t'; load data local inpath "/ Users/liuwenqiang/workspace/hive/lateral.data" overwrite into table ods.ods_actor_data
As shown in the picture
Image-20201231133130769
From our previous study, we know that the explode function should be used here.
Select explode (split (userrole,',')) from ods.ods_actor_data
Image-20201231134156444
In theory, all we have to do now is to choose username.
Select username,explode (split (userrole,',')) from ods.ods_actor_data
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
Because explode is a UDTF, you can't use it directly with other fields, so what should you do in
Select username,role from ods.ods_actor_data LATERAL VIEW explode (split (userrole,',')) tmpTable as role
As shown in the picture
Image-20201231154758339
It seems that our implementation is over at this point.
Lateral view outer
Why is there an extra OUTER keyword? in fact, you can also guess that outer join is a bit like it, just to avoid affecting the return of our main table when the return value of the explode function is null. Note that it is null rather than an empty string.
Select username,role from ods.ods_actor_data LATERAL VIEW explode (array ()) tmpTable as role
As shown in the picture
Image-20201231160414501
After adding the outer keyword
Select username,role from ods.ods_actor_data LATERAL VIEW outer explode (array ()) tmpTable as role
As shown in the picture
Image-20201231160459117
In fact, a SQL you can use lateral view multiple times is also possible, like this
SELECT * FROM exampleTable LATERAL VIEW explode (col1) myTable1 AS myCol1 LATERAL VIEW explode (myCol1) myTable2 AS myCol2
What is the implementation principle of lateral view?
First of all, we know that explode () is a UDTF, that is, an input, multiple outputs, or a row in and a column out (multiple lines).
Image-20201231162007648
The lateral view keyword is to give the expression of the specific field of each row to the explode function, then make a Cartesian product of the output and the current line, then repeat until all the data in the table is looped, and then become the following installation (the column passed to the explode field is omitted in the figure)
Image-20201231162254979
But in fact, here I have a question, why should be so designed, directly the ordinary field and the return value of the UDTF function query together is not good, and then the original field and the return value of UDTF to do Cartesian product on the line, ah, why also lateral view, .
The use of where in lateral view
You may say that where is not so useful, is there anything different? for example, as long as the information above is Andy Lau's, then you will definitely write the following SQL.
Select username,role from ods.ods_actor_data LATERAL VIEW explode (split (userrole,',')) tmpTable as role where username=' Andy Lau'
What if I just want the director, but we know that the userrole field is not directly directed by the director, but also includes the director's actor, director, producer, behind the scenes, martial arts director. In fact, you can use the following alias field role at this time.
Select username,role from ods.ods_actor_data LATERAL VIEW explode (split (userrole,',')) tmpTable as role where role= "Director"
As shown in the picture
Image-20201231165856030
Thank you for reading, the above is the content of "how to master SQL grammar Explode and Lateral View". After the study of this article, I believe you have a deeper understanding of how to master SQL grammar Explode and Lateral View, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 268
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.