How to use LATERAL VIEW in Hive

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces how to use LATERAL VIEW in Hive and walks through the operators that implement it. It should be a useful reference for anyone interested in how the feature works.

Overview

LATERAL VIEW is used together with a UDTF (user-defined table-generating function), which produces zero or more output rows for each input row. LATERAL VIEW first applies the UDTF to each input row of the base table, then joins the resulting output rows with that input row to form a virtual table with the specified table alias.
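The row-level behavior described above can be sketched in plain Java. This is only an in-memory model, not Hive code: the class LateralViewSketch and its methods are hypothetical names, and String.split stands in for Hive's split(sq, ',').

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of:
//   SELECT id, sq, myCol FROM t LATERAL VIEW explode(split(sq, ',')) myTab AS myCol
final class LateralViewSketch {

    // Stand-in for the UDTF: zero or more output values per input value.
    static List<String> explodeSplit(String sq) {
        List<String> out = new ArrayList<>();
        if (sq == null) {
            return out; // nothing to explode, so no output rows
        }
        out.addAll(Arrays.asList(sq.split(",")));
        return out;
    }

    // Applies the UDTF to every base row and joins each output value
    // back onto the input row that produced it.
    static List<List<Object>> lateralView(List<Object[]> baseRows) {
        List<List<Object>> result = new ArrayList<>();
        for (Object[] row : baseRows) {
            String sq = (String) row[1];
            for (String myCol : explodeSplit(sq)) {
                List<Object> joined = new ArrayList<>(Arrays.asList(row)); // id, sq
                joined.add(myCol);                                          // plus myCol
                result.add(joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a,b"});
        rows.add(new Object[] {2, null}); // produces no output rows
        for (List<Object> r : lateralView(rows)) {
            System.out.println(r);
        }
    }
}
```

In this model the row with id 2 disappears from the result because its UDTF output is empty.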

Test SQL

    explain SELECT id, sq, myCol FROM window_test_table LATERAL VIEW explode(split(sq, ',')) myTab AS myCol

The plan for this SQL follows two paths:

    ts (TableScan) --> lvf (Lateral View Forward) --> sel (Select) --> lvj (Lateral View Join) --> sel (Select)
    ts (TableScan) --> lvf (Lateral View Forward) --> sel (Select) --> udtf --> lvj (Lateral View Join) --> sel (Select)

1. TableScanOperator

Nothing special here: this is the ordinary table-scan operation.

2. LateralViewForwardOperator

    @Override
    public void process(Object row, int tag) throws HiveException {
      forward(row, inputObjInspectors[tag]);
    }

It does almost nothing: each row is forwarded exactly as it was received.

2-1. Left SelectOperator

Selects the non-exploded columns that the query needs: id, sq

2-2-1. Right SelectOperator

Selects the column to be exploded: split(sq, ',')

2-2-2. Right UDTFOperator

    @Override
    public void process(Object row, int tag) throws HiveException {
      StructObjectInspector soi = (StructObjectInspector) inputObjInspectors[tag];
      ....
          List list = listOI.getList(o[0]);
          if (list == null) {
            return; // do not send data when there is no value in the array
          }
          for (Object r : list) {
            forwardListObj[0] = r;
            forward(forwardListObj);
          }
          break;
        case MAP: // process map
          MapObjectInspector mapOI = (MapObjectInspector) inputOI;
          Map map = mapOI.getMap(o[0]);
          if (map == null) {
            return;
          }
          for (Entry r : map.entrySet()) {
            forwardMapObj[0] = r.getKey();
            forwardMapObj[1] = r.getValue();
            forward(forwardMapObj);
          }
          break;
        default:
          throw new TaskExecutionException("explode() can only operate on an array or a map");
      }
    }
    ....
    }

Why is there an OUTER keyword?

When the UDTF does not generate any rows, for example when the input column of the explode() function is empty, LATERAL VIEW does not generate any output rows. In that case the original row never appears in the result. OUTER can be used to prevent this: the original row is kept, and the columns that come from the UDTF are set to NULL in the output row.

For example:
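A minimal Java sketch of this difference, assuming an in-memory model rather than real Hive code (OuterLateralViewSketch and its method are hypothetical names), compares the result with and without OUTER for a row whose array is empty:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of LATERAL VIEW [OUTER] explode(arr) over rows of (id, arr).
final class OuterLateralViewSketch {

    static List<List<Object>> lateralView(List<Object[]> baseRows, boolean outer) {
        List<List<Object>> result = new ArrayList<>();
        for (Object[] row : baseRows) {
            Object id = row[0];
            List<?> array = (List<?>) row[1];
            if (array == null || array.isEmpty()) {
                // The UDTF produced nothing: only OUTER keeps the original row,
                // with NULL in the column that would have come from the UDTF.
                if (outer) {
                    result.add(Arrays.<Object>asList(id, null));
                }
                continue;
            }
            for (Object value : array) {
                result.add(Arrays.<Object>asList(id, value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, Arrays.asList("a", "b")});
        rows.add(new Object[] {2, new ArrayList<String>()}); // empty array
        System.out.println("without OUTER: " + lateralView(rows, false));
        System.out.println("with OUTER:    " + lateralView(rows, true));
    }
}
```

Without OUTER the row with id 2 is dropped; with OUTER it survives with a null in the exploded column.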

The code shows how this works.

The UDTF operator counts the rows it expands with the help of UDTFCollector, which forwards each row and increments a counter:

    @Override
    public void collect(Object input) throws HiveException {
      op.forwardUDTFOutput(input);
      counter++;
    }

If the expansion produces no rows, counter stays at 0. When OUTER is specified, the operator then forwards the previously built, empty outerObj to the next operator, LateralViewJoinOperator.
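The counter check can be sketched as follows. This is a simplified model under stated assumptions, not the Hive source: CountingCollector and processRow are hypothetical names, a plain null stands in for outerObj, and the reset after each input row is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the counter-based OUTER handling described above.
final class UdtfCounterSketch {

    // Stand-in for a collector that forwards each UDTF output and counts it.
    static final class CountingCollector {
        final List<Object> forwarded = new ArrayList<>();
        private int counter = 0;

        void collect(Object output) {
            forwarded.add(output); // "forward" to the next operator
            counter++;
        }

        int getCounter() {
            return counter;
        }

        void reset() {
            counter = 0;
        }
    }

    // Handles one input row: explode the array, then emit a placeholder
    // if OUTER is set and nothing was collected for this row.
    static void processRow(List<?> array, boolean outer, CountingCollector collector) {
        if (array != null) {
            for (Object element : array) {
                collector.collect(element);
            }
        }
        if (outer && collector.getCounter() == 0) {
            collector.collect(null); // placeholder playing the role of outerObj
        }
        collector.reset(); // assumed: the count is per input row
    }

    public static void main(String[] args) {
        CountingCollector collector = new CountingCollector();
        processRow(java.util.Arrays.asList("a", "b"), true, collector);
        processRow(new ArrayList<String>(), true, collector);
        System.out.println(collector.forwarded);
    }
}
```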

3. LateralViewJoinOperator

    @Override
    public void process(Object row, int tag) throws HiveException {
      StructObjectInspector soi = (StructObjectInspector) inputObjInspectors[tag];
      // the tag identifies which branch the row came from
      if (tag == SELECT_TAG) {
        selectObjs.clear();
        selectObjs.addAll(soi.getStructFieldsDataAsList(row));
      } else if (tag == UDTF_TAG) {
        // the row comes from the udtf on the right
        acc.clear();
        acc.addAll(selectObjs);
        acc.addAll(soi.getStructFieldsDataAsList(row)); // merge data
        forward(acc, outputObjInspector);
      } else {
        throw new HiveException("Invalid tag");
      }
    }

The LateralViewJoinOperator logic is simple and clear: the "join" here is nothing more than a List.addAll.
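To see why this amounts to a concatenation, here is a small self-contained Java model of the tag-based logic. The class LateralViewJoinSketch, the output list, and the tag values 0 and 1 are assumptions made for illustration; rows are plain lists instead of Hive structs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a tag-driven "join": remember the latest row from the
// select branch, then append each UDTF row to it.
final class LateralViewJoinSketch {
    static final int SELECT_TAG = 0; // assumed value for the left (select) branch
    static final int UDTF_TAG = 1;   // assumed value for the right (udtf) branch

    private final List<Object> selectObjs = new ArrayList<>();
    final List<List<Object>> output = new ArrayList<>();

    void process(List<Object> row, int tag) {
        if (tag == SELECT_TAG) {
            // Left branch: cache the non-exploded columns of the current input row.
            selectObjs.clear();
            selectObjs.addAll(row);
        } else if (tag == UDTF_TAG) {
            // Right branch: cached columns followed by the UDTF columns.
            List<Object> acc = new ArrayList<>(selectObjs);
            acc.addAll(row);
            output.add(acc);
        } else {
            throw new IllegalArgumentException("Invalid tag");
        }
    }

    public static void main(String[] args) {
        LateralViewJoinSketch join = new LateralViewJoinSketch();
        join.process(Arrays.<Object>asList(1, "a,b"), SELECT_TAG);
        join.process(Arrays.<Object>asList("a"), UDTF_TAG);
        join.process(Arrays.<Object>asList("b"), UDTF_TAG);
        System.out.println(join.output);
    }
}
```

The model relies on the select-branch row arriving before the UDTF rows derived from the same input row, which is what makes a per-row cache sufficient.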

Is there a shuffle?

Does LATERAL VIEW explode cause a shuffle?

No. If you look at the execution plan from the beginning of the article, you will find that there is no reduce task.

The "Join" here only means concatenating two pieces of data that come from the same input row; it is not a real join.
