How Hive can easily implement stored procedures

2025-07-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article looks at how stored procedures can be implemented on Hive. It analyzes the usual approaches and their drawbacks in detail, then describes a simpler and more practical alternative, in the hope of helping readers who face the same problem.

The first approach is HPL/SQL. At present it is still immature: cursors are subject to many restrictions, many functions are not implemented, the rules for variables are strict, and incompatibility errors are common. Errors in code are not a serious obstacle as long as the code can be debugged, but HPL/SQL cannot be debugged, which is very inconvenient for developers.

What is more inconvenient is that HPL/SQL lacks a JDBC interface and cannot easily be embedded in a Java program. Java can only invoke the HPL/SQL command line; HPL/SQL then performs the calculation and writes the result to a temporary Hive table, and finally Java reads that temporary table through Hive's JDBC driver.
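The round trip described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not a recommended design: the class and method names are hypothetical, the `hplsql` executable is assumed to be on the PATH (its `-f` option runs a script file), and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class HplsqlRoundTrip {

    /** Builds the HPL/SQL command line; the -f option runs a script file. */
    static List<String> buildCommand(String scriptPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hplsql");
        cmd.add("-f");
        cmd.add(scriptPath);
        return cmd;
    }

    /** Step 1: run the script as an external process and return its exit code. */
    static int runScript(String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(scriptPath))
                .inheritIO() // show the HPL/SQL output on the Java console
                .start();
        return process.waitFor();
    }

    /**
     * Step 2: read the temporary table back through Hive JDBC.
     * Each row is returned as one tab-separated string.
     * The table name is concatenated into the query, so it must be trusted input.
     */
    static List<String> readTable(String jdbcUrl, String table) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        line.append('\t');
                    }
                    line.append(rs.getString(i));
                }
                rows.add(line.toString());
            }
        }
        return rows;
    }
}
```

A caller would first run something like `runScript("adjust_sales.sql")` and, if the exit code is 0, call `readTable("jdbc:hive2://localhost:10000/default", "tmp_adjusted_sales")`. The script name, URL and table name here are placeholders. The sketch makes the cost of this approach visible: two separate channels (a process and a JDBC connection) and a temporary table that both sides must agree on.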

The second approach is an indirect one: writing UDFs in Java. Java lacks a class library for structured computation, so every algorithm has to be hard-coded. For example, even a basic two-dimensional table has to be built from a combination of ArrayList and HashMap, the simplest grouping and summarizing takes dozens of lines, and join calculations are even more tedious. Because hard-coded logic is difficult to standardize, the concrete algorithms differ widely even when the business logic is similar, which makes the code hard to read and maintain.
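To give a feel for the hand-coding involved, here is a small sketch of a two-dimensional table held as an ArrayList of HashMap rows, with a grouped sum written by hand. Everything in it is hypothetical (class name, column names, sample values); it only covers a single sum over a two-column key, and real procedures that add more aggregates, null handling or joins grow from here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HandCodedGrouping {

    /** Builds one table row as a map from column name to value. */
    static Map<String, Object> row(int sellerId, int year, double amount) {
        Map<String, Object> r = new HashMap<>();
        r.put("sellerid", sellerId);
        r.put("y", year);
        r.put("amount", amount);
        return r;
    }

    /**
     * Sums "amount" per (sellerid, y).
     * The composite grouping key is hand-built as the string "seller/year".
     */
    static Map<String, Double> sumBySellerAndYear(List<Map<String, Object>> table) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map<String, Object> r : table) {
            String key = r.get("sellerid") + "/" + r.get("y");
            double amount = (Double) r.get("amount"); // column types are only known by convention
            totals.merge(key, amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Map<String, Object>> table = new ArrayList<>();
        table.add(row(1, 2014, 100.0));
        table.add(row(1, 2014, 25.5));
        table.add(row(2, 2014, 50.0));
        System.out.println(sumBySellerAndYear(table));
    }
}
```

Note that the column names and types live only in string literals and casts, so the compiler cannot check them. This is one reason such code tends to diverge between developers.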

Java stored procedures also suffer from tight coupling. Java classes cannot be hot-deployed, so every change means recompiling the code and restarting the Hive service, which can seriously affect a production environment. A cleverly designed architecture may reduce the coupling, but the project cost is then bound to rise sharply.

With an aggregator, implementing Hive stored procedures becomes much easier.

The aggregator has a rich class library for structured data, so queries, sorting, aggregation, grouping and summarizing, and join queries can all be done directly with built-in functions. It also provides branching, loop statements and dynamic syntax for structured data, so complex business logic is easy to implement. It supports breakpoints and step-by-step debugging, which lets programmers locate errors quickly.

On the upward interface, the aggregator provides a standard JDBC driver for Java code to call, while the stored procedure itself exists as a script file, so modifying a stored procedure affects neither the Java code nor the Hive service. On the downward interface, the aggregator supports standard Hive JDBC and also provides a higher-performance private interface; both can execute HSQL statements.

Example: the sales table in Hive, grouped by seller, year and month, is as follows:

Stored procedure algorithm: adjust each seller's Q1 and Q2 accounts by moving 1000 yuan from March to April. The adjustment is made per seller within the same year. If the March record is missing, an empty March record with an amount of -1000 must be added to balance the accounts. If the April record is missing, an empty April record with an amount of 1000 must be added. If both are missing, no adjustment is made.
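The adjustment rule can be sketched in plain Java to make the four cases concrete. This is an illustration under assumptions, not the article's stored procedure: the class and method names are hypothetical, the input is assumed to be already grouped (at most one row per seller, year and month), and the sample values in `main` are made up. The sketch takes 1000 out of March and adds it to April, which is what the -1000 March filler record and the +1000 April filler record imply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QuarterAdjust {

    /** One grouped row: seller, year, month and the summed amount. */
    static final class Row {
        final int sellerId;
        final int year;
        final int month;
        double amount;

        Row(int sellerId, int year, int month, double amount) {
            this.sellerId = sellerId;
            this.year = year;
            this.month = month;
            this.amount = amount;
        }
    }

    /** Returns the row for the given month in a batch, or null if there is none. */
    private static Row find(List<Row> batch, int month) {
        for (Row r : batch) {
            if (r.month == month) {
                return r;
            }
        }
        return null;
    }

    /** Applies the March/April adjustment per seller and year; the input is not modified. */
    static List<Row> adjust(List<Row> grouped) {
        // Split the rows into one batch per (seller, year), copying each row.
        Map<String, List<Row>> batches = new LinkedHashMap<>();
        for (Row r : grouped) {
            batches.computeIfAbsent(r.sellerId + "/" + r.year, k -> new ArrayList<>())
                   .add(new Row(r.sellerId, r.year, r.month, r.amount));
        }
        List<Row> result = new ArrayList<>();
        for (List<Row> batch : batches.values()) {
            Row first = batch.get(0);
            Row mar = find(batch, 3);
            Row apr = find(batch, 4);
            if (mar != null && apr != null) {          // both exist: adjust both
                mar.amount -= 1000;
                apr.amount += 1000;
            } else if (mar == null && apr != null) {   // March missing: add a -1000 filler
                batch.add(new Row(first.sellerId, first.year, 3, -1000));
                apr.amount += 1000;
            } else if (mar != null) {                  // April missing: add a +1000 filler
                mar.amount -= 1000;
                batch.add(new Row(first.sellerId, first.year, 4, 1000));
            }                                          // both missing: leave the batch as is
            result.addAll(batch);
        }
        result.sort(Comparator.comparingInt((Row r) -> r.sellerId)
                .thenComparingInt(r -> r.year)
                .thenComparingInt(r -> r.month));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Row> grouped = new ArrayList<>();
        grouped.add(new Row(1, 2014, 3, 5000));
        grouped.add(new Row(1, 2014, 4, 2000));
        grouped.add(new Row(2, 2014, 4, 300));
        for (Row r : adjust(grouped)) {
            System.out.println(r.sellerId + "\t" + r.year + "\t" + r.month + "\t" + r.amount);
        }
    }
}
```

Even for this one rule, the plain Java version needs its own row type, a batching step, a lookup helper and a multi-key sort.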

The calculation results should be as follows:

The aggregator stored procedure is as follows:

The grid cells are listed by position (column letter plus row number); the text after "/" is the comment for that row.

A1   =connect@l("hiveDB")    / connect to Hive via JDBC
A2   =A1.cursor@x("select sellerid, year(orderdate) y, month(orderdate) m, sum(amount) amount from sales group by sellerid, year(orderdate), month(orderdate) order by sellerid, year(orderdate), month(orderdate)")    / run HSQL
A3   =A2.create()    / prepare a blank result
A4   for A2;[sellerid,y]    / loop over one batch per seller per year
B5   =A4.select(m==3)
C5   =A4.select(m==4)    / records of Mar. and Apr.
B6   if B5!=[] && C5!=[]    / if both exist, modify the batch
C6   >B5.amount=B5.amount-1000
C7   >C5.amount=C5.amount+1000
B8   else if B5==[] && C5!=[]    / if Mar. does not exist, add a new record to the result
C8   >A3.record([A4.sellerid,A4.y,3,-1000])
C9   >C5.amount=C5.amount+1000    / modify the batch
B10  else if B5!=[] && C5==[]    / if Apr. does not exist, modify the batch
C10  >B5.amount=B5.amount-1000
C11  >A3.record([A4.sellerid,A4.y,4,1000])    / add a new record to the result
B12  >A3.paste@i(A4.(sellerid),A4.(y),A4.(m),A4.(amount))    / union this batch into the result
A13  return A3.sort(sellerid,y,m)    / sort and return the result

That is the answer to the question of how Hive can easily implement stored procedures. I hope the content above is of some help. If you still have questions, you can follow the industry information channel for more related material.
