A shallow solution to SPL that is easier to use than SQL 07/19 Update SLTechnology News&Howtos

A shallow solution to SPL that is easier to use than SQL

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

Basic query syntax migration from SQL to aggregator (1) single table operation

There is a powerful presence in the field of database and data analysis, the big name SQL, the full name structured query language (Structured Query Language). Literally, the goal of the language is to "query" the data, and the word query is not very difficult. But in fact, in order to support seemingly simple queries, a large number of calculation actions need to be carried out, even if the whole process is a multi-step calculation, there is still a strong dependency between the front and back steps, and the results of the previous calculation will be used later. The latter output may require us to adjust the previous calculation.

For example, this is a bit like going to all kinds of outlets to do business. After filling in the form and submitting it, the relevant staff begin to be busy behind the window, and from time to time they will throw back a few questions to ask for clarification, and when they finally get a reply, it may not be the result they expect at all! At this time, sitting outside the work window, while we are crazy, we often have an idea that if we can see, or even participate in the process, we should be able to greatly improve the efficiency of work.

Yes, as you can imagine, the aggregator introduced below is a computing (not just query) tool that can be easily played and controlled for us process controllers compared to SQL.

What we need to do is to "follow the cat and draw a tiger" to transfer the SQL operation in the habit to the aggregator, with a small habit change, in exchange for a big benefit improvement.

First, we need to "move" the data from the traditional data source to the aggregator so that subsequent operations can be done entirely in the aggregator.

Our most commonly used data source is generally the relational database RDB. The sample data used here are two tables in the database:

Order information table (order, key orderId), including order number orderId, customer code customerId, employee number employeeId, order date orderDate, delivery date sendDate and amount money****:

OrderIdcustomerIdemployeeIdorderDatesendDatemoney10248VINET22011-02-042011-02-1644010249TOMSP92011-02-052011-02-10186310250HANAR92011-02-082011-02-12181310251VICTE92011-02-082011-02-15670

Order schedule (orderDetail, key orderId,productId), including order number orderId, product number productId, price price, quantity amount, discount discount:

OrderIdproductIdpriceamountdiscount110591739120.85110596034350.911060603441110607713100.95

The process of "moving", or importing data, is very simple, as shown in the following figure:

Aggregator A1=connect ("hsqlDB") 2=A1.query ("select * from order") 3 > A1.close ()

First, establish the database connection (grid A1), then read the data from the database directly through the SQL statement of full query in a single table (grid A2), and finally clean up the site and close the database connection (grid A3).

After executing the script, we can select grid A2 and look at the moved data in the result area. at the same time, the order table has a different identity in the aggregator, which we call the "sequence table", represented by the grid name A2. The ordinal table is a very important concept in the aggregator, and now we can simply understand it as a table in the corresponding database:

In fact, in the aggregator, any grid with calculation results (usually starting with the equal sign "=") can select and view the results at any time after executing the script, and can be referenced at any time by grid names such as A7, B8, etc., so as to satisfy our desire to monitor at any time.

Next, let's take a look at how it works in the aggregator, taking each clause of the select statement in SQL as a clue:

First: SELECT clause

It is used to select the field to be queried, or to evaluate the field through an expression, or to rename the field. Correspondingly, there are three functions: new, derive and run in the aggregator.

For example, only select the order number, employee number, order date and amount fields in the order table, and multiply the amount by 100 to change its units from meta, and rename the corresponding result field to centMoney.

SQL does the following:

SQLSELECT orderId,employeeId,orderDate,money*100 centMoney FROM order

The corresponding practice of the aggregator is A3 in the following table:

A1=connect ("hsqlDB") 2=A1.query ("SELECT * FROM order") 3=A2.new (orderId,employeeId,orderDate,money*100:centMoney) 4=A3.derive (year (orderDate): orderYear) 5=A4.run (string (orderYear) + "year": orderYear) 6 > A1.close ()

A3 creates a new sequence table using the data from A2, containing the required fields, including multiplying the amount by 100 and naming it with centMoney:

Let's continue to look at the results of A4, adding a new field orderYear after the original field in the order table A3, which means that derive (newly calculated field) is equivalent to new (all old fields, newly calculated fields), which can be seen as a simple way to write the new function, eliminating the need to copy the old field again.

A5 uses the run function, which directly modifies the value of the old field orderYear, but if you look at the result of A4, it becomes the same as A5. This is because the run function does not generate a new ordinal object like the new and derive functions, but makes changes directly on the original object.

To sum up, in the aggregator, the new, derive, and run functions all produce ordinal results, but the new and derive functions generate a new ordinal object, such as copying the data (this behavior has a proper noun immutable), while run directly modifies the processed object (the behavior belongs to mutable).

[extended reading] there are two factors for proposing the behavior of mutable: the first is to reduce memory footprint to improve performance; second, some actual business requirements need to change the original object, a series of actions directly act on an object, and get the correct result in the last step, instead of getting a bunch of intermediate results and then doing the final merge action. Of course, immutable also has its applicable scenarios, and there is no difference between the two behaviors.

Second: WHERE clause

It is used to filter the data table by condition. Correspondingly, the aggregator filters the records of an ordered table through the select function. The effect is shown in the following figure:

For the previous example data, we want to query the order data for a specified period (during January 2012), and we can compare the practice of SQL and the aggregator (grid A3).

SQLSELECT * FROM orderWHERE orderDate > = '2012-01-01' AND orderDate=date ("2012-01-01") & & orderDateA1.close ()

It is important to note that there are two details in the aggregator expression: one is that the date function is used to convert the string to the date type, and the other is that AND/OR is written as & & / | | in the aggregator.

The results of A3 are as follows:

It seems to be consistent with A2 structure, but the amount of data is small. But we can do an experiment, enter "= A2.select (orderId=10555) .run (money*10:money)" in grid B3, modify the field values of some records in the A2 order table, and we can see that the values of these corresponding records in the A3 order table also change. This means that the records in the two ordinal tables are the same thing (called objects will be more professional), which means that the select function in the aggregator belongs to the mutable behavior we mentioned earlier.

The third: the GROUP BY clause

GROUPY BY often appears with aggregate functions such as SUM, COUNT, and so on, which are used to group query results according to certain fields and summarize statistics. Strictly speaking, these are two separate actions, but they always appear together in SQL, giving the illusion that both must be used at the same time. In fact, this combination is a limitation to the grouping operation, or after grouping, the computation that can be performed is much more than a few aggregate functions in SQL. In the aggregator, the corresponding GROUP BY clause is the group function, which can only group the data into groups, and each group forms a small set. In the later calculation, we can carry out simple aggregation or more complex calculations for these small sets. The following figure is an illustration of grouping summation using GROUP BY in SQL:

Again with the previous example data, we want to calculate the total sales of each employee during January 2012, which is grouped by employee number and summed. For the calculation of the sum of this group, let's compare the practice of SQL and aggregator:

SQLSELECT employeeId, sum (money) salesAmount FROM order WHERE orderDate > = '2012-01-01' AND orderDate=date ("2012-01-01") & & orderDateA1.close ()

The results of A4 are as follows:

The aggregator breaks down the query into three steps:

First, A2 takes out all the records in the order table

A3 then filters the order records for the specified period (January 2012)

Finally, A4 groups the filtered records according to the employee number (defined by the part before the semicolon in the function parameters, and multiple fields can be specified at the same time), sum each group (represented by the "~" symbol), and names the field of the calculation result salesAmount (the part after the semicolon).

There seems to be nothing wrong with grouping with SQL, only because our example only demonstrates the same grouping query as SQL. In fact, the second half of the group function in A4 is not necessary, and there may be businesses that just want to get the results after grouping, rather than doing summation and counting operations here; or there may be different aggregation operations for groups with specific values, so write more complex expressions for the small set "~" after grouping.

At the same time, in other tutorials, we can also learn that the grouping field is not limited to the original field, but can be an expression, just like in SQL.

Simple grouping belongs to the mutable behavior, which splits the records of a large set into multiple small sets. After adding the aggregation operation, because of the new data structure, it is no longer a simple operation to move the original records.

Fourth: HAVING clause

It is used to filter the aggregate results after GROUP BY. In the aggregator, there is no function specifically corresponding to HAVING, and the select function is used to filter any ordinal table according to conditions, because the calculation is step-by-step and the object to which the operation points is always clear. On the other hand, SQL requires that everything be expressed in one sentence. When faced with complex queries, it is inevitable to add concepts to operations of the same nature in order to express different objects. If you think about it, the concept of HAVING is not necessary in SQL, it is a simplified way to write the first-level subquery:

SELECT F1, sum (f2) f2 FROM t GROUP BY F1 HAVING sum (f2) > 100

Equivalent to

SELECT * FROM

(SELECT F1, sum (f2) f2sum FROM t GROUP BY F1) T2

WHERE f2sum > 100

Do similar simplification to more layer subqueries, and it is estimated that HAVING2 and HAVING3 will appear. Similar keywords, but after HAVING2 simplification price ratio is not high, SQL also does not provide. This shows another advantage of step-by-step computing. Only concepts that describe the essential needs of computing, such as HAVING and subquery, can be abandoned because of technical means. Reducing unnecessary concepts is an important means to reduce the cost of learning.

Let's take a specific look at the comparison between SQL and the aggregator to find the employee number with sales exceeding 5000 during the January 2012 period and his total sales:

SQLSELECT employeeId, SUM (money) salesAmount FROM order WHERE orderDate > = '2012-01-01' AND orderDate5000 aggregator A1=connect ("hsqlDB") 2=A1.query ("SELECT * FROM order") 3=A2.select (orderDate > = date ("2012-01-01") & & orderDate5000) 6 > A1.close ()

A5 result

With the gradual increase of query complexity, the aggregator statements are easy to read, and the clear step flow is highlighted. The results of each step can be observed, the calculation process can be controlled at will according to the results, and each calculation step can be described with the most concise concept. This is only the simplest example of a single-table query, and in the next article we will continue to see how aggregators perform better in the case of multi-table joins and unions.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.