In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
In the development of database applications, we often have to deal with complex SQL computing, and fixed grouping is one of them. The grouping basis of a fixed packet is not in the data to be grouped, but comes from outside, such as another table, external parameters, condition list, and so on. For certain types of fixed packets, it is relatively simple to implement them with SQL (for example, grouping is based on another table and there is no requirement for grouping order), but for more general and flexible requirements, it is difficult to implement.
For SPL, all kinds of problems in fixed groups can be solved easily. Here are a few examples to illustrate.
The table sales stores the order record, where the client column is the customer name and the AMOUNT column is the order amount. Some of the data in table sales are as follows:
OrderIDClientSellerIdOrderDateAmount10248VINET52013/7/4244010249TOMSP62013/7/51863.410250HANAR42013/7/8181310251VICTE32013/7/8670.810252SUPRD42013/7/9373010253HANAR32013/7/101444.810254CHOPS52013/7/11625.210255RICSU92013/7/122490.510256WELLI32013/7/15517.8
It is required that the sales be grouped according to the "potential customer list" and the AMOUNT columns of each group are summed up. The "potential customer" here is a fixed group, which may come from different external conditions:
Case 1: the potential customer list comes from the Std field of another table, potential, with only four records: ANATR, BERGS, LACOR, ZTOZ, and the customer ZTOZ is not in the sales table. When outputting the results, it is required to group and summarize according to the above record order.
If we do not require the order of grouping, then SQL can easily implement this case:
Select potential.std as client, sum (sales.amount) as amount from potential left join client on potential.std=sales.client group by potential.std.
But if you are grouped in a specific order, as required in this case, you must create a field for sorting if implemented with SQL, and finally remove it with a subquery. It is much easier to implement it with SPL, with the following code:
A1=sales=db.query ("select * from sales") 2=potential=db.query ("select * from potential") 3=sales.align@a (potential:STD,CLIENT) 4=A3.new (potential (#) .STD: CLIENT,~.sum (AMOUNT): AMOUNT)
A1, B1: retrieve data from the database, named sales and potential, respectively, as follows:
A3:=sales.align@a (potential:STD,CLIENT)
This code uses the function align, which divides the Client field of sales into four groups according to the order of the Std field of potentail, as follows:
As you can see, the first three groups are already data in sales, while the fourth group is not in sales, so it is null. In addition, the parameter option @ an of the function align indicates that all the data in the group is fetched, and if this function option is not used, only the first item of each group is taken.
A4:=A3.new (potential (#) .STD: CLIENT,~.sum (AMOUNT): AMOUNT)
This code uses the function new to generate a new ordinal table, one of which is potential.STD, the Std field of potential, and the other is ~ .sum (AMOUNT), which is the result of summing the Amount fields of each set of data in A3. The final results are as follows:
Case 2: the list of potential customers is fixed, but the number of customers is large.
If the number of customers is small, you can use union statement to spell all customers into a fake table when using SQL. If there are a large number of customers, you don't have to do so. You must create a new table to keep the data permanently. However, the implementation with SPL saves the trouble of building tables. The code is as follows:
A1=sales=db.query ("select * from sales") 2 potentials = ["ALFKI", "BSBEV", "FAMIA", "GALED", "HUNGC", "KOENE", "LACOR", "NORTS", "QUICK", "SANTG", "THEBI", "VINET", "WOLZA"] 3=sales.align@a (potential,CLIENT) 4=A3.new (potential (#): CLIENT,~.sum (AMOUNT): AMOUNT)
In the above code, A2 is a sequence of strings and is named potential. A3 and A4 can access potential as in case 1, referencing their members directly.
Case 3: the potential customer list is an external parameter, such as "BSBEV", "FAMIA", "GALED".
External parameters often change, and it is even more inconvenient to use union to make fake tables in SQL. Only a temporary table can be created, and the parameters will be parsed and inserted into the temporary table one by one, and then the subsequent calculation will be carried out. However, it is not necessary to establish a temporary table for SPL implementation. The specific implementation process is as follows:
First, define a parameter clients, as follows:
Then modify the script file as follows:
A1=sales=db.query (select * from sales) 2=potential=clients.array () 3=sales.align@a (potential,CLIENT) 4=A3.new (potential (#): CLIENT,~.sum (AMOUNT): AMOUNT)
Run the script and enter the parameter values, assuming the parameter values are "BSBEV", "FAMIA", "GALED", as follows:
According to the input, the final calculation result is also different. The corresponding results of the input above are as follows:
Case 4: the grouping of fixed groups can be based on numerical values or conditions. for example, the order amount is divided into four intervals according to 1000, 2000 and 4000, with a group of orders in each interval, and the total amount of orders in each group is counted.
If the conditions are known, they can be written in SQL. If the conditions are dynamic external parameters, the SQL needs to be cobbled together in high-level languages such as JAVA, and the process is very complicated. Because SPL supports dynamic expressions, this case can be easily implemented as follows:
A1=sales=db.query ("select * from sales") 2 AMOUNT = ["? 1000 & 2000 & & 4000"] 3=sales.enum (byFac,AMOUNT) 4=A18.new (byFac (#): byFac,~.sum (AMOUNT): AMOUNT)
In the above code, the variable byFac is the grouping basis for this case and contains four conditions. The byFac can also be an external parameter or come from a view or table in the database. The final result in A4 is as follows:
Case 5:
In the above condition grouping, none of the conditions happen to overlap, but overlap is also common in practice, such as grouping order amounts according to the following rules:
1000 to 4000: regular order R14
Below 2000: non-priority order R2
3000 or above: key order R3
At this point, both R2 and R3 will overlap with R14. When the conditions overlap, we sometimes want the data not to overlap, that is, we first take out the data that meets R14, then filter out R2 from the rest of the data, and so on.
SPL's function enum supports conditional grouping of data overlap, as follows:
A1=sales=db.query ("select * from sales") 2 roombyFacs = ["? > = 1000 & &?
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.