2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
One characteristic of SPL is that its data is ordered, so making proper use of positions (the sequence numbers of members and records) can significantly improve performance. Let's start from typical scenarios and gradually master the techniques of using positions.
Quick query
Binary search on sorted data delivers higher performance, but some algorithms depend on the original order of the data, which seems to rule out sorting. Consider the following case:
PerformanceRanking.txt has three fields: empID (salesperson number), dep (department name), and amount (sales amount). The file records the performance ranking of salespeople across departments for the current quarter and is stored in descending order of sales amount. Given a salesperson's ID, we need to calculate how much more that person must sell to move up one place in the ranking. If the salesperson is already ranked first, no increase is needed.
The algorithm takes the sales amount of the salesperson ranked one place higher and subtracts the given salesperson's amount; in other words, it computes on relative positions in the original data. Because the original order is used, it seems the data should not be re-sorted: otherwise it would be hard to map between the two orders, and other algorithms may still need the raw data. Following this line of thinking, the script would be written like this:
    A                                            B
1   =file("PerformanceRanking.txt").import@t()   /read the data
2   =A1.pselect(empID:10)                        /position not used: look up the record sequence number by empID
3   =A1.calc(A2,if(#>1,amount[-1]-amount,0))     /calculate on the relative position in the original data
This script does not sort the data, so it cannot use binary search; pselect has to traverse the records, and performance is not high.
In fact, we can use positions to get the benefit of sorting while keeping the original data intact, and so improve query performance. The script is as follows:
    A                                            B
5   =oPos=A1.psort(empID)                        /positions of the records in the original data, in sorted order
6   =index=A1(oPos)                              /build the sorted data
7   =oPos(index.pselect@b(empID:10))             /binary search, then map to the sequence number in the original data
8   =A1.calc(A7,if(#>1,amount[-1]-amount,0))     /calculate on the relative position in the original data
A5: the function psort returns, in sorted order, the positions of the records in the original data; it does not actually sort the original data.
A6: use oPos to build the sorted data. The original data is not affected, and oPos serves as a bridge between the sorted data (index) and the original data.
A7: perform a binary search on the sorted data, then use oPos to convert the result into the record's sequence number in the original data.
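The idea behind A5-A8 can be sketched outside SPL as well. The following Python analogue is only an illustration, not the SPL implementation: the sample records and the helper name gap_to_next_rank are hypothetical, and positions are 0-based here whereas SPL sequence numbers start at 1.

```python
from bisect import bisect_left

# Hypothetical sample data, already ranked by amount in descending order.
ranking = [
    {"empID": 7, "dep": "Sales", "amount": 9000},
    {"empID": 10, "dep": "Sales", "amount": 7500},
    {"empID": 3, "dep": "R&D", "amount": 6000},
    {"empID": 5, "dep": "R&D", "amount": 4200},
]

# Analogue of psort: original positions of the records, ordered by empID.
o_pos = sorted(range(len(ranking)), key=lambda i: ranking[i]["empID"])
# Sorted key column, playing the role of the "index" data.
sorted_ids = [ranking[i]["empID"] for i in o_pos]


def gap_to_next_rank(emp_id):
    """Return the extra sales needed to catch the next higher-ranked person."""
    k = bisect_left(sorted_ids, emp_id)       # binary search on the sorted keys
    if k == len(sorted_ids) or sorted_ids[k] != emp_id:
        raise KeyError(emp_id)
    pos = o_pos[k]                            # map back to the original position
    if pos == 0:                              # already ranked first
        return 0
    return ranking[pos - 1]["amount"] - ranking[pos]["amount"]
```

The original list is never reordered; only the small position list o_pos is built once and then reused for every lookup.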
To verify the performance difference between the two approaches, we can pick salesperson IDs at random as parameters, use a loop to simulate a large number of requests, and run each algorithm in turn:
    A                                        B
10  =100000.(A1(rand(A1.len())+1).empID)     /generate 100,000 empIDs
11  =now()
12  for A10                                  =A1.pselect(empID:A12)
13                                           =A1.calc(B12,if(#>1,amount[-1]-amount,0))
14  =interval@ms(A11,now())                  /position not used, time: 13552 ms
15
16  =now()
17  for A10                                  =oPos(index.pselect@b(empID:A17))
18                                           =A1.calc(B17,if(#>1,amount[-1]-amount,0))
19  =interval@ms(A16,now())                  /position used, time: 165 ms
Using positions makes the query about 80 times faster (13552 ms versus 165 ms). The data set in this example is small; as the data grows, the gap widens sharply, because a traversal search has linear time complexity while binary search has logarithmic time complexity.
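To make the linear versus logarithmic difference concrete, here is a small Python sketch that counts the probes each search makes. It is an illustration only: the key column is hypothetical, and the counts describe these two toy functions rather than SPL's internals.

```python
def linear_steps(keys, target):
    """Count the comparisons made by a front-to-back scan."""
    steps = 0
    for key in keys:
        steps += 1
        if key == target:
            break
    return steps


def binary_steps(sorted_keys, target):
    """Count the probes made by a classic binary search."""
    lo, hi, steps = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


keys = list(range(1, 1_000_001))   # hypothetical sorted key column
worst_case = keys[-1]              # the last key is the worst case for a scan
print(linear_steps(keys, worst_case), binary_steps(keys, worst_case))
```

For one million keys the scan needs one million comparisons in the worst case, while the binary search never needs more than 20 probes, since 2^20 exceeds one million.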
Quick alignment
The function align aligns data to a given sequence. For example, given the input parameter pOrderList=[10250,10247,10248,10248,10249,10251], align the order details to this list and compute the subtotal amount of each order. The code is as follows:
    A
1   =connect("demo").query@x("select orderID,productID,price,quantity from orderDetail")
2   =A1.align@a(pOrderList,orderID).new(orderID,~.sum(price*quantity):subtotal)
However, this approach does not make use of positions, so performance is not high. To improve it, sort the list first (that is, build an index table manually), align by binary search, and finally restore the result to the original order:
    A
1   =connect("demo").query@x("select orderID,productID,price,quantity from orderDetail")
2   =oPos=pOrderList.psort()
3   =index=pOrderList(oPos)
4   =A1.align@ab(index,orderID).new(orderID,~.sum(price*quantity):total)
5   =A4.inv(oPos)
A2-A3: build the index table manually.
A4: align the order details to the sorted order list and compute the subtotal of each order. Because the index table is ordered, the alignment can use binary search, which is what the @b option specifies.
A5: move the members of A4 back to their original positions so that the result follows the order of pOrderList. The function inv rearranges members according to the specified positions; here the positions are the original ones held in oPos, so the effect is to restore the original order.
Simulating a large number of requests with both algorithms shows that using positions improves performance significantly:
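The sort, align, and restore steps can be sketched in Python as follows. This is an analogue under stated assumptions, not SPL's align or inv: the detail rows are hypothetical, prices are whole numbers to keep the arithmetic exact, and the requested list is assumed to contain no duplicate IDs.

```python
from bisect import bisect_left

# Hypothetical order-detail rows: (orderID, price, quantity).
details = [
    (10248, 14, 12), (10248, 10, 10),
    (10249, 19, 9),
    (10250, 8, 10), (10250, 42, 35),
    (10251, 17, 6),
]
p_order_list = [10250, 10248, 10251]   # requested order, unsorted

# Build the index: positions that sort the list, and the sorted keys.
o_pos = sorted(range(len(p_order_list)), key=lambda i: p_order_list[i])
index = [p_order_list[i] for i in o_pos]

# Add each detail row to its slot, located by binary search on the index.
totals_sorted = [0] * len(index)
for order_id, price, quantity in details:
    k = bisect_left(index, order_id)
    if k < len(index) and index[k] == order_id:
        totals_sorted[k] += price * quantity

# Inverse permutation: put each subtotal back at its original list position.
totals = [0] * len(index)
for k, original in enumerate(o_pos):
    totals[original] = totals_sorted[k]

result = list(zip(p_order_list, totals))
```

The final loop plays the role that inv plays in A5: the subtotals were computed in sorted-key order, and o_pos tells each one where it belongs in the order of the requested list.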
    A                          B
11  =now()
12  for A9                     =A1.align@a(A12,orderID).new(orderID,~.sum(price*quantity):total)
13  =interval@ms(A11,now())    /position not used, takes 43456 milliseconds
14
15  =now()
16  for A9                     =oPos=A16.psort()
17                             =index=A16(oPos)
18                             =A1.align@ab(index,orderID).new(orderID,~.sum(price*quantity):total)
19                             =B18.inv(oPos)
20  =interval@ms(A15,now())    /position used, takes 7313 milliseconds
A9 is the sequence of test order lists that both loops iterate over.
Batch query of ordered data
Sometimes ordered data needs to be queried in batches. For example, given pOrderList=[10877, 10588, 10611, 10611, 11037, 10685], compute the total shipping charge of the orders in the list. The code can be written as follows:
    A                                                          B
1   =connect("demo").query@x("select orderID,customerID,orderDate,shippingCharge from order order by orderID")
2   =pOrderList=[10887,10588,10611,11037,10685]                /list parameter
3   =A1.select(pOrderList.pos(OrderID)).sum(shippingCharge)    /position not used, a single line of code
Explanation: the functions pos and select work together to implement the batch query. pos returns the position of a value in a sequence, or null if the value is not in the sequence. select performs the query and keeps the current record when the condition evaluates to anything other than null or false.
However, this code does not take advantage of positions, so performance is not high.
Note that the order records are sorted by orderID, so binary search can be used to obtain the positions of the qualifying orders, and the records can then be fetched by position for the calculation. The code is as follows:
    A                                  B
5   =A1.(orderID).pos@b(pOrderList)
6   =A1(A5).sum(shippingCharge)        /position used, a single line of code
A1.(orderID) retrieves the orderID column, and pos@b uses binary search to locate members quickly in ordered data. A6 fetches the records by position.
Simulating a large number of requests with both algorithms shows that using positions improves performance significantly:
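The same batch lookup can be sketched in Python. This is an illustrative analogue rather than SPL's pos@b: the order rows and the helper name positions_of are hypothetical, IDs missing from the data are simply skipped, and positions are 0-based.

```python
from bisect import bisect_left

# Hypothetical order rows sorted by orderID: (orderID, shippingCharge).
orders = [(10248, 32), (10249, 12), (10250, 66), (10251, 41), (10252, 51)]
order_ids = [order_id for order_id, _ in orders]   # the sorted key column


def positions_of(sorted_keys, wanted):
    """Binary-search each wanted key; skip keys that are not present."""
    found = []
    for key in wanted:
        k = bisect_left(sorted_keys, key)
        if k < len(sorted_keys) and sorted_keys[k] == key:
            found.append(k)
    return found


p_order_list = [10251, 10248, 99999]               # 99999 is not in the data
positions = positions_of(order_ids, p_order_list)
total_shipping = sum(orders[k][1] for k in positions)
```

With n orders and m requested IDs, this does about m binary searches of log n probes each, whereas testing every order against the list touches all n orders and searches the list for each one.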
    A                                                     B
8   =A1.(OrderID)                                         /performance test preparation
9   =100000.(A1.(OrderID).sort(rand()).to(rand(100)))     /randomly generate 100,000 lists
10
11  =now()
12  for A9                                                =A1.select(A12.pos(OrderID)).sum(shippingCharge)
13  =interval@ms(A11,now())                               /position not used, 85166 ms
14
15  =now()
16  for A9                                                =A1.(OrderID).pos@b(A16)
17                                                        =A1(B16).sum(shippingCharge)
18  =interval@ms(A15,now())                               /position used, 3484 ms