2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Sets in SPL are ordered, so their members can be referenced by sequence number. Using sequence numbers flexibly can make computations simpler and more efficient.
1 Member access
Some SPL functions accept a sequence number, or a sequence of sequence numbers, as arguments. The simplest use is to access a member directly by its sequence number, much like indexing an array in a general-purpose programming language.
A2  =A1(1)
A3  =A1(3)
A4  >A1(2)=4
A5  >A1(4)=8
A2 and A3 get the members at the specified positions; sequence numbers start from 1. The results are as follows:
A4 and A5 each modify a member of the sequence. Executing step by step, you can see A1 change as follows:
The A.m(i) function can also count positions from the end or cycle around the sequence, which makes it a useful complement to A(i).
A1  [...]
A2  =A1.m(3)
A3  =A1.m(-2)
A4  =A1.m@r(6)
A5  =A1.m@r(12)
A6  =A1.m(6)
A2 and A3 use the A.m() function to get the member at the specified position, where -2 denotes the second-to-last member. A4 and A5 add the @r option: if the specified sequence number is out of bounds, positions wrap around cyclically. For example, sequence number 12 wraps around A1 twice and is equivalent to getting the second member. The A2~A5 results are as follows:
In A6, the specified sequence number 6 exceeds the length of the sequence and the @r option is not used, so null is returned.
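The behavior of A.m(i) described above can be sketched in Python. This is only an emulation for illustration, not SPL: the helper name m, the wrap flag standing in for @r, and the sample data are assumptions, as is the way negative positions wrap.

```python
def m(seq, i, wrap=False):
    """Emulate SPL's A.m(i) on a Python list, using 1-based positions.

    i > 0 counts from the front, i < 0 counts from the end.
    wrap=True mimics the @r option: out-of-range positions cycle around.
    Without wrap, an out-of-range position yields None (SPL null).
    """
    n = len(seq)
    if n == 0 or i == 0:
        return None
    if wrap:
        # Map any non-zero position onto a valid 0-based index cyclically.
        idx = (i - 1) % n if i > 0 else i % n
        return seq[idx]
    if 1 <= i <= n:
        return seq[i - 1]
    if -n <= i <= -1:
        return seq[i]
    return None


a1 = [3, 5, 4, 6, 1]  # assumed sample data of length 5

print(m(a1, 3))              # 4
print(m(a1, -2))             # 6
print(m(a1, 6, wrap=True))   # 3
print(m(a1, 12, wrap=True))  # 5, the second member
print(m(a1, 6))              # None
```

With a sequence of length 5, position 12 lands on the second member, matching the wrap-around described in the text.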
SPL also provides a set of functions for location lookups, all of which start with p, such as:
A2  =A1.pos(5)
A3  =A1.pmin()
A4  =A1.pmax()
A5  =A1.pselect(~%5==0)
A2 finds the sequence number of the specified member; if several members have the same value, only the first sequence number is returned. A3 and A4 return the sequence numbers of the smallest and largest members, respectively. A5 finds the sequence number of the first member that satisfies the given condition, here the first multiple of 5. The A2~A5 results are as follows:
If the member cannot be found, A.pos() returns null, so you can use A.pos() to test whether a member belongs to the sequence.
A1  [3,5,1,...,7]
A2  =A1.pos(1)==null
A3  =A1.pos(2)!=null
The calculation results of A2 and A3 are as follows:
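As a rough illustration of the position lookup functions, the following Python sketch mimics A.pos(), A.pmin(), A.pmax() and A.pselect() with 1-based positions. The helper names mirror the SPL functions, but the implementations and the sample data are assumptions made for this sketch.

```python
def pos(seq, x):
    """1-based position of the first member equal to x, or None if absent."""
    for i, v in enumerate(seq, start=1):
        if v == x:
            return i
    return None


def pselect(seq, cond):
    """1-based position of the first member satisfying cond, or None."""
    for i, v in enumerate(seq, start=1):
        if cond(v):
            return i
    return None


def pmin(seq):
    """1-based position of the first smallest member."""
    return seq.index(min(seq)) + 1 if seq else None


def pmax(seq):
    """1-based position of the first largest member."""
    return seq.index(max(seq)) + 1 if seq else None


a1 = [3, 5, 4, 6, 1]  # assumed sample data

print(pos(a1, 5))                          # 2
print(pmin(a1), pmax(a1))                  # 5 4
print(pselect(a1, lambda v: v % 5 == 0))   # 2
print(pos(a1, 2) is None)                  # True: 2 is not a member
```

Comparing the result of pos() with None plays the role of the null test used for membership checking.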
2 Subset access
Use a sequence of sequence numbers as the parameter to access a subset of a sequence, such as:
A1  [3,5,4,4,6]
A2  =A1([1])
A3  =A1([3,5])
A4  =A1([4,1,1,3,1])
A5  >A1([1,3])=...
A6  >A1([2,4,4,3])=0
A2, A3 and A4 each obtain a subset of the sequence. The results of A2, A3 and A4 are as follows:
A5 and A6 modify members of the sequence; with a sequence of sequence numbers as the parameter, several members are modified at once. Executing step by step, you can see A1 change as follows:
The A.m() function can also obtain a subset when given a sequence parameter:
A1  [3,5,4,6,1]
A2  =A1.m([1,-1])
A3  =A1.m@r([1,6,12])
A4  =A1.m@0([1,6,3])
In this example, negative numbers in the parameter sequence indicate positions counted from the end, and the @r option makes out-of-bounds positions wrap around cyclically. With the @0 option, out-of-bounds sequence numbers in the parameter sequence are skipped, so no corresponding null values appear in the result. The results of A2, A3 and A4 are as follows:
If you add the @a option to a position lookup function, all members that meet the criteria are found and a sequence of their sequence numbers is returned:
A1  [...]
A2  =A1.pos@a(2)
A3  =A1.pmin@a()
A4  =A1.pmax@a()
A6  =A1.pos@a(8)
With the @a option, A2 returns the positions of every 2 in sequence A1, A3 returns the sequence numbers of all members with the lowest value, A4 returns the sequence numbers of all members with the highest value, and A5 returns the sequence numbers of all members that are multiples of 2. When the @a option is used, a sequence of sequence numbers is returned even if only one member is found, rather than the sequence number itself, as when A6 looks for all positions of 8. The results of A2~A6 are as follows:
When you use A.pos() to return the positions of several members at once, you may need to add the @i option, such as:
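The effect of the @a option can be illustrated in Python with helpers that always return a list of 1-based positions. The helper names and the sample sequence are assumptions for this sketch and are not taken from the original example.

```python
def pos_all(seq, x):
    """All 1-based positions whose member equals x (like pos@a)."""
    return [i for i, v in enumerate(seq, start=1) if v == x]


def pselect_all(seq, cond):
    """All 1-based positions whose member satisfies cond (like pselect@a)."""
    return [i for i, v in enumerate(seq, start=1) if cond(v)]


def pmin_all(seq):
    """All 1-based positions holding the smallest value (like pmin@a)."""
    lowest = min(seq)
    return pos_all(seq, lowest)


def pmax_all(seq):
    """All 1-based positions holding the largest value (like pmax@a)."""
    highest = max(seq)
    return pos_all(seq, highest)


a1 = [3, 2, 1, 6, 2, 1, 8]  # assumed sample data

print(pos_all(a1, 2))                          # [2, 5]
print(pmin_all(a1))                            # [3, 6]
print(pmax_all(a1))                            # [7]
print(pselect_all(a1, lambda v: v % 2 == 0))   # [2, 4, 5, 7]
print(pos_all(a1, 8))                          # [7], still a list
```

Note that a single match still comes back as a one-element list, which is the point the text makes about A6.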
2=A1.pos@i ([2) 3=A1.pos@i ([3) 3=A1.pos@i ([3)) 4=A1.pos@i ([1) 5=A1.pos ([1)) 6=A1.pos ([1)) 7=A1.pos ([1)
When you use A.pos@i() to find the members of a parameter sequence, the search proceeds in one direction, member by member, whereas plain A.pos() simply checks whether sequence A contains each member of the parameter sequence. The A2~A7 results are as follows:
The results of A3 and A4 are null; in A3, for instance, a third 1 cannot be found in the sequence. In other words, A.pos@i() only returns an increasing sequence of positions, and returns null if no such result can be found.
A.pos@i() returns null when a member cannot be found, but because it depends on order and members may repeat, it cannot simply be used to test whether a subset is contained. The intersection operation is generally used instead:
2=A1.pos@i ([1) = null3=A1.pos@i ([1)) = = null3=A1.pos@i ([1) 2]) = = null4=A1.pos ([1) 2]) = = null5=A1.pos ([1) 2)) = = null6 [1) 1 [2) 7] 7 [1 1 1, 2 2] 7] 8 = A6 A1 = A69 = A7 ^ A1 = A7
The A2~A4 results are as follows:
When A.pos@i(B) is used for this test, a non-null result means that the members of B can be found in A in turn, so A must contain B. However, a null result only means that the members of B cannot be found in A in that order; it does not necessarily mean that A does not contain B, as in A3.
When A.pos(B) is used for this test, a null result means that some member of B cannot be found in A, so A cannot contain B. However, a non-null result does not guarantee that A contains B if B has duplicate members, as in A5.
The results of A8 and A9 are as follows:
Testing whether A contains B with B^A==B does work. Based on the results in A8 and A9, A1 contains A6 but does not contain A7. When using this method, note that the operands of the intersection must not be swapped: the members in the result of A^B may be in a different order from B, so the comparison would no longer give a correct answer.
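The containment test can be sketched in Python. The sketch assumes, as the text implies, that the intersection keeps the order of its left operand and matches duplicate members one-to-one; the function names and sample data are hypothetical.

```python
from collections import Counter


def intersect(left, right):
    """Multiset intersection that keeps the order of `left`.

    Each member of `right` can be matched at most once, so duplicates
    in `left` survive only as often as they also occur in `right`.
    """
    remaining = Counter(right)
    result = []
    for v in left:
        if remaining[v] > 0:
            remaining[v] -= 1
            result.append(v)
    return result


def contains(a, b):
    """True if a contains b, counting duplicates (the B^A==B test)."""
    return intersect(b, a) == b


a1 = [2, 1, 7, 1, 5]   # assumed sample data
b1 = [1, 1, 2]         # contained in a1
b2 = [1, 1, 1, 2]      # needs three 1s, a1 has only two

print(contains(a1, b1))            # True
print(contains(a1, b2))            # False
print(intersect(a1, b1) == b1)     # False: operands swapped, order follows a1
```

The last line shows why the operands must not be reversed: the swapped intersection yields [2, 1, 1], which differs from b1 only in order.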
3 Positioning in loop functions
Just as ~ denotes the current member, # can be used in the parameters of a loop function to denote the sequence number of the current member.
A2  =A1.(#)
A3  =A1.(#+~)
A4  =A1.select(#%3==2)
A5  =A1.group(int((#-1)/2))
A2 obtains the sequence of sequence numbers, and A3 obtains the sequence formed by adding each member to its sequence number. A4 uses the A.select() function to select the second member out of every three in A1, that is, the members at positions 2, 5, 8, and so on, and forms a sequence from them. A5 divides A1 into groups of 2 members each. The A2~A5 results are as follows:
In loop functions, SPL also provides access to members relative to the current one, using the [] notation:
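In Python, the role of # can be approximated with enumerate(..., start=1). The sketch below mirrors the four computations described above on assumed sample data.

```python
a1 = [10, 20, 30, 40, 50, 60, 70, 80]  # assumed sample data

# Like A1.(#): the sequence of 1-based positions.
positions = [i for i, _ in enumerate(a1, start=1)]

# Like A1.(#+~): each member plus its position.
plus_position = [i + v for i, v in enumerate(a1, start=1)]

# Like A1.select(#%3==2): the 2nd, 5th, 8th, ... members.
every_third = [v for i, v in enumerate(a1, start=1) if i % 3 == 2]

# Like A1.group(int((#-1)/2)): consecutive groups of two members.
groups = {}
for i, v in enumerate(a1, start=1):
    groups.setdefault((i - 1) // 2, []).append(v)
pairs = list(groups.values())

print(positions)      # [1, 2, 3, 4, 5, 6, 7, 8]
print(plus_position)  # [11, 22, 33, 44, 55, 66, 77, 88]
print(every_third)    # [20, 50, 80]
print(pairs)          # [[10, 20], [30, 40], [50, 60], [70, 80]]
```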
A1  [...]
A2  =A1.(~[0])
A3  =A1.(~[1])
A4  =A1.((~-~[-1])/~[-1])
A5  =demo.query("select * from STOCKRECORDS where STOCKID=000062").sort(DATE)
A6  =A5.((CLOSING-CLOSING[-1])/CLOSING[-1])
A7  0
A8  =A5.max(if(CLOSING>CLOSING[-1],...))
A2 takes each member itself from the sequence, A3 takes the next member at each position, and A4 calculates the growth rate of each member relative to the previous one. The results of A2, A3 and A4 are as follows:
A5 queries the records of the specified stock. A6 calculates the daily increase in the stock price, and A8 further calculates the maximum number of consecutive days on which the stock rose. The results of A6 and A8 are as follows:
You can also access a subset in a loop operation using ~[a,b]:
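The two stock calculations can be illustrated in Python on a plain list of closing prices. The prices are hypothetical, and the sketch assumes that the first day, which has no previous member, yields a null (None) growth rate.

```python
closing = [10.0, 10.5, 10.4, 10.6, 10.9, 11.2, 11.0]  # hypothetical closing prices, by date

# Daily growth rate relative to the previous day; the first day has no predecessor.
growth = [None] + [
    (closing[i] - closing[i - 1]) / closing[i - 1]
    for i in range(1, len(closing))
]

# Longest run of consecutive rising days, kept in a running counter
# that resets to 0 whenever the price does not rise.
longest = 0
streak = 0
for previous, current in zip(closing, closing[1:]):
    streak = streak + 1 if current > previous else 0
    longest = max(longest, streak)

print(growth[0])   # None
print(longest)     # 3
```

The running counter plays the same role as the helper cell A7 in the SPL example: it carries state from one row to the next while the maximum is tracked.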
A1  [...]
A2  =A1.(~[-1,1])
A3  =A1.(~[-1,1].avg())
A4  =A1.(~[...].sum())
A5  =A1.(~[,0].sum())
A6  =A1.(~[0,].sum())
A2 lists, for each position, the members from the previous position to the next one, up to three members. A3 calculates the moving average at each position. A4 and A5 both compute cumulative sums. A6 computes the cumulative sum in reverse, that is, the sum of the remaining members. The A2~A6 results are as follows:
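A Python sketch of relative ranges follows. The helper window is hypothetical; it assumes that ranges are clipped at the ends of the sequence and that an omitted bound means "from the start" or "to the end", as the description suggests.

```python
def window(seq, i, a=None, b=None):
    """Members from offset a to offset b around 1-based position i.

    The range is clipped to the bounds of seq; a=None starts at the
    first member and b=None runs to the last member.
    """
    n = len(seq)
    lo = 1 if a is None else max(1, i + a)
    hi = n if b is None else min(n, i + b)
    return seq[lo - 1:hi]


a1 = [1, 2, 3, 4, 5]  # assumed sample data
n = len(a1)

neighbours = [window(a1, i, -1, 1) for i in range(1, n + 1)]       # like ~[-1,1]
moving_avg = [sum(w) / len(w) for w in neighbours]                 # like ~[-1,1].avg()
running = [sum(window(a1, i, None, 0)) for i in range(1, n + 1)]   # like ~[,0].sum()
remaining = [sum(window(a1, i, 0, None)) for i in range(1, n + 1)] # like ~[0,].sum()

print(neighbours)  # [[1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5]]
print(moving_avg)  # [1.5, 2.0, 3.0, 4.0, 4.5]
print(running)     # [1, 3, 6, 10, 15]
print(remaining)   # [15, 14, 12, 9, 5]
```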
4 Access by position
The symbol # in a loop function represents the sequence number of the current member. It is an ordinary number and can take part in calculations like any other number; in particular, it can be used as a sequence number to access members of other sequences. Taking advantage of this, we can access other sequences during the calculation:
A2  =A1.(A1(#))
A3  =A1.(A1.m(#-1))
A4  [...]
A5  =A1.(~+A4(#))
A6  =A1++A4
A7  =10.(if(#%2,...,A4(#/2)))
In loop evaluation, the # in the expression represents the current sequence number. The results of A2, A3, A5 and A7 are as follows:
When using several sequences of equal length, access by position achieves an effect similar to record fields:
A1  [Bray,Jacob,Michael,John]
A2  [...]
A3  [76,82,78,88]
A4  =A1.ranks@z(A2(#)+A3(#))
A5  =A1.new(~:name,A4(#):rank)
A4 calculates the ranking by total score, fetching each score by position when computing the total. A5 generates a table sequence of names and rankings, likewise associating the data in the two sequences by position. The results of A4 and A5 are as follows:
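The idea of treating parallel sequences like record fields can be sketched in Python with zip. The scores in the first list are hypothetical, and the ranking assumes descending order with tied totals sharing a rank, which is how ranks@z is understood here.

```python
names = ["Bray", "Jacob", "Michael", "John"]
score1 = [65, 81, 68, 72]   # hypothetical scores
score2 = [76, 82, 78, 88]

# Total score per person, combining the two sequences position by position.
totals = [a + b for a, b in zip(score1, score2)]

# Descending rank: 1 + number of strictly larger totals (ties share a rank).
ranks = [1 + sum(other > t for other in totals) for t in totals]

# A small table of name and rank, again associated by position.
table = [{"name": n, "rank": r} for n, r in zip(names, ranks)]

print(totals)  # [141, 163, 146, 160]
print(ranks)   # [4, 1, 3, 2]
print(table[1])  # {'name': 'Jacob', 'rank': 1}
```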
5 Sequence alignment
Before accessing sequences by position, you need to make sure that they are all arranged in the same order, which is not always the case in practice. When it is not, the alignment function A.align() can be used to reorder a sequence according to a base sequence:
A1  =demo.query("select * from EMPLOYEE")   /employee table
A2  =demo.query("select * from ATTENDANCE").align(A1:EID,EMPLOYEEID)   /attendance table, aligned by employee number
A3  =demo.query("select * from PERFORMANCE").align(A1:EID,EMPLOYEEID)   /performance table, aligned by employee number
A4  =A1.new(NAME,SALARY*(...A2(#).ABSENCE+A3(#)...):salaryPaid)   /new table sequence that calculates the salary paid
A5  =demo.query("select * from GYMSCORE where EVENT='Vault'")   /vault scores
A6  =demo.query("select * from GYMSCORE where EVENT='Floor'").align(A5:NAME,NAME)   /floor exercise scores, aligned by athlete
A7  =A5.(round(SCORE*0.6+A6(#).SCORE*0.4))   /calculate the weighted score
A8  =A7.ranks@z()   /rank the weighted scores
A9  =A5.new(NAME,A7(#):score,A8(#):rank)   /new table sequence of athletes, weighted scores and rankings
The table sequences in A2 and A3 have been aligned according to the employee numbers in A1. A4 calculates the salaries of the employees as follows:
A6 aligns its data according to the athlete names in the A5 table sequence, and A7 calculates the weighted score accordingly. After A8 ranks the weighted scores, A9 assembles the results as follows:
In fact, an alignment function with the @a option also returns a sequence aligned with the base sequence, except that each of its members is a set; access by position can be applied to it as well.
A1  =demo.query("select * from EMPLOYEE")   /employee table
A2  [California,Texas,Pennsylvania]
A3  =A1.align@a(A2,STATE)   /alignment grouping by state
A4  =A3.new(A2(#):STATE,~.count():Count,round(~.avg(age(BIRTHDAY)),2):Age)   /use # to look up the field value in A2 by the sequence number in A3
After calculation, the results of A4 are as follows:
Without options, A.align() takes, for each member of the base sequence, only the first matching member of the source sequence, rather than returning the whole set of matches. When it is known in advance that each group contains exactly one member, using A.align() amounts to sorting by the base sequence.
Similarly, enumeration groups can also be accessed by position, except that the @1 option is not valid in A.enum(), which therefore only handles grouping:
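The difference between plain alignment and alignment with @a can be sketched in Python. The helpers align and align_all and the employee records are hypothetical; they only imitate the behavior described in the text.

```python
def align(records, base, key):
    """For each value in base, the first record whose key matches, else None."""
    first = {}
    for r in records:
        first.setdefault(key(r), r)  # keep only the first record per key
    return [first.get(b) for b in base]


def align_all(records, base, key):
    """For each value in base, the list of all records whose key matches."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return [groups.get(b, []) for b in base]


employees = [  # hypothetical records
    {"name": "Ann", "state": "Texas"},
    {"name": "Bob", "state": "California"},
    {"name": "Cid", "state": "Texas"},
    {"name": "Dee", "state": "Ohio"},
]
states = ["California", "Texas", "Pennsylvania"]


def by_state(r):
    return r["state"]


first = align(employees, states, by_state)
grouped = align_all(employees, states, by_state)

# Positional access: the i-th group belongs to the i-th state.
counts = [{"state": s, "count": len(g)} for s, g in zip(states, grouped)]

print([r["name"] if r else None for r in first])  # ['Bob', 'Ann', None]
print(counts[1])                                  # {'state': 'Texas', 'count': 2}
```

Because both results follow the order of the base sequence, the i-th result can be paired with the i-th base member without any further lookup.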
A1  =demo.query("select * from EMPLOYEE")
A2  [AgeGroup1,AgeGroup2,AgeGroup3]
A3  [...]
A4  =A1.enum(A3,age(BIRTHDAY))
A5  =A4.new(A2(#):AgeInterval,~.count():Count)
A5 calculates the number of employees in each of the three age groups as follows:
6 Interval sequences
A sequence of integers is a special kind of set: it is a set in its own right and supports the usual set operations, and it can also serve as sequence numbers for accessing a subset of another sequence. Using such sequences flexibly is an important part of thinking in terms of sequence numbers, such as:
A1  =to(10)
A2  =to(3)
A3  =A1.step(3)
A4  =20.step(4)
The to() function returns a sequence of consecutive integers, while the step() function lets you set parameters such as the interval between members. The A1~A4 results are as follows:
The positions of a subsequence within the original sequence can be used to process that subset, such as:
A1  =to(100)   /the sequence consisting of 1 to 100
A2  >A1(100.step(7,14))=0   /starting from 14, the multiples of 7 are set to 0
A3  =A1.run(if(~>1,A1(100.step(~,~+~))=0))
A4  =A1.select(~>1)   /build a prime table: set all positions of composite numbers to 0, leaving only primes
A5  =100.(rand())   /generate 100 random numbers
A6  =A5(to(50))
A7  =A5(to(51,100))
A8  >A5(100.step(2,1))=A6
A9  >A5(100.step(2,2))=A7   /shuffle A5, that is, the first 50 and the last 50 members are arranged alternately
As the example shows, a sequence of positions can be used to assign values into the original sequence, to extract a subsequence, and so on.
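Both uses of position sequences can be sketched in Python. The helper step is hypothetical and assumes the argument order (interval, start); deterministic values stand in for the random numbers so that the interleaving is easy to see.

```python
def step(n, interval, start=1):
    """1-based positions start, start+interval, ... that do not exceed n."""
    return list(range(start, n + 1, interval))


n = 100

# Prime table: zero out the positions of all multiples of each surviving number.
a1 = list(range(1, n + 1))
for i in range(1, n + 1):
    v = a1[i - 1]
    if v > 1:
        for p in step(n, v, v + v):
            a1[p - 1] = 0
primes = [v for v in a1 if v > 1]

# Interleave the first and second halves by assigning to odd and even positions.
a5 = list(range(1, n + 1))          # stand-in for 100 random numbers
first_half, second_half = a5[:50], a5[50:]
for p, v in zip(step(n, 2, 1), first_half):
    a5[p - 1] = v
for p, v in zip(step(n, 2, 2), second_half):
    a5[p - 1] = v

print(primes[:5], len(primes))  # [2, 3, 5, 7, 11] 25
print(a5[:4])                   # [1, 51, 2, 52]
```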
7 Sequences of sequence numbers
Sorting a sequence loses the original order of its members, but that information is sometimes still needed. For example, we may want to know where the three youngest employees fall in the company's order of joining, or the increase on the three trading days with the highest stock prices.
To do this, SPL provides the A.psort() function, which returns, for each member in sorted order, its sequence number before sorting.
A1  [...]
A2  =A1.psort()
A3  =A1(A2)
A4  =A1.sort()
A5  =A3==A4
The A2~A5 results are as follows:
Put simply, in the sequence returned by A.psort(), the first number is the original position of the member that comes first after sorting, the second number is the original position of the member that comes second, and so on.
For a sequence that was reordered with such a sequence of sequence numbers, you can use the A.inv() function to obtain the inverse of that sequence and restore the original order, such as:
A1  [...]
A2  =A1.sort()
A3  =A1.psort().inv()
A4  =A2(A3)
A5  =A4==A1
The calculation results of A2~A5 are as follows:
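The relationship between sorting, A.psort() and A.inv() can be checked with a small Python sketch. The helper names mirror the SPL functions, but the implementations and the sample data are assumptions made for illustration.

```python
def psort(seq):
    """1-based original positions of the members, listed in sorted order."""
    return sorted(range(1, len(seq) + 1), key=lambda i: seq[i - 1])


def inv(positions):
    """Inverse of a permutation of 1..n given as 1-based positions."""
    result = [0] * len(positions)
    for new_pos, old_pos in enumerate(positions, start=1):
        result[old_pos - 1] = new_pos
    return result


def at(seq, positions):
    """Members of seq at the given 1-based positions."""
    return [seq[i - 1] for i in positions]


a1 = ["c", "a", "d", "b"]  # assumed sample data

order = psort(a1)
print(order)                                  # [2, 4, 1, 3]
print(at(a1, order) == sorted(a1))            # True
print(inv(order))                             # [3, 1, 4, 2]
print(at(sorted(a1), inv(order)) == a1)       # True: original order restored
```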
With the functions A.psort() and A.inv(), problems that require keeping the original sequence numbers are easy to solve:
A1  =demo.query("select * from EMPLOYEE").sort(HIREDATE)
A2  =A1.psort(BIRTHDAY:-1)   /sequence numbers in A1, ordered by birthday in descending order
A3  =A2(to(3))   /sequence numbers in A1 of the three youngest employees
A4  =demo.query("select * from STOCKRECORDS where STOCKID=000062").sort(DATE)
A5  =A4.psort(CLOSING:-1)   /sequence numbers in A4 after sorting by closing price in descending order
A6  =A5(to(3))   /sequence numbers in A4 of the three days with the highest closing price
A7  =A6.(A4(~).CLOSING/A4.m@0(~-1).CLOSING-1)   /increase on these three days, calculated using their positions in A4
A8  =A6.(A4.calc(~,(CLOSING-CLOSING[-1])/CLOSING[-1]))   /the expression in A7 can be abbreviated with the calc function
When searching data, binary search can greatly improve efficiency, but it requires the original sequence to be ordered by the search key; if the sequence is unordered, it must be sorted first. If you are looking for the member itself, sorting first is not a problem. But when you want the member's sequence number, sorting destroys that information, and you need the A.psort() function, such as:
A1  =demo.query("select * from EMPLOYEE").sort(HIREDATE)
A2  =A1.psort(NAME)   /sequence numbers in A1, ordered by name
A3  =A1(A2)   /sorted by name
A4  =A3.pselect@b(NAME:"David")   /search for David by binary search
A5  =A2(A4)   /David's sequence number in A1
Using A.psort() here amounts to building a binary search index for the sequence, and one sequence can have several such indexes, built on different keys, at the same time.
The alignment grouping function can also return a sequence of sequence numbers instead of the aligned sequence itself, such as:
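The index idea can be sketched in Python with the standard bisect module. The names list is hypothetical and stands for records kept in hire-date order; find_position is an illustrative helper, not an SPL function.

```python
import bisect

names = ["Emma", "David", "Alice", "Carol", "Bob"]  # hypothetical, in hire-date order

# The "index": original 1-based positions listed in name order (like psort).
order = sorted(range(1, len(names) + 1), key=lambda i: names[i - 1])
sorted_names = [names[i - 1] for i in order]


def find_position(name):
    """Original 1-based position of name, found by binary search, or None."""
    k = bisect.bisect_left(sorted_names, name)
    if k < len(sorted_names) and sorted_names[k] == name:
        return order[k]  # translate the sorted position back to the original one
    return None


print(order)                   # [3, 5, 4, 2, 1]
print(find_position("David"))  # 2
print(find_position("Zed"))    # None
```

The original list is never reordered, so a second index on another key could be built alongside this one in the same way.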
A1  =demo.query("select * from SALES").sort(AMOUNT:-1)   /orders sorted by amount in descending order
A2  [QUICK,ERNSH,HANAR,SAVEA]
A3  =A1.align@1p(A2,CLIENT)   /group according to the customer sequence in A2 and return sequence numbers
A4  =A3.new(A2(#):NAME,A1(~).AMOUNT:Amount,~:Rank)   /use the sequence numbers in A3 to look up the order amount in A1 and its rank
After computing the sequence numbers of the required records, you can use the positioning calculation A.calc() to compute the desired result. Positioning calculation avoids unnecessary computation and thus improves efficiency.
A1  =file("VoteRecord")
A2  =A1.import@b()
A3  [Califonia,Ohio,Illinois]
A4  =A2.pselect@a(A3.pos(State)>0)
A5  =A2.calc(A4,Votes[-1]-Votes+1)
The results of A2, A4 and A5 are as follows:
In this example, the binary file VoteRecord stores the results of a vote, sorted in descending order by number of votes. A4 computes the sequence numbers of the employees from the specified states. A5 uses these sequence numbers to calculate how many more votes each of these employees needs in order to move up in the ranking. For example, Ryan Williams, currently ranked third, needs another 69 votes to move up one place. The calculation involves cross-row processing, so it cannot be done on the selected employees' data alone; it also needs the related data in the original table.
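The positioning calculation can be sketched in Python as follows. The vote records are hypothetical, and the sketch assumes that the top-ranked record, which has no record above it, yields None.

```python
records = [  # hypothetical vote records, already sorted by votes in descending order
    {"name": "Ann", "state": "Texas", "votes": 500},
    {"name": "Ben", "state": "Ohio", "votes": 430},
    {"name": "Cal", "state": "California", "votes": 362},
    {"name": "Dot", "state": "Illinois", "votes": 300},
    {"name": "Eve", "state": "Ohio", "votes": 300},
]
states = ["California", "Ohio", "Illinois"]

# 1-based positions of the records from the chosen states (like pselect@a).
positions = [i for i, r in enumerate(records, start=1) if r["state"] in states]


def votes_to_move_up(i):
    """Extra votes the record at 1-based position i needs to pass the one above it."""
    if i == 1:
        return None  # nobody is ranked above the first record
    return records[i - 2]["votes"] - records[i - 1]["votes"] + 1


# Evaluate only at the selected positions, but against the full table (like calc).
needed = [votes_to_move_up(i) for i in positions]

print(positions)  # [2, 3, 4, 5]
print(needed)     # [71, 69, 63, 1]
```

Each value is computed from the row above in the full table, even when that row does not belong to the selected states, which is why the positions rather than the filtered records are carried forward.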