2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Before talking about sets, we need to introduce the concept of discreteness:
Discreteness means that the members of a set can exist outside the set and take part in operations on their own, and that such free members can in turn form new sets. It follows that discreteness is a capability of sets; it makes no sense to talk about discreteness without the concept of a set.
Discreteness is a very simple feature, and almost every high-level language that supports structures (objects) supports it naturally. In Java, for example, we can take members out of an array and compute on them separately, or combine them into new arrays for further set operations (although Java provides almost no class library for set operations).
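As a rough illustration of discreteness, the following Python sketch takes a member out of a set, operates on it alone, and then forms a new set from free members. The sample records and the names `stocks`, `member` and `expensive` are hypothetical and chosen only for this example.

```python
# Hypothetical sample data: each member of the set is a mutable record (a dict).
stocks = [
    {"company": "A", "price": 10.0},
    {"company": "B", "price": 20.0},
    {"company": "C", "price": 30.0},
]

# Discrete operation: take one member out of the set and work on it alone.
member = stocks[1]
member["price"] += 1.5

# Free members can form a new set. The new list refers to the same records,
# it does not copy them.
expensive = [s for s in stocks if s["price"] > 15]

# Because the members are shared, a change made through the new set
# is also visible through the original set.
expensive[0]["price"] = 99.0
print(stocks[1]["price"])  # 99.0
```

The point of the sketch is that the new set is built from references to the original members, which is what allows members to live outside their set and still be the same objects.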
A popular analogy: suppose there is a box full of white balls. A discrete operation is like opening the box, taking the balls out one by one and painting each a different color, so that every ball ends up a different color. An operation on the whole set is like shipping the box with all its balls to some destination: every ball in the box arrives there at the same time.
Back to programming: an aggregator scripting language that combines a good set-operation class library with a discrete reference mechanism has inherent advantages over traditional SQL (which is limited to relational algebra), both in the way of thinking it allows and in execution efficiency.
I. Solving practical problems with slightly more complicated logic
For example, take the problem mentioned earlier: among the stocks that have risen for at least three consecutive days, calculate the proportion that have risen for at least four consecutive days.
A common approach uses window functions: partition the data by company and order it by date; call the LAG window function to compare each price with the previous one and mark the row NULL if the price did not rise; call LAG and LEAD to find the turning points where a falling trend changes into a rising one and mark them with 1; call the SUM window function to accumulate these marks into a segment field; discard the rows marked NULL earlier; count the segments with at least 3 rows and with at least 4 rows; and finally calculate the ratio.
The implementation is as follows (using SQL Server as the example database):
WITH T1 AS
(
    SELECT T.COM COM, T.STA STA, SUM(T.FLG) OVER(PARTITION BY T.COM ORDER BY T.DAT) GRP
    FROM (
        SELECT [Company] COM, [Date] DAT, [Price] PRI,
            CASE WHEN [Price] > LAG([Price],1,0) OVER(PARTITION BY [Company] ORDER BY [Date])
                THEN 1 ELSE NULL END STA,
            CASE WHEN [Price] < LAG([Price],1,0) OVER(PARTITION BY [Company] ORDER BY [Date])
                  AND [Price] < LEAD([Price],1,9999999) OVER(PARTITION BY [Company] ORDER BY [Date])
                THEN 1 ELSE 0 END FLG
        FROM Stock
    ) T
),
T2 AS
(
    SELECT T1.COM COM, T1.GRP GRP, COUNT(T1.COM) CNT
    FROM T1
    WHERE T1.STA IS NOT NULL
    GROUP BY T1.COM, T1.GRP
),
T3 AS
(
    SELECT COUNT(T2.COM) Up3Days FROM T2 WHERE T2.CNT >= 3
),
T4 AS
(
    SELECT COUNT(T2.COM) Up4Days FROM T2 WHERE T2.CNT >= 4
)
SELECT CONVERT(FLOAT,T4.Up4Days,120)/CONVERT(FLOAT,T3.Up3Days,120) FROM T3 JOIN T4 ON 1=1
As we can see, this method has to classify and tag the data along the way, which is cumbersome: besides several layers of nested subqueries, it needs filter marks and segmentation marks, and the author also has to work out how to turn the segmentation marks into a segment field and how to avoid wasting time by querying the same table repeatedly. Is there a more flexible method? Perhaps. In SQL Server, for example, we could use a cursor instead (flexible, but the amount of code grows, and the T-SQL starts to feel a lot like Java):
CREATE TABLE #RT(Company VARCHAR(20) PRIMARY KEY NOT NULL, Price DECIMAL NOT NULL, Record INT NULL, Most INT NULL)
CREATE TABLE #TT(Company VARCHAR(20) NOT NULL, Price DECIMAL NOT NULL, DT DATE NOT NULL)
CREATE CLUSTERED INDEX IDX_#TT ON #TT(Company,DT) -- SQL Server 2016: the index must be created, otherwise the sort order has no effect
INSERT INTO #TT SELECT [Company], [Price], [Date] FROM Stock ORDER BY [Company],[Date]
DECLARE @Company VARCHAR(20), @Price DECIMAL, @Record INT, @Most INT
SET @Price=0 -- The Price variable needs an initial value of 0
DECLARE iCursor CURSOR FOR SELECT Company, Price FROM #TT -- Define the cursor
OPEN iCursor -- Open the cursor
FETCH NEXT FROM iCursor INTO @Company, @Price
WHILE @@FETCH_STATUS=0 -- Enter the loop if the cursor fetch succeeded
BEGIN
IF((SELECT COUNT(*) FROM #RT WHERE Company=@Company)=0)
BEGIN INSERT INTO #RT VALUES(@Company, @Price, 1, 1) END
ELSE
BEGIN
IF((SELECT TOP 1 Price FROM #RT WHERE Company=@Company)=3 AND @Most=3),
T2 AS (SELECT COUNT(*) Num FROM #RT WHERE #RT.Most>=4)
SELECT CONVERT(FLOAT,T2.Num,120)/CONVERT(FLOAT,T1.Num,120) FROM T1 JOIN T2 ON 1=1 -- Calculate the final result
DROP TABLE #RT
DROP TABLE #TT
This approach is also hardly portable: if you switch to another database, you may have to learn how cursors work in that database all over again.
Now let's look at the code the aggregator needs to solve the same problem (the data is read from an Excel file for convenience):
     A
1    =file("E:/Stock.xlsx").xlsimport@t().sort(Date).group(Company)
2    =A1.((a=0,~.max(a=if(Price>Price[-1],a+1,0))))
3    =string(A2.count(~>=4)/A2.count(~>=3),"0.00%")
To achieve the same goal, the aggregator code is by comparison not only simple and efficient but also widely adaptable, and even if a large amount of data calls for parallel computing, you will not be left helpless.
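For readers who do not know the aggregator syntax, the idea of the grid script (group by company, track the current run of rising days, keep the maximum, then compare the two counts) can be sketched in Python. This is an illustration, not a translation of the script: the function names, the tuple layout `(company, date, price)` and the sample rows are assumptions, and the first row of each company, which has no previous price, is not counted as a rise here.

```python
from collections import defaultdict


def longest_rise(prices):
    """Length of the longest run of consecutive day-over-day rises."""
    best = run = 0
    for prev, cur in zip(prices, prices[1:]):
        # Extend the run on a rise, reset it to 0 otherwise.
        run = run + 1 if cur > prev else 0
        best = max(best, run)
    return best


def rise_ratio(rows):
    """rows: iterable of (company, date, price) tuples (hypothetical layout).

    Returns the number of companies whose longest rise is >= 4 days divided
    by the number whose longest rise is >= 3 days, or None if no company
    qualifies for the denominator.
    """
    by_company = defaultdict(list)
    # Sort by company, then by date, so each price list is in date order.
    for company, _date, price in sorted(rows, key=lambda r: (r[0], r[1])):
        by_company[company].append(price)
    streaks = [longest_rise(p) for p in by_company.values()]
    up3 = sum(1 for s in streaks if s >= 3)
    up4 = sum(1 for s in streaks if s >= 4)
    return up4 / up3 if up3 else None


# Hypothetical sample: A rises 4 days in a row, B rises 3, C never more than 1.
sample = (
    [("A", d, p) for d, p in enumerate([1, 2, 3, 4, 5])]
    + [("B", d, p) for d, p in enumerate([5, 6, 7, 8, 7])]
    + [("C", d, p) for d, p in enumerate([3, 2, 3, 2])]
)
print(rise_ratio(sample))  # 0.5
```

Keeping a running counter per company avoids the segmentation marks and nested subqueries that the window-function version needs.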
II. Convenience of the aggregator in processing database data
Since programming in SQL is subject to so many restrictions and is so troublesome to write, is there nothing that can be done for data stored in a database?
Of course there is. After all, we still have the aggregator. Here is a simple calculation:
How can we sum a field row by row, exit the loop when a given value (80) is reached, and get the value of each field in the last row processed?
The SQL Server script is as follows:
with cte as (
select *,cnt3 sumcnt from Tb where cnt1=1
union all
select Tb.*, sumcnt+Tb.cnt3 from Tb join cte on 1+cte.cnt1=Tb.cnt1 where sumcnt+Tb.cnt3 ...
... iterate(a,b,c), where b assigns the initial value to the result variable ~~, a is the expression that is evaluated and assigned to ~~ on every pass of the loop, and c is a Boolean expression that ends the loop early when it is true. (Note: the loop exits when c is true.)
To put it bluntly, iterate(a,b,c) is equivalent to the following pseudocode, simulated with a while loop (note that ~ and ~~ are variables, while a, b and c are expressions):
i = 0;
~~ = b;
while (i
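Going only by the description of iterate(a,b,c) given above, its behavior can be sketched in Python as follows. This is an assumption-laden illustration rather than the aggregator's actual implementation: the Python function `iterate`, its parameter names, the extra return of the last member visited, the sample rows and the stop condition (running sum of `cnt3` reaching 80) are all hypothetical.

```python
def iterate(seq, step, init, stop):
    """Sketch of iterate(a,b,c) as described in the text.

    step(member, acc) plays the role of expression a (member is ~, acc is ~~),
    init is b, and stop(member, acc) is the Boolean expression c.
    Returns the final accumulator and the last member visited; returning
    the member is an addition made for this example.
    """
    acc = init
    last = None
    for member in seq:
        acc = step(member, acc)   # evaluate a and assign the result to ~~
        last = member
        if stop(member, acc):     # c is true: leave the loop early
            break
    return acc, last


# Hypothetical rows standing in for table Tb, already ordered by cnt1.
rows = [
    {"cnt1": 1, "cnt3": 30},
    {"cnt1": 2, "cnt3": 40},
    {"cnt1": 3, "cnt3": 25},
    {"cnt1": 4, "cnt3": 10},
]

total, row = iterate(
    rows,
    step=lambda r, acc: acc + r["cnt3"],
    init=0,
    stop=lambda r, acc: acc >= 80,  # assumed exit condition
)
print(total, row)  # 95 {'cnt1': 3, 'cnt3': 25}
```

With these sample rows the running sum is 30, 70, then 95, so the loop stops on the third row and never reads the fourth.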