In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
Summary of problems
often encounters the problem of transforming the rows and rows of data tables in its daily work. SQL provides static transpose functions pivot and unpivot, but the scope of application is very limited. To achieve some complex transpose functions with SQL, we often encounter the problem of too complex sentences, and lack of a standard solution. The SPL language of the aggregator can perfectly make up for the deficiency of SQL in this aspect because of its flexible syntax and rich function library.
The realization of the transpose function is described in detail through an example below .
Basic part 1, row transfer column 1, pivot of the database
pivot is not a feature that has existed from the beginning. It is only supported by newer versions of products from major large database vendors, such as Oralce 11g or SqlServer2005.
As can guess from the name, this function is to convert rows to columns, that is, to use the values in the rows as column names. However, the rows and columns of the database are different from ordinary tables, and it is impossible to directly correct the X axis and Y axis. The reason is that the column of the database is unique (that is, the column name cannot be repeated), while the dynamic data is stored in the row, which can be repeated if not as the primary key. Therefore, the practical application of pivot basically follows after the grouping aggregation operation, after each row of data in the column (usually the dimension) used for transposition is processed into a non-repeated value, and then the value of each row is expanded as the column name.
from a specific application point of view, the role of pivot is to subdivide the aggregate results of a column into more specific column aggregation results, in order to achieve a more intuitive visual effect.
Is the concept of light relatively boring and not easy to understand? Let's illustrate with a specific example, such as a student report form (StudentScore) in the Oracle database:
If wants to count the highest score of each subject in each class, the traditional practice is:
Select CLASS,SUBJECT, max (SCORE) THE_HIGHEST_SCORE from STUDENTSCORE group by CLASS,SUBJECT select CLASS,SUBJECT, max (SCORE) THE_HIGHEST_SCORE from STUDENTSCORE group by CLASS,SUBJECT
The above results of can be said to be very bad: first, in the "class" column, the repetition of Class one and Class two makes it easy for people to misread the line; second, in the "subjects" column, Chinese, math and English are all put together, but in fact, the highest scores of these three subjects do not have any comparative significance.
in fact, we should prefer to see the following results:
In the result, the three subjects in the column of subjects are separated as a separate column, which not only reduces useless repetition, but also defines the mutual independence of the highest scores of each subject, which seems to be much clearer.
can say that pivot was born for this purpose. In order to achieve the above results, the query is now written as follows:
Select * from (select CLASS, SUBJECT, SCORE from STUDENTSCORE) pivot (max (SCORE) for SUBJECT in ('Maths' as MAX_MATHS,' English' as MAX_ENGLISH, 'Chinese' as MAX_CHINSES)) select * from (select CLASS, SUBJECT, SCORE from STUDENTSCORE) pivot (max (SCORE) for SUBJECT in (' Maths' as MAX_MATHS, 'English' as MAX_ENGLISH,' Chinese' as MAX_CHINSES)) 2, pivot of the aggregator
Some students in may ask, since there is already pivot in the database, why do I still need the pivot of the aggregator?
's answer is: first of all, not all databases provide pivot;. Secondly, even if all databases provide pivot, what if you want to have another pivot after summarizing the data from multiple databases? That still requires the pivot of the aggregator.
, let's take a look at how to use pivot of the aggregator.
A1=connect ("orcl") 2=A1.query ("select CLASS,SUBJECT,max (SCORE) THE_HIGHEST_SCORE from STUDENTSCORE group by CLASS,SUBJECT") 3=A2.pivot (CLASS;SUBJECT,THE_HIGHEST_SCORE; "Maths": "MAX_MATHS", "English": "MAX_ENGLISH", "Chinese": "MAX_CHINESE")
code description:
A1: the first step to connect to the database
A2: the second step is to extract data for preprocessing (this step can be further expanded to do complex calculations such as aggregation or aggregation. For specific methods, please refer to the relevant article)
A3: the third step is to implement the column conversion function of pivot and present it, and its effect is exactly the same as that of Oracle's pivot.
3. Other meanings of pivot
In addition to the data rendering requirements, can also use the calculation method between columns after converting rows into columns. Because the properties of columns are different from those of rows, some inter-column calculations are cumbersome to implement between rows. For example, when the school evaluates the class performance, the weights of math, foreign language and language are 0.6, 0.3 and 0.1 respectively, and the evaluation index is calculated by the average score of the three subjects of the two classes:
A1=connect ("orcl") 2=A1.query ("select CLASS,SUBJECT,avg (SCORE) AVG_SCORE from STUDENTSCORE group by CLASS,SUBJECT") 3=A2.pivot (CLASS;SUBJECT,AVG_SCORE; "Maths", "English", "Chinese") 4=A3.new (CLASS, (Maths * 0.6 + English * 0.3 + Chinese * 0.1): Assessment)
calculation result:
above calculation, if you want to achieve between the lines, it will be a lot of trouble, interested students can try for themselves.
2. Column wrapping 1. Unpivot of database
If there is a row-to-column in , it is natural for columns to change rows. Or take Oracle as an example, which provides a column wrap function called unpivot.
What is the business significance of the column wrapping function? Let's take a look at a personal report card (PersonalScore):
If wants to know which subject everyone is best at (that is, which subject each person scores the highest), it is convenient to use the max function for interrow calculation, while it is relatively cumbersome to use intercolumn calculation. Then the unpivot function comes in handy:
With T1 as (select * from PERSONALSCORE unpivot (SCORE for SUBJECT in (MATHS,ENGLISH, CHINESE)), T2 as (select NAME NAME, max (SCORE) The_Highest_Score from T1 group by NAME) select T1.NAME NAME, T1.SUBJECT Good_Subject, T2.The_Highest_Score Good_Score_Score from T1 join T2 on T1.NAME = T2.NAME and T1.SCORE = T2.The_Highest_Score with T1 as (select * from PERSONALSCORE unpivot (SCORE for SUBJECT in (MATHS,ENGLISH, CHINESE) T2 as (select NAME NAME, max (SCORE) The_Highest_Score from T1 group by NAME) select T1.NAME NAME, T1.SUBJECT Good_Subject, T2.The_Highest_Score Good_Score_Score from T1 join T2 on T1.NAME = T2.NAME and T1.SCORE = T2.The_Highest_Score2, pivot@r of the aggregator
, what if the database you are using is not Oracle? Do you still need to study the transpose syntax details of the new database? What if the database does not support transpose statements? Do you need to use case when or subqueries to achieve similar functions indirectly?
doesn't have to be so upset! Because we have an aggregator:
A1=connect ("orcl") 2=A1.query ("select * from PERSONALSCORE") 3=A2.pivot@r (NAME; SUBJECT, SCORE; MATHS: "MATHS", ENGLISH: "ENGLISH", CHINESE: "CHINESE") 4=A3.group (NAME). (~ .top @ 1 (- 1; SCORE)) 5=A4.new (NAME,SUBJECT:Good_Subject,SCORE:Good_Subject_Score)
The result of calculation is the same (there may be a slight difference in sorting):
another point to note: the unpivot of the database is not exactly the inverse operation of pivot, because aggregate functions are often included in pivot statements, and the aggregate calculation itself is irreversible, that is, unpivot cannot restore the aggregated results of pivot back to the original detailed data. However, because the pivot of the aggregator does not participate in the aggregate calculation (the aggregate computing has been performed separately before the pivot is executed), the pivot@r of the aggregator can be said to be the inverse of the pivot operation of the aggregator.
Advanced part 1, two-way transposition
sometimes requires more complex transpose operations, such as having a student transcript (Score).
and we want to get a student score sheet with a structure similar to the following (meaning to look at the changing trend of a student's performance in a subject):
NAMESUBJECTTERM 1TERM 2ZhangsanMATHS9987
here, first of all, to merge math, Chinese and other columns into subject columns, need the operation of column switching; and to split the semester column into semester one, semester two columns, need the operation of row conversion.
takes into account that the structure of a data table is that the number of rows is much greater than the number of columns, so we can advance rows and columns, and then do row-to-column. Because the raw data of this table can correspond one to one after column conversion to the data in the table before transformation (no need to calculate aggregation), it is obviously more convenient to use the pivot@r and pivot functions of the aggregator.
A1=connect ("orcl") 2=A1.query ("select * from SCORE") 3=A2.pivot@r (NAME,TERM;SUBJECT,SCORE) 4=A3.pivot (NAME,SUBJECT;TERM,SCORE)
Results of operation:
2. Dynamic column transposition
The above examples of are static transpositions, requiring that the forms and data processed are "well-behaved". But in the actual business, there are always unruly outliers, and I believe that there are a lot of them, so it is a bit inadequate to use SQL, whether it is pivot/unpivot, case when, or anything else. What are we going to do? At this point, the advantages of the aggregator are reflected:
has, for example, the following personal income statement (PersonalIncome), which records income.
but we want to get a table with a structure similar to the following
MANE INCOME_SOURCE_1 INCOMR_AMOUNT_1 INCOME_SOURCE_2 INCOMR_AMOUNT_2...
Zhangsan Wages 8000 Stock 6000.
We are not sure about the number of columns, or even the column names, after the rows are transferred. Instead of using the pivot function that is only suitable for static transpose, you need to use the method of dynamic transposition. The SPL language of the aggregator is much more flexible than the SQL language in terms of dynamic programming:
AB1=connect ("orcl")
2=A1.query ("select * from PERSONALINCOME") .group (NAME) 3=A2.max (~ .len ()) 4=create (NAME, ${A3. ("INCOME_SOURCE_" + string (~) + ", INCOME_SOURCE_" + string (~)) .concat@c () 5for A2=A5. NAME | A5.conj ([INCOME_SOURCE, INCOME_AMOUNT]) 6
A4.record (B5)
The results are as follows:
Transpose simultaneous column-to-column calculation
, suppose I have a weekly price list of vegetables.
and I want to calculate the weekly price trend of all kinds of vegetables, which includes four states: up, down, steady and initial (Monday's value).
The table structure designed by is roughly as follows
VEGETABLES Monday Tuesday Wednesday Thursday...
Eggplant Initial Rise Decline Rise...
Cucumber Initial Rise Rise...
……
Although the transpose that needs to use is of a static type, it needs to implement the calculation between columns when transposing, which is very troublesome for SQL. However, it would be much easier to use the SPL language of the more flexible aggregator:
AB1=connect ("orcl") = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"] 2=A1.query ("select * from VEGETABLEPRICES") 3=create (Vegetables, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) 4for A2.group (VEGETABLES) = A4.align (B1, WEEK) 5
= B4. (if (# = = 1: "Initial", PRICE_RMB > PRICE_RMB [- 1]: "Rise", PRICE_RMB
< PRICE_RMB [-1]:"Decline",PRICE_RMB = PRICE_RMB [-1]:"Stable"))6 >A3.record (A4.VEGETABLES | B5)
The weekly price trend of vegetables obtained by is as follows
Summary
Compared with pivot and unpivot provided by SQL, the transpose function provided by is more flexible and adaptable, and can meet a variety of complex transposition requirements.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.