In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
With the rapid development of the Internet and the arrival of the era of big data, the processing of data resources has become a hot issue faced by various industries. In addition to dealing with huge amounts of special x4 data through a particularly professional, particularly expensive and complex analysis platform, we often encounter Excel (xls,xlsx) files that can be immediately available at hand. Excel is used to record, process, and exchange data because of the simplicity and convenience of the software, which can be clumsy if you need a bunch of external calls for subsequent analysis and processing. Now let's share how to analyze and process data sources in the form of Excel through lightweight use of aggregators.
Recently, the trade war between China and the United States has become more and more intense. Let's also rub the heat, use some publicly obtained economic indicators of China and the United States to make a simple comparison, simulate the actual combat scene, find the palm of the hand to read the article, know it like the back of your hand, and feel that everything is under control.
The following is the basic data file Indicators.xlsx we used, where each worksheet is a country's indicators:
Next, we open the aggregator and write the first line of the script:
= file ("Indicators.xlsx") = A1.xlsopen ()
Then we save the script file to the same directory as the Indicators.xlsx data file to make it easy to reference the relative path, which is called indicators.dfx.
To be clear, the grid that begins with "=" is called "calculation grid", in which the evaluation result of the expression will be automatically assigned to the cell, and the following script can use the cell name (A1, C25, etc.) directly. At the same time, two functions are used. The first is the file () function, which returns a file object. The parameter "Indicators.xlsx" is the file name, and because the relative path is written here, it is looked up in the sibling directory of the indicators.dfx file. The second is the xlsopen () function, which opens the file object A1 as an Excel workbook object.
The advantage of the aggregator is that it can be tested at any time. Let's take a look at the current status. After clicking the "execute" icon, the result is as follows:
In the right panel, you can see that the Excel file is displayed as a table, excluding the leftmost ordinal column, the first column is the worksheet name, and the second and third columns are the rows and rows of the worksheet, respectively. These three fields make up a data structure. Several records with this data structure form an ordered set, which we call an "ordered table". If the ordered set is not a record with the same data structure, but only some general data members, then we call it a "sequence", so the ordered table is a special sequence. The aggregator provides a large number of functions for sequences and ordinal tables and can support a variety of complex operations, so these two kinds of data objects are often encountered in the aggregator.
Next, we are going to read the contents of the worksheet. First look at the script:
= B1.xlsimport@t ()
The xlsimport () function imports a worksheet from workbook object B1. The @ t option is used here, which uses the contents of the first line as a title rather than data (which is common in Excel tables). Options need to be written after the function name, starting with @, and multiple options only need to write an @, such as @ tx. We don't use parameters here, so the function imports the entire contents of the first worksheet by default. Click "execute" to see the result, as follows:
You can see that the contents of the first worksheet are imported into a sequence table. But here we will encounter the first problem, each economic indicator Indicator includes two lines, the second line is the indicator update time, is what we do not care about. The order table of the aggregator is very simple to solve this kind of problem, just add a selection function:
= B1.xlsimport@t () .select (indicator assigned null)
Select () is the selection function, and the parameter indicator contains null is the selection condition, that is, whether to select the content of this line based on whether the Indicator column is empty or not, or whether to execute it to see the effect:
The problem of filtering invalid (uninteresting) data has been solved, and further research has found that the Last in each metric is of concern to us, so we want to import only the Indicator and Last fields. The script is adjusted as follows:
= B1.xlsimport@t (Indicator,Last) .select (indicator contains null)
The selected field parameter of the xlsimport () function is used here, and "Indicator,Last" is the field name you want to select (separated by commas). The next step is to import data from China:
= B1.xlsimport@t (Indicator,Last; "China") .select (indicator contains null)
Here the parameter adds a "China" that specifies the name of the worksheet to import, separated from the previous field parameter by ";". (you can also specify the start and end lines of the import, because it will not be introduced if it is not used in the example. If you are interested, you can refer to the tutorial documentation to try it out. After implementation, you can see China's indicators:
Now let's make a simple comparison and join the two tables according to Indicator:
= A2.join (Indicator,B2:Indicator,Last:'China')
The join () function is used here. Use the field Indicator of cell A2 to match the key Indicator of cell B2, find the corresponding record, spell the Last field of B2, and use the field name 'China'' to put it into the A2 order table, as shown below:
Then change the name of the American Last to US:
= A2.join (Indicator,B2:Indicator,Last:'China') .rename (Last:'United States')
The rename () function here is used to modify the field name of the order table. Each field is modified with a ":" mapping. Last is the old name, and 'United States' is the new field name:
As mentioned earlier, the aggregator provides a number of functions for calculating sequences and ordinal tables, which is very convenient to use. Here, we will use the sort function to sort the Sino-US indicators according to the size of the gap:
= A3.sort@z (abs ('United States'-'China'))
The sort () function is used to sort, and the @ z option is used to sort the results in descending order. Parameters are expressions that are used as the basis for sorting. Here, a simple difference is made with the US-China index, and then the absolute value is calculated with the abs () function. The results are as follows:
At this point, the required data has been sorted out. In order to reuse later, we add this comparison to a new worksheet'US vs China' in the workbook and write it back to the source file:
> B1.xlsexport@t (A4; "US vs China") > A1.xlswrite (B1)
It was mentioned earlier that the "=" begins with the calculation case, and here we encounter a kind of lattice that begins with ">", which we call the "execution case". Cells are not automatically assigned after execution.
The xlsexport () function used here is used to write order table A4 to workbook object B1, and the second parameter separated by ":" is the worksheet name. The option is also used here, where @ t indicates that you need to export the title row to the worksheet. The xlswrite () function writes the workbook object B1 to the file object A1.
Finally, let's open the excel file to see the final result:
With a few simple lines of script, the import, connection, sorting, export and other functions of multiple worksheets in Excel files can be easily realized. This is only the corner of the aggregator, and there are more features waiting for us to explore and apply.
The complete script references are as follows:
AB1=file ("Indicators.xlsx") = A1.xlsopen () 2=B1.xlsimport@t (Indicator,Last). Select (indicators are null) = B1.xlsimport@t (Indicator,Last; "China"). Select (indicators are null) 3=A2.join (Indicator,B2:Indicator,Last:'China') .rename (Last:'United States')
4=A3.sort@z (abs ('United States'-'China'))
5 > B1.xlsexport@t (A4; "US vs China") > A1.xlswrite (B1)
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 280
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.