
How the aggregator (esProc) helps Java with set operations on structured text

Updated: 2025-02-23. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02.

This article explains how the aggregator (esProc) helps Java perform set operations on structured text. Many readers may not be familiar with the topic, so the approach is summarized below with examples.

Java does not directly support set operations on structured text, so intersection, union, difference and similar operations between text files have to be written with nested loops. The code becomes more complex when there are many files, when the files are too large to be processed in memory, or when the set operation is keyed on several fields. The aggregator supports set operations directly and can make such algorithms easy to implement from Java. The examples below show how.

There are two small files, f1.txt and f2.txt. The first line of each file holds the column names, and the task is to compute the intersection of the Name fields of the two files. Some of the data are as follows:

File f1.txt:

File f2.txt:

Aggregator code:

A1, B1: use the import function to read the files into memory. The default delimiter is tab. The function option @t means that the first row is read as column names, so that subsequent calculations can refer to the columns directly as Name and Dept. If the first row does not hold column names, the columns are referenced by default names such as _1 and _2.

After calculation, the values of A1 and B1 are as follows:

The function import can also read only the specified columns. In this case only Name takes part in the calculation, so you can read just the Name column. The corresponding code is: file("E:\\f1.txt").import@t(Name).
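For comparison, here is a minimal plain-Java sketch of what reading one named column from tab-delimited text involves. It only illustrates the idea and is not how esProc implements import. The class name ColumnReader, the method column, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnReader {
    // Returns the values of the named column from tab-delimited lines,
    // treating the first line as the header row (like the @t option).
    static List<String> column(List<String> lines, String name) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no header row");
        }
        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        int idx = header.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("no such column: " + name);
        }
        List<String> values = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split("\t", -1);
            if (idx < fields.length) { // skip rows that are too short
                values.add(fields[idx]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the article's real data is not shown.
        List<String> lines = List.of("Name\tDept", "Alice\tSales", "Bob\tHR");
        System.out.println(column(lines, "Name")); // prints [Alice, Bob]
    }
}
```

With a real file, the lines could come from java.nio.file.Files.readAllLines, which loads the whole file into memory and so only suits small files.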

A2: =[A1.(Name),B1.(Name)].isect(). The function isect performs the intersection between sets. A1.(Name) takes the Name column of A1 to form a set, and B1.(Name) does the same for B1. The final result of this case is as follows:

A3: result A2. This outputs the calculation result to the JDBC interface. A3 can be combined with A2 into one step: result [A1.(Name),B1.(Name)].isect().

The above is the process of finding the intersection. For the union, only the function changes: [A1.(Name),B1.(Name)].union(). The calculated results are as follows:

The code for finding the difference set: [A1.(Name),B1.(Name)].diff(). The result is as follows:

There is also a special set operation, the sum set (concatenation). The code for the sum set: [A1.(Name),B1.(Name)].conj(). The result is as follows:

You can also use operators instead of functions, which is more succinct. Intersection, union, difference, and sum set can be rewritten, in that order, as:

A1.(Name) ^ B1.(Name)

A1.(Name) & B1.(Name)

A1.(Name) \ B1.(Name)

A1.(Name) | B1.(Name)
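To make the four operations concrete, the following Java sketch shows their usual semantics on two lists of names. It assumes each input list has distinct members, and it is not a statement of how esProc handles duplicates or ordering. The class SetOps and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetOps {
    // Members present in both a and b, in the order they appear in a.
    static List<String> intersect(List<String> a, List<String> b) {
        Set<String> r = new LinkedHashSet<>(a);
        r.retainAll(new HashSet<>(b));
        return new ArrayList<>(r);
    }

    // Members of a followed by members of b that are not already in a.
    static List<String> union(List<String> a, List<String> b) {
        Set<String> r = new LinkedHashSet<>(a);
        r.addAll(b);
        return new ArrayList<>(r);
    }

    // Members of a that do not appear in b.
    static List<String> diff(List<String> a, List<String> b) {
        Set<String> r = new LinkedHashSet<>(a);
        r.removeAll(new HashSet<>(b));
        return new ArrayList<>(r);
    }

    // All members of a followed by all members of b, duplicates kept.
    static List<String> concat(List<String> a, List<String> b) {
        List<String> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        List<String> a = List.of("Alice", "Bob", "Carol");
        List<String> b = List.of("Bob", "Dave");
        System.out.println(intersect(a, b)); // prints [Bob]
        System.out.println(union(a, b));     // prints [Alice, Bob, Carol, Dave]
        System.out.println(diff(a, b));      // prints [Alice, Carol]
        System.out.println(concat(a, b));    // prints [Alice, Bob, Carol, Bob, Dave]
    }
}
```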

You can also perform set operations on more than two files. For example, if f1.txt, f2.txt, and f3.txt are read into memory as A1, B1, and C1 respectively, their intersection is: A1.(Name) ^ B1.(Name) ^ C1.(Name) or [A1.(Name),B1.(Name),C1.(Name)].isect().
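In plain Java, the same chaining amounts to folding the intersection over a list of sets. The sketch below assumes distinct members in each input. The class MultiIntersect and its sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MultiIntersect {
    // Intersects any number of lists, keeping the order of the first list.
    static List<String> intersectAll(List<List<String>> lists) {
        if (lists.isEmpty()) {
            return new ArrayList<>();
        }
        Set<String> result = new LinkedHashSet<>(lists.get(0));
        for (List<String> other : lists.subList(1, lists.size())) {
            result.retainAll(new HashSet<>(other));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(
            List.of("a", "b", "c"),
            List.of("b", "c", "d"),
            List.of("c", "b"));
        System.out.println(intersectAll(files)); // prints [b, c]
    }
}
```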

When the files are relatively large, the set operation slows down. In that case you can sort the data first with the sort function and then use the merge function to carry out the set operation, which improves performance significantly. Use the function option @i for the intersection, @u for the union, and @d for the difference. The corresponding code is as follows:

=[A1.(Name).sort(),B1.(Name).sort()].merge@i()

=[A1.(Name).sort(),B1.(Name).sort()].merge@u()

=[A1.(Name).sort(),B1.(Name).sort()].merge@d()
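Sorting first helps because two sorted inputs can be combined in a single pass. The following Java sketch shows that two-pointer merge for the three modes. It assumes both inputs are sorted ascending and contain no duplicates, and it is an illustration of the general technique rather than esProc's actual implementation. The class SortedMerge and the enum Mode are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SortedMerge {
    enum Mode { INTERSECT, UNION, DIFF }

    // Single-pass merge of two sorted, duplicate-free lists.
    static List<String> merge(List<String> a, List<String> b, Mode mode) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {            // member is in both inputs
                if (mode != Mode.DIFF) out.add(a.get(i));
                i++;
                j++;
            } else if (c < 0) {      // member is only in a
                if (mode != Mode.INTERSECT) out.add(a.get(i));
                i++;
            } else {                 // member is only in b
                if (mode == Mode.UNION) out.add(b.get(j));
                j++;
            }
        }
        // Leftovers of a belong to the union and the difference.
        if (mode != Mode.INTERSECT) {
            while (i < a.size()) out.add(a.get(i++));
        }
        // Leftovers of b belong to the union only.
        if (mode == Mode.UNION) {
            while (j < b.size()) out.add(b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("Alice", "Bob", "Carol");
        List<String> b = List.of("Bob", "Dave");
        System.out.println(merge(a, b, Mode.INTERSECT)); // prints [Bob]
        System.out.println(merge(a, b, Mode.UNION));     // prints [Alice, Bob, Carol, Dave]
        System.out.println(merge(a, b, Mode.DIFF));      // prints [Alice, Carol]
    }
}
```

Each input is visited once, so the cost after sorting is linear in the combined size, whereas nested loops compare every pair.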

The function merge can also perform set operations on multiple fields. Suppose the same Name can occur in different Depts, and the intersection must treat Dept and Name together as the key. The corresponding code is as follows: [A1.sort(Dept,Name),B1.sort(Dept,Name)].merge@i(Dept,Name).

The calculated results are as follows:
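A Java sketch of the same idea uses a comparator over both fields, so that two rows match only when Dept and Name are both equal. It assumes each row is a list with Dept at index 0 and Name at index 1, and that the composite keys are distinct within each input. The class MultiFieldIntersect and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiFieldIntersect {
    // Orders rows by Dept (index 0), then by Name (index 1).
    static final Comparator<List<String>> BY_DEPT_NAME =
        Comparator.comparing((List<String> r) -> r.get(0))
                  .thenComparing(r -> r.get(1));

    // Sorts copies of both inputs on the composite key, then merges them.
    static List<List<String>> intersect(List<List<String>> a, List<List<String>> b) {
        List<List<String>> x = new ArrayList<>(a);
        List<List<String>> y = new ArrayList<>(b);
        x.sort(BY_DEPT_NAME);
        y.sort(BY_DEPT_NAME);
        List<List<String>> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.size() && j < y.size()) {
            int c = BY_DEPT_NAME.compare(x.get(i), y.get(j));
            if (c == 0) {
                out.add(x.get(i));
                i++;
                j++;
            } else if (c < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> a = List.of(
            List.of("Sales", "Alice"), List.of("HR", "Alice"), List.of("HR", "Bob"));
        List<List<String>> b = List.of(
            List.of("HR", "Alice"), List.of("Sales", "Bob"));
        // Alice in Sales and Bob in HR do not match: the Dept differs.
        System.out.println(intersect(a, b)); // prints [[HR, Alice]]
    }
}
```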

For large files that cannot fit in memory, the cursor function can be used to read the file, and the merge function can then perform the set operation. The code for finding the intersection is as follows:

A1=file("e:\\f1.txt").cursor()

B1=file("e:\\f2.txt").cursor()

A2=[A1.sortx(Name),B1.sortx(Name)].merge@xi(Name)

Note that the function cursor does not read all the data into memory. It opens the file as a cursor (a stream). The aggregator engine allocates a suitable buffer automatically, reads part of the data at a time for the calculation, and repeats until the whole calculation is complete.

Unlike in-memory computation, cursor processing requires cursor functions, such as sortx for sorting. The merge function here uses two function options: @i for intersection and @x to indicate that the operands are cursors rather than in-memory data. In addition, functions such as union only work on in-memory data and cannot be used for large files.
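The streaming idea can be sketched in Java with two readers that each hold only the current line in memory. The sketch assumes both inputs are already sorted ascending with one distinct key per line. It does not cover the external sort that sortx performs, and it is not esProc's implementation. The class StreamingIntersect and the helper intersectLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingIntersect {
    // Walks two sorted line streams in step and passes common lines to sink.
    static void intersect(BufferedReader a, BufferedReader b, Consumer<String> sink)
            throws IOException {
        String x = a.readLine();
        String y = b.readLine();
        while (x != null && y != null) {
            int c = x.compareTo(y);
            if (c == 0) {
                sink.accept(x);
                x = a.readLine();
                y = b.readLine();
            } else if (c < 0) {
                x = a.readLine();
            } else {
                y = b.readLine();
            }
        }
    }

    // In-memory helper for demonstration; real use would pass file readers.
    static List<String> intersectLines(String a, String b) {
        List<String> out = new ArrayList<>();
        try (BufferedReader ra = new BufferedReader(new StringReader(a));
             BufferedReader rb = new BufferedReader(new StringReader(b))) {
            intersect(ra, rb, out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            intersectLines("Alice\nBob\nCarol\n", "Bob\nCarol\nDave\n")); // prints [Bob, Carol]
    }
}
```

For real files, the readers could be opened with java.nio.file.Files.newBufferedReader, and the sink could write each match to an output file so that the result never has to fit in memory either.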

The script above does all the data processing. It is then integrated into Java through JDBC. The Java code is as follows:

// establish the esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// call esProc, where test is the script file name
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test()");
// execute the esProc stored procedure
st.execute();
// get the calculation results
ResultSet set = st.getResultSet();

After reading the above, you should have a better understanding of how the aggregator helps Java deal with set operations on structured text.
