Compare data differences in csv files

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

A csv file stores data in a structure much like a database table, and csv files are widely used because they are easy to read and write, for example when outputting temporary results or continuously recording data as logs. However, if the data must first be imported into a database before it can be processed further, the convenience of csv is lost.

In fact, the esProc aggregator can process csv files directly, including more advanced operations such as comparing the differences between two csv files, which is what this article describes.

Suppose that in a simple sales system, the front-end system only handles data entry (adding, modifying, and deleting orders) and backs up and archives the data files once a day. Later analysis needs to show the orders that were added, cancelled, and modified over a given period. The following shows how to do this by comparing the csv files directly, without using a database.

The example uses two files from March 2015: the earlier one is old.csv and the later one is new.csv. The logical primary key of each file is the combination of userName and date, and the goal is to find the new, deleted, and modified rows. The source files are as follows:

old.csv:

userName,date,saleValue,saleCount
Rachel,2015-03-01,4500,9
Rachel,2015-03-03,8700,4
Tom,2015-03-02,3000,8
Tom,2015-03-03,5000,7
Tom,2015-03-04,6000,12
John,2015-03-02,4000,3
John,2015-03-02,4300,9
John,2015-03-04,4800,4

new.csv:

userName,date,saleValue,saleCount
Rachel,2015-03-01,4500,9
Rachel,2015-03-02,5000,5
Ashley,2015-03-01,6000,5
Rachel,2015-03-03,11700,
Tom,2015-03-03,5000,7
Tom,2015-03-04,6000,12
John,2015-03-02,4000,3
John,2015-03-02,4300,9
John,2015-03-04,4800,4

Looking directly at the data and counting data rows without the header line, rows 2 and 3 in new.csv are new records, row 4 in new.csv is a modified record, and row 3 in old.csv is a deleted record.

The aggregator code is as follows:

    A                                                          B
1   =file("d:\\old.csv").import@t(;,",")                       =file("d:\\new.csv").import@t(;,",")
2   =A1.sort(userName,date)                                    =B1.sort(userName,date)
3   =new=[B2,A2].merge@d(userName,date)
4   =delete=[A2,B2].merge@d(userName,date)
5   =diff=[B2,A2].merge@d(userName,date,saleValue,saleCount)
6   =update=[A5,new].merge@d(userName,date)                    return update

A1, B1: read each file using a comma as the delimiter. The @t option treats the first line as the column names.

A2, B2: sort the data by the key fields userName and date, because the merge function used later requires ordered input.
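For readers more familiar with Java than with the aggregator, the two preparation steps (reading comma-delimited text with a header line, then ordering by userName and date) can be sketched roughly as follows. This is only an illustration and not how esProc works internally. The class and method names are hypothetical, and the parser assumes plain fields with no quoting or embedded commas.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of esProc: mimics "read with header" and "sort by key".
class CsvRowsSketch {
    // Parse comma-delimited text whose first line holds the column names.
    // Assumption: fields contain no quotes or embedded commas.
    static List<Map<String, String>> parse(String text) {
        String[] lines = text.trim().split("\\R");
        String[] names = lines[0].split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            // Limit -1 keeps trailing empty fields instead of dropping them.
            String[] values = lines[i].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < names.length; c++) {
                row.put(names[c].trim(), c < values.length ? values[c].trim() : "");
            }
            rows.add(row);
        }
        return rows;
    }

    // Return a copy of the rows ordered by userName, then by date.
    // ISO dates (yyyy-MM-dd) sort correctly as plain strings.
    static List<Map<String, String>> sortByKey(List<Map<String, String>> rows) {
        List<Map<String, String>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator
                .comparing((Map<String, String> r) -> r.get("userName"))
                .thenComparing(r -> r.get("date")));
        return sorted;
    }
}
```

Calling CsvRowsSketch.sortByKey(CsvRowsSketch.parse(text)) on the content of a file gives the ordered rows that a merge-style comparison expects.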

A3: the merge function merges multiple datasets; with the @d option it computes their difference during the merge. Likewise, the @u option computes the union and the @i option the intersection. The new records are the difference, by key fields, between the newer data and the older data. For the sample data, these are the rows for Ashley on 2015-03-01 and Rachel on 2015-03-02.
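The reason merge needs ordered input is that difference, union, and intersection can then be computed in a single pass over both datasets. The Java sketch below illustrates that general idea with a two-pointer walk over two sorted lists. It is not esProc's implementation, the names are hypothetical, and its handling of duplicate keys (a row counts as matched if any equal row exists on the other side) is an assumption that may differ from merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of set operations over two lists already sorted by cmp.
class SortedMergeSketch {
    // Walk a once; advance through b only forward. Keep rows of a that are
    // matched in b (keepMatched = true) or unmatched (keepMatched = false).
    private static <T> List<T> filter(List<T> a, List<T> b,
                                      Comparator<? super T> cmp, boolean keepMatched) {
        List<T> out = new ArrayList<>();
        int j = 0;
        for (T row : a) {
            // Skip rows of b that sort before the current row of a.
            while (j < b.size() && cmp.compare(b.get(j), row) < 0) {
                j++;
            }
            boolean matched = j < b.size() && cmp.compare(b.get(j), row) == 0;
            if (matched == keepMatched) {
                out.add(row);
            }
        }
        return out;
    }

    // Rows of a with no equal row in b (comparable in spirit to the @d option).
    static <T> List<T> difference(List<T> a, List<T> b, Comparator<? super T> cmp) {
        return filter(a, b, cmp, false);
    }

    // Rows of a that also occur in b (comparable in spirit to the @i option).
    static <T> List<T> intersection(List<T> a, List<T> b, Comparator<? super T> cmp) {
        return filter(a, b, cmp, true);
    }

    // Ordered merge of a and b, keeping one row when both sides are equal
    // (comparable in spirit to the @u option).
    static <T> List<T> union(List<T> a, List<T> b, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int c = cmp.compare(a.get(i), b.get(j));
            if (c < 0) {
                out.add(a.get(i++));
            } else if (c > 0) {
                out.add(b.get(j++));
            } else {
                out.add(a.get(i++));
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }
}
```

If either list is not sorted by the same comparator, the forward-only walk can skip matches, which is why the sort step comes first.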

A4: similarly, the difference between the older data and the newer data by key fields gives the deleted records. For the sample data, this is the row for Tom on 2015-03-02.

A5: treat the key fields as ordinary fields and compute the difference over all fields. This finds every row of the newer data that does not appear unchanged in the older data, that is, both the new and the modified records.

A6: to obtain the modified records, take A5 as an intermediate result and compute the difference, by key fields, between A5 and the new records. For the sample data, this is the row for Rachel on 2015-03-03.
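The way A5 and A6 combine may be easier to see in a small Java sketch. It uses hash sets rather than an ordered merge, so it shows the logic of the two-step difference and not the aggregator's mechanism. The class name, method names, and sample rows in the usage note are hypothetical, and rows are kept as raw comma-separated strings whose first two fields form the key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: derive modified rows as (full-row difference) minus (new rows by key).
class UpdateDiffSketch {
    // Rows of left whose projection under 'by' does not occur among the rows of right.
    static List<String> minus(List<String> left, List<String> right,
                              Function<String, String> by) {
        Set<String> seen = new HashSet<>();
        for (String row : right) {
            seen.add(by.apply(row));
        }
        List<String> out = new ArrayList<>();
        for (String row : left) {
            if (!seen.contains(by.apply(row))) {
                out.add(row);
            }
        }
        return out;
    }

    // Key projection: the first two comma-separated fields (userName,date).
    static String key(String row) {
        String[] f = row.split(",", 3);
        return f[0] + "," + f[1];
    }

    // Modified rows: present in both files by key, but with different field values.
    static List<String> updated(List<String> oldRows, List<String> newRows) {
        // Like A3: rows whose key exists only in the newer data.
        List<String> added = minus(newRows, oldRows, UpdateDiffSketch::key);
        // Like A5: rows of the newer data that differ in any field (new + modified).
        List<String> diff = minus(newRows, oldRows, Function.identity());
        // Like A6: remove the new rows by key, leaving only the modified ones.
        return minus(diff, added, UpdateDiffSketch::key);
    }
}
```

For example, with old rows "Tom,2015-03-02,3000,8" and "Tom,2015-03-03,5000,7" and new rows "Ashley,2015-03-01,6000,5" and "Tom,2015-03-03,5500,7", the Ashley row is new, the Tom row for 2015-03-02 is deleted, and updated(...) returns only the Tom row for 2015-03-03.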

B6: return the result of A6 to the Java program or the reporting tool.

The script above does all the data processing. The aggregator script can then be integrated into Java through JDBC. The Java code is as follows:

// requires java.sql.Connection, java.sql.DriverManager and java.sql.ResultSet
// establish an esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// call esProc, where test is the script file name; the call can also receive parameters
com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test()");
st.execute(); // execute the esProc stored procedure
ResultSet set = st.getResultSet(); // get the calculation result

If you want to return multiple datasets to Java, change the code in B6 to: return new,delete,update.
