It is easy to merge Excel files by embedding Java into SPL 07/04 Update SLTechnology News&Howtos

It is easy to merge Excel files by embedding Java into SPL

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

Most JAVA programmers choose to use third-party class libraries such as POI or HSSFWorkbook to automate Excel merging, which not only requires a lot of code and effort, but also is not flexible to use, and the format of Excel is very strict. It would be nice to have a dedicated external data tool, write a simple SQL-like script to implement it, and then call it directly in JAVA and return the result set. Dry aggregator is such a mechanism, by using JDBC to call the SPL script, easy to use, here to learn how to use.

SPL implementation

Regular merge:

a. Merge multiple Sheet tables in the same excel

The following example is an excel file containing sales data, which contains three sheet worksheets with the same structure by month, with the following data:

January_2013:

February_2013:

March_2013:

While merging three Sheet, we can also filter out the fields Customer Name and Sale Amount from each sheet at the same time. The final effect is as follows:

SPL script:

one

= file ("D:/sales_2013.xlsx") .xlsopen ()

two

= A1.conj (A1.xlsimport@t ('Customer Name','Sale Amount';~.stname))

three

File ("D:/result_2013.xlsx"). Xlsexport@t (A2; "merge_sheets")

Save the script file oneExcel.dfx (used for embedding Java)

Script description:

A1: opens the specified excel file and creates a sequence of multiple sheet worksheets.

A2: use the conj function to traverse all the member worksheets in the A1 sequence, import the specified column 'Customer Name','Sale Amount', in each worksheet, and merge the data. The xlsimport function imports the specified column, and the last column is separated by a semicolon. The parameter ~ .stname specifies the current worksheet, and since you are in the loop of the conj function, you can import all worksheets one by one. At the same time, xlsimport uses the option @ t to indicate that the first row of the worksheet record is used as the field name.

A3: save the sequence table A2 as a new worksheet "merge_sheets" to the original excel file, and also use the option @ t to specify the first line as the title.

This script has only three sentences, which is short, concise, logical and easy to understand. Let's take a look at how to merge multiple worksheets in multiple files.

b. Multi-table merging in different excel

Here are several excel files to be merged, all with the same table structure as the example above, and each file records the data for the current year:

SPL script:

one

For directory@p ("d:/excel/*.xlsx")

= file (A1) .xlsopen ()

two

= B1.conj (B1.xlsimport@t ('Customer Name','Sale Amount','Purchase Date';~.stname))

three

= @ | B2

four

File ("d:/result.xlsx"). Xlsexport@t (B3; "merge_data")

The results of the merger are as follows:

Save the script file MergeExcels.dfx (used for embedding Java)

Script description:

A1: traverses the excel files in the specified directory through the for loop and processes them within the loop between B1 and B3.

B1: open an excel file in the directory and generate a sequence.

B2: import the data for the column 'Customer Name','Sale Amount','Purchase Date' specified in each sheet worksheet in the current file, and then merge the data, similar to A2 in the previous example.

B3: merge the data of sequence table B2 with the values of this grid represented by @.

A4: save order sheet B3 to the merge_data worksheet in the result.xlsx file.

The above program uses two loops to realize the data merge of multiple excel files, the outer loop for traverses all the excel files under the directory, and the inner loop B1.conj merges the data of multiple sheet worksheets in each excel file.

c. Merge into large files

A2 in the first example and B3 in the second example load all the data of the merged Excel in memory and then write it out at once. If the files are too many and too large, the memory footprint will be large, even beyond the range of memory allowed. To do this, we can use streaming append to generate large files in the SPL script.

SPL script:

one

= file ("D:/out.xlsx")

two

For directory@p ("d:/excel/*.xlsx")

= file (A2) .xlsopen ()

three

= if (A1.exists (), B2.xlsimport@t (), B2.xlsimport ())

four

> A1.xlsexport@s (B3; "merger")

The effect of the merger is as follows:

Save the script file BigExcel.dfx (used for embedding Java)

Script description:

A1: opens the specified output file.

A2: traverse the excel files that need to be merged in the directory.

B2: open an excel file that needs to be merged.

B3: if the output file does not exist, read all the data from the sheet worksheet, including the title row; if the output file already exists, indicate that the first line is the title with the @ t option, and read the data from the second line.

B4: appends the data read by B3 to the merger worksheet of the output file specified by A1.

Through streaming to read file data one by one and then append write, this method is suitable for merging a large number of small excel files into one large excel file.

JAVA call

It is very convenient to embed SPL into the Java application. Load it by calling the stored procedure method in JDBC, and merge the saved file OneExcel.dfx with multiple Sheet tables in the same excel. The example calls are as follows:

.. Connection con= null;Class.forName ("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection ("jdbc:esproc:local://"); / / call the stored procedure, where OneExcel is the file name of dfx st = (com. Esproc.jdbc.InternalCStatement) con.prepareCall ("call OneExcel ()"); / execute the stored procedure st.execute (); / / get the result set ResultSet rs = st.getResultSet (); .Connection con= null;Class.forName ("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection ("jdbc:esproc:local://"); / / call the stored procedure, where OneExcel is the file name of dfx st = (com. Esproc.jdbc.InternalCStatement) con.prepareCall ("call OneExcel ()"); / execute the stored procedure st.execute (); / / get the result set ResultSet rs = st.getResultSet ();

It's the same thing to replace it with MergeExcels.dfx/BigExcel.dfx, just call MergeExcels () or call BigExcel (). Only Java snippets are used here to roughly explain how to embed SPL. For detailed steps, see how Java invokes the SPL script, which is also very simple and won't go into detail. At the same time, SPL also supports ODBC drivers, integrated into languages that support ODBC, and the embedding process is similar.

Expansion of excerpts

With regard to the processing of Excel files, in addition to the normal merge described above, you can also group and de-duplicate the summarized results in the SPL script, and then export the result set.

Group summary

Let's continue to take the previous sales data excel file as an example.

a. Field grouping

To implement grouping calculation based on a field or multiple fields, the SPL script is as follows:

one

= file ("D:/sales_2013.xlsx") .xlsopen ()

two

= A1.conj (A1.xlsimport@t (; ~ .stname))

three

= A2.groups ('Customer ID';sum (' Sale Amount'): Total,avg ('Sale Amount'): Average)

four

= A2.groups ('Customer ID','Purchase Date';sum (' Sale Amount'): Total)

The effect of A3:

The effect of A4:

Script description:

A1: opens the specified excel file.

A2: reads and merges data from all sheet worksheets in the file.

A3: on the merged data, the sales and average are grouped according to the field 'Customer ID'.

A4: calculate sales in groups according to the field 'Customer ID',' Purchase Date' on the merged data.

b. Group in order

The aggregator can also compare with adjacent data rows when grouping aggregation, and can no longer sort when the original data is ordered, so as to save time and maintain the original order. Assuming that the original data has been sorted by date, when we want to group statistics by month, the code is as follows.

SPL script:

one

For directory@p ("d:/excel/*.xlsx")

= file (A1) .xlsopen ()

two

= B1.conj (B1.xlsimport@t (; ~ .stname))

three

= @ | B2

four

= B3.derive (year ('Purchase Date'): Year,month (' Purchase Date'): Month)

five

= A4.groups (month ('Purchase Date'): Month;sum (' Sale Amount'): Total,avg ('Sale Amount'): Average)

six

= A4.groups@o (month ('Purchase Date'): Month;sum (' Sale Amount'): Total,avg ('Sale Amount'): Average)

A5 grouping effect:

A6 grouping effect:

Script description:

A1 to B3: as described in the previous example, the worksheets of all excel files of the same structure in the same directory are merged.

A4: a sequence table A4 is basically reconstructed in sequence table B3 to split the date and add year and month fields.

A5:groups summarizes sales and average in monthly groups across the year.

A6:groups@o summarizes sales and average by year and month, and achieves grouping and merging processing with the parameter @ o.

Among them, A4 is the details of data records; A5 is counted monthly and does not distinguish between years; and A6 is counted by year and month. The data in these three cells show different levels of merged summary results.

c. Segmented grouping

The data to be counted are divided into several segments according to conditions, and the situation of each group is counted.

SPL script:

one

For directory@p ("d:/excel/*.xlsx")

= file (A1) .xlsopen ()

two

= B1.conj (B1.xlsimport@t (; ~ .stname))

three

= @ | B2

four

= B3.groups (the data in if ('Sale Amount') is unique, but the merged data is not necessarily unique, so the duplicate data is removed by primary key.

b. De-duplication of a field

Remove the reprocessing according to a field in the data table sales_2013 and view the employee records with different names.

one

= file ("d:/sales_2013.xlsx") .xlsopen ()

two

= A1.conj (A1.xlsimport@t ('Customer ID',' Customer Name';~.) Stname))

three

= A2.id ('Customer Name')

four

= A2.group@1 ('Customer Name')

five

File ("d:/out.xlsx"). Xlsexport@t (A4; "result")

Code description:

A1: opens the specified excel file.

A2: imports data for the specified column in the sheet worksheet.

A3: get records that do not repeat names from sequence table A2.

A4: get a list of records that do not repeat names from sequence table A2.

A5: save the sequence table A4 and record the first line as the title.

A3 data deduplication result:

A4 data deduplication result:

c. Joint multi-field deduplication

Although some records have primary keys, it is necessary to use several other fields to determine whether they are duplicate records. At this time, multiple fields are used to determine whether there are duplicate records.

one

= file ("d:/sales_2013.xlsx") .importxls@t ()

two

= file ("d:/sales_2014.xlsx") .importxls@t ()

three

= [A1 Magazine A2] .merge ('Customer ID',' Purchase Date')

four

= A3.group@1 ('Customer ID',' Purchase Date')

five

File ("d:/out.xlsx"). Xlsexport@t (A4; "result")

Code description:

A1: imports data from the specified excel file.

A2: ditto.

A3: merge the order table A1 ~ A2 according to the field 'Customer ID',' Purchase Date', and return to the order table A3.

A4: sequence table A3 is grouped according to 'Customer ID',' Purchase Date' to remove weight.

A5: save the results.

Of course, you can also group and merge with reference to more fields as needed to remove duplicate records.

d. Record-level deduplication

Resolve that the records in each file you want to merge are not duplicated, but there may be duplicate records after the merge.

one

= file ("d:/sales_2013.xlsx") .importxls@t ()

= A1.group@1 ('Invoice Number')

two

= file ("d:/sales_2014.xlsx") .importxls@t ()

= A2.group@1 ('Invoice Number')

three

= [B1MagneB2] .merge @ u ()

= A3.count ()

Code description:

A1: import data from the excel file.

B1: remove the duplicate data from the sequence table A1 according to the field 'Invoice Number'.

A2, B2: ditto

A3: merge the data of the sequence table B1, and return to table A3 after removing the duplicate data records. The option @ u means that the order table members are merged in order to form a new order table, removing duplicate records.

B3: view the number of merged data records.

Merge@u is suitable for merging multiple order tables, where the order table is ordered inside and there is no duplicate data.

Summary

When using SPL to deal with isomorphic / heterogeneous excel multi-file merging, grouping summary data and data deduplication, you only need to read the fields that need to be merged into the collection objects of the aggregator. After learning to use this professional data processing tool, you can merge not only Excel files, but also other text data in the same way, and you no longer have to worry about multiple files, large files and structural differences in the merged data.

SPL advantage

There is a library to write SQL, no library to write SPL

Using the Java program to directly summarize and calculate the data is still tiring, the code is very long and can not be reused, and in many cases the data is not in the database. With SPL, it is as convenient as using SQL in Java.

Commonly used worry-free, you can get the entry version of the right to use for life without spending money.

If the data to be analyzed is one-off or temporary, the dry aggregator provides a free trial license every month and can be recycled for free use. However, if you want to integrate with the Java application and deploy it to the server for long-term use, it is still troublesome to change the trial license on a regular basis. Moisturizer provides an entry version with lifetime access, which solves this worry. How to use the dry aggregator for free?

Technical documentation and community support

There are many ready-made examples in the official aggregator technical documentation itself, and solutions to common problems can be found in the documents. If you get the entry version, you can not only use the general functions of SPL, but also go to the dry college for consultation if you encounter any problems. The official community provides free technical support to entry version users.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.