How to Implement Document Operations Based on LibreOffice

Updated 2025-01-29 (SLTechnology News&Howtos, Shulou.com)

This article explains how to perform document operations based on LibreOffice. Many readers may not know much about the topic, so it is shared here for reference, and we hope you find it useful.

workable-converter is a document conversion project based on LibreOffice. It does not depend on any framework and is plug-and-play.

1. Technology stack

LibreOffice: 6.2.3

JODConverter: 4.2.2

PDFBox: 2.0.12

CGLIB dynamic proxy + lazy factory pattern + strategy pattern + decorator pattern

Qtools-property: manages the configuration file (your project may contain any one of application.yml, bootstrap.yml, or workable-converter.yml)
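To illustrate how a strategy, a lazy singleton factory, and a decorator typically fit together, here is a minimal, self-contained sketch. All names in it (ConversionStrategy, RecordingStrategyManager, ValidatingStrategy, Converter) are hypothetical and do not come from workable-converter; the CGLIB dynamic proxy is omitted, and the stand-in "conversion" only records its arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: strategy + lazy singleton factory + decorator.
// None of these names come from workable-converter; they only mirror its shape.
public class PatternSketch {

    /** Strategy: one implementation per conversion back end. */
    interface ConversionStrategy {
        boolean convert(String src, String dest);
    }

    /** Stand-in strategy that only records the calls it receives. */
    static final class RecordingStrategy implements ConversionStrategy {
        final List<String> calls = new ArrayList<>();

        @Override
        public boolean convert(String src, String dest) {
            calls.add(src + " -> " + dest);
            return true;
        }
    }

    /** Lazy factory: the instance is created on the first getInstance() call (holder idiom). */
    static final class RecordingStrategyManager {
        private RecordingStrategyManager() {}

        private static final class Holder {
            static final RecordingStrategy INSTANCE = new RecordingStrategy();
        }

        static RecordingStrategy getInstance() {
            return Holder.INSTANCE;
        }
    }

    /** Decorator: validates arguments before delegating to the wrapped strategy. */
    static final class ValidatingStrategy implements ConversionStrategy {
        private final ConversionStrategy delegate;

        ValidatingStrategy(ConversionStrategy delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean convert(String src, String dest) {
            if (src == null || src.isEmpty() || dest == null || dest.isEmpty()) {
                return false; // rejected before the real back end is touched
            }
            return delegate.convert(src, dest);
        }
    }

    /** Context: the converter only knows the strategy interface. */
    static final class Converter {
        private ConversionStrategy strategy;

        void setConverterType(ConversionStrategy strategy) {
            this.strategy = new ValidatingStrategy(strategy); // every strategy gets validated
        }

        boolean convert(String src, String dest) {
            if (strategy == null) {
                throw new IllegalStateException("no conversion strategy set");
            }
            return strategy.convert(src, dest);
        }
    }

    public static void main(String[] args) {
        Converter converter = new Converter();
        converter.setConverterType(RecordingStrategyManager.getInstance());
        System.out.println(converter.convert("./data/test.doc", "./data/pdf/result1.pdf")); // true
        System.out.println(converter.convert("", "./out.pdf")); // false: rejected by the decorator
    }
}
```

Swapping the back end then only means passing a different strategy to setConverterType, which appears to be the role of the CommonConverterManager.getInstance() call in the examples later in this article.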

2. Features

Supports conversion between doc, docx, html, ppt, png, pdf, and other formats.

Supports conversion by file path, byte input/output stream, Base64, and other input modes.

Does not depend on any third-party framework and is plug-and-play. Configuration can be provided in application.yml, bootstrap.yml, or workable-converter.yml (configure one of them in your own project).

3. Usage

3.1 Install and configure LibreOffice 6.2.3

For CentOS, refer directly to the article "CentOS7 install LibreOffice6.2.3".

The download links for Windows and Mac can also be found in that article.

After installation, note the home directory of your LibreOffice installation; you will need it later.

Default directories:

CentOS: /opt/libreoffice6.2/

Mac: /Applications/LibreOffice.app/Contents/

Windows: C:\Program Files\LibreOffice\
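If a program should fall back to these defaults, a small helper can pick one from the os.name system property. This is a hypothetical sketch (the class and method names are invented here); the paths are the defaults listed above, and your actual installation directory should always take precedence.

```java
import java.util.Locale;

// Hypothetical helper: maps os.name to the default LibreOffice 6.2 home directories
// listed in the article. Prefer the directory you actually installed to.
public class LibreOfficeHome {

    static String defaultHome(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "C:\\Program Files\\LibreOffice\\";
        }
        if (os.startsWith("mac")) {
            return "/Applications/LibreOffice.app/Contents/";
        }
        // Anything else is treated as Linux (e.g. CentOS)
        return "/opt/libreoffice6.2/";
    }

    public static void main(String[] args) {
        System.out.println(defaultHome(System.getProperty("os.name")));
    }
}
```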

3.2 Add the dependency

Maven

    <dependency>
        <groupId>com.liumapp.workable.converter</groupId>
        <artifactId>workable-converter</artifactId>
        <version>v1.2.0</version>
    </dependency>

Gradle

    implementation group: 'com.liumapp.workable.converter', name: 'workable-converter', version: 'v1.2.0'

3.3 Edit the configuration file

Create a yml configuration file in the resources directory of your project. The file must be named application.yml, bootstrap.yml, or workable-converter.yml.

Add the following configuration:

    com:
      liumapp:
        workable-converter:
          libreofficePath: "/Applications/LibreOffice.app/Contents"

The value of libreofficePath is the installation directory of LibreOffice 6.2.3.

The complete list of configuration items is as follows:

    Parameter         Description                                   Default
    libreofficePath   LibreOffice installation directory (String)   none; required
    libreofficePort   LibreOffice listening port (int)              2002
    tmpPath           Temporary storage directory (String)          "./data/"
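The rules in this table can be sketched as a small settings class: libreofficePath is mandatory, and the other two entries fall back to their defaults. This is a hypothetical illustration, not the library's loader; it assumes the YAML has already been flattened into dotted keys such as com.liumapp.workable-converter.libreofficePath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the configuration rules: libreofficePath is required,
// libreofficePort defaults to 2002 and tmpPath to "./data/".
// The flat map stands in for whatever a YAML loader would produce.
public final class ConverterSettings {
    static final String PREFIX = "com.liumapp.workable-converter.";

    final String libreofficePath;
    final int libreofficePort;
    final String tmpPath;

    private ConverterSettings(String libreofficePath, int libreofficePort, String tmpPath) {
        this.libreofficePath = libreofficePath;
        this.libreofficePort = libreofficePort;
        this.tmpPath = tmpPath;
    }

    static ConverterSettings fromMap(Map<String, String> props) {
        String path = props.get(PREFIX + "libreofficePath");
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException(PREFIX + "libreofficePath is required");
        }
        int port = Integer.parseInt(props.getOrDefault(PREFIX + "libreofficePort", "2002"));
        String tmp = props.getOrDefault(PREFIX + "tmpPath", "./data/");
        return new ConverterSettings(path, port, tmp);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(PREFIX + "libreofficePath", "/Applications/LibreOffice.app/Contents");
        ConverterSettings s = fromMap(props);
        System.out.println(s.libreofficePort + " " + s.tmpPath); // 2002 ./data/
    }
}
```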

3.4 Perform a conversion

3.4.1 Convert by file path

The following example converts DOC to PDF:

    WorkableConverter converter = new WorkableConverter(); // the configuration is loaded on instantiation and validated by a decorator
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.fileToFile("./data/test.doc", "./data/pdf/result1.pdf"); // test.doc is the file to convert; result1.pdf is where the result is stored
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    // strategy pattern: once a new conversion strategy is implemented, switch to it here
    // (image conversion is planned to use a separate strategy)
    converter.setConverterType(CommonConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());

To convert HTML to PDF instead, change

    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);

to

    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.HTML);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);

Other formats work the same way.
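Since only the prefixes change between formats, a helper could infer them from file extensions. The sketch below is hypothetical: DocFormat is an illustrative enum standing in for the DefaultDocumentFormatRegistry constants, and its extension table covers only the formats named in this article.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: infers a format from a file extension.
// DocFormat is illustrative only; real code would map to DefaultDocumentFormatRegistry constants.
public class FormatByExtension {

    enum DocFormat { DOC, DOCX, HTML, PPT, PDF, PNG }

    static Optional<DocFormat> fromPath(String path) {
        int dot = path.lastIndexOf('.');
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // No dot in the file name itself, or a trailing dot: no usable extension
        if (dot <= slash || dot == path.length() - 1) {
            return Optional.empty();
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc": return Optional.of(DocFormat.DOC);
            case "docx": return Optional.of(DocFormat.DOCX);
            case "htm":
            case "html": return Optional.of(DocFormat.HTML);
            case "ppt": return Optional.of(DocFormat.PPT);
            case "pdf": return Optional.of(DocFormat.PDF);
            case "png": return Optional.of(DocFormat.PNG);
            default: return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fromPath("./data/test.doc"));        // Optional[DOC]
        System.out.println(fromPath("./data/pdf/result1.pdf")); // Optional[PDF]
    }
}
```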

3.4.2 Convert by input/output stream

Again converting DOC to PDF:

    // you can also choose not to use the proxy
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    // try-with-resources closes both streams once the conversion is done
    try (FileInputStream in = new FileInputStream("./data/test.doc");
         FileOutputStream out = new FileOutputStream("./data/pdf/result1_2.pdf")) {
        pattern.streamToStream(in, out);
        // attention! Converting by stream requires setting the prefixes
        pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
        pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
        converter.setConverterType(CommonConverterManager.getInstance());
        boolean result = converter.convert(pattern.getParameter());
    }

This is essentially the same as the previous example; the only change is that the input and output streams are set with pattern.streamToStream(). The source data is read from the input stream, and the conversion result is written directly to the output stream.

To switch formats, set different prefixes as in the previous example.
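Why stream mode needs explicit prefixes becomes clearer if you consider one common way to build a stream API on top of a file-based converter: spool the input to a temporary file, convert it, and copy the result back. A stream carries no file name, so the caller has to supply the formats. The sketch below assumes this design purely for illustration; it is not taken from workable-converter, and the FileConverter interface is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: stream-to-stream conversion on top of a file-based converter.
// The extensions must come from the caller because streams have no file names.
public class StreamSpooling {

    /** A file-based conversion step, e.g. one that drives LibreOffice. */
    interface FileConverter {
        void convert(Path src, Path dest) throws IOException;
    }

    static void streamToStream(InputStream in, String srcExt, OutputStream out, String destExt,
                               FileConverter converter) throws IOException {
        Path src = Files.createTempFile("src-", "." + srcExt);
        Path dest = Files.createTempFile("dest-", "." + destExt);
        try {
            Files.copy(in, src, StandardCopyOption.REPLACE_EXISTING); // spool input to disk
            converter.convert(src, dest);                              // file-based conversion
            Files.copy(dest, out);                                     // stream the result back
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dest);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in converter that upper-cases text; a real one would call LibreOffice.
        FileConverter upperCase = (src, dest) ->
            Files.write(dest, new String(Files.readAllBytes(src), StandardCharsets.UTF_8)
                .toUpperCase().getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamToStream(new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)), "doc",
                       out, "pdf", upperCase);
        System.out.println(out.toString("UTF-8")); // HELLO
    }
}
```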

3.4.3 Convert by Base64

Again converting DOC to PDF:

    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test.doc")));
    // attention!! Converting by Base64 requires setting the prefixes
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    converter.setConverterType(CommonConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());
    String destBase64 = pattern.getBase64Result();

To convert Base64 input, first set the Base64 value of the source with pattern.base64ToBase64().

The returned result is still a boolean; the Base64 value of the converted file is obtained with pattern.getBase64Result().

To switch formats, set different prefixes as in the examples above.
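The Base64 helpers used above can be approximated with java.util.Base64. In this sketch, fileToBase64 and saveBase64File are hypothetical stand-ins for Base64FileTool.FileToBase64 and Base64FileTool.saveBase64File; the real helpers may differ in details such as line wrapping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical JDK-only equivalents of the Base64FileTool helpers.
public class Base64Files {

    /** Reads the whole file and encodes it as a single-line Base64 string. */
    static String fileToBase64(Path file) throws IOException {
        return Base64.getEncoder().encodeToString(Files.readAllBytes(file));
    }

    /** Decodes a Base64 string and writes the bytes to the given file. */
    static void saveBase64File(String base64, Path file) throws IOException {
        Files.write(file, Base64.getDecoder().decode(base64));
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("b64-", ".doc");
        Files.write(src, new byte[] {0x50, 0x4B, 0x03, 0x04}); // "PK\3\4", a ZIP header
        String encoded = fileToBase64(src);
        System.out.println(encoded); // UEsDBA==

        Path restored = Files.createTempFile("b64-restored-", ".doc");
        saveBase64File(encoded, restored);
        System.out.println(Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(restored))); // true
        Files.deleteIfExists(src);
        Files.deleteIfExists(restored);
    }
}
```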

3.5 Image processing

Image processing currently supports only PDF-to-PNG conversion: each page becomes one PNG image, so a 20-page PDF produces 20 PNG files. This feature is implemented with PDFBox 2.0.12.
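The one-image-per-page output, with files named test5_0.png, test5_1.png, and so on, can be sketched with the JDK's ImageIO. With PDFBox 2.0.x, each page image would typically come from PDFRenderer.renderImageWithDPI(pageIndex, dpi); to keep the sketch runnable without PDFBox, the page images are passed in directly. The class name and the naming rule (inferred from the file names in the test below) are assumptions.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical sketch: writes one PNG per page as <pdf base name>_<index>.png.
// Real page images would come from PDFBox; blank images stand in for them here.
public class PagesToPng {

    static List<Path> writePages(Path pdfPath, Path outDir, List<BufferedImage> pages)
            throws IOException {
        // "test5.pdf" -> "test5"; only the file name matters, not its directory
        String base = pdfPath.getFileName().toString().replaceFirst("\\.[^.]*$", "");
        List<Path> written = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            Path target = outDir.resolve(base + "_" + i + ".png"); // zero-based page index
            if (!ImageIO.write(pages.get(i), "png", target.toFile())) {
                throw new IOException("no PNG writer available");
            }
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("pages-");
        List<BufferedImage> pages = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            pages.add(new BufferedImage(100, 140, BufferedImage.TYPE_INT_RGB)); // blank stand-in page
        }
        for (Path p : writePages(Paths.get("./data/test5.pdf"), outDir, pages)) {
            System.out.println(p.getFileName()); // test5_0.png ... test5_3.png
        }
    }
}
```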

3.5.1 Convert by file path

The first argument of pattern.fileToFiles() is the path of the PDF file to convert, and the second is the directory where the resulting images are stored.

    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.fileToFiles("./data/test5.pdf", "./data/");
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
    converter.setConverterType(PdfBoxConverterManager.getInstance()); // the PDFBox converter manager only supports PDF to PNG
    assertEquals(true, converter.convert(pattern.getParameter()));
    assertEquals(true, FileTool.isFileExists("./data/test5_0.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_1.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_2.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_3.png"));

3.5.2 Convert by Base64

The argument of pattern.base64ToBase64() is the Base64 value of the PDF file to convert.

After the conversion, pattern.getBase64Results() returns the list of Base64 values of the generated images: List<String> resultBase64 = pattern.getBase64Results();

    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
    converter.setConverterType(PdfBoxConverterManager.getInstance()); // the PDFBox converter manager only supports PDF to PNG
    boolean result = converter.convert(pattern.getParameter());
    List<String> resultBase64 = pattern.getBase64Results();
    assertEquals(true, result);
    assertEquals(4, resultBase64.size());

3.6 Add a watermark

Watermarking uses the WaterMarkConverter conversion strategy.

Notes on adding a watermark:

Both the source file and the output file must be PDF.

Watermark parameters are set on a new WaterMarkRequire instance.

setWaterMarkPage(int page) specifies the page that receives the watermark; 0 means all pages.

The watermark itself is a PDF file that needs only one page; the content of its first page is added to the source file as the watermark.

For example, to use text with a transparency of 0.3 as a watermark, create the text with a transparency of 0.3 in a tool such as Word (or use a PNG image that contains transparency) and save it as watermark.pdf.

Then pass the file's Base64 value with waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf"))), or its bytes with waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf"))).
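The page-selection rule for setWaterMarkPage can be expressed as a small function. This is a hypothetical sketch: it assumes page numbers start at 1 (the article only states that 0 means all pages) and rejects out-of-range values, which the library may handle differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the setWaterMarkPage semantics:
// 0 selects every page, any other value selects that single page.
// 1-based page numbers are an assumption.
public class WatermarkPages {

    static List<Integer> pagesToMark(int waterMarkPage, int pageCount) {
        if (waterMarkPage < 0 || waterMarkPage > pageCount) {
            throw new IllegalArgumentException(
                "page " + waterMarkPage + " outside 0.." + pageCount);
        }
        List<Integer> pages = new ArrayList<>();
        if (waterMarkPage == 0) {
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p); // 0 means: watermark all pages
            }
        } else {
            pages.add(waterMarkPage);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(pagesToMark(0, 4)); // [1, 2, 3, 4]
        System.out.println(pagesToMark(2, 4)); // [2]
    }
}
```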

Watermarks can be added in three ways:

3.6.1 Add a watermark by file path

    WorkableConverter converter = new WorkableConverter();
    converter.setConverterType(WaterMarkConverterManager.getInstance()); // select the watermark conversion strategy
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire(); // parameters required to create the watermark
    // specify the page to watermark; 0 adds the watermark to all pages
    waterMarkRequire.setWaterMarkPage(0);
    waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.fileToFile("./data/test5.pdf", "./data/test5_with_mark01.pdf"); // the watermarked file is saved in ./data/ as test5_with_mark01.pdf
    boolean result = converter.convert(pattern.getParameter());
    assertEquals(true, result);

3.6.2 Add a watermark by stream

    WorkableConverter converter = new WorkableConverter();
    converter.setConverterType(WaterMarkConverterManager.getInstance());
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire();
    waterMarkRequire.setWaterMarkPage(0); // 0 means all pages
    waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    // try-with-resources closes both streams once the conversion is done
    try (FileInputStream in = new FileInputStream("./data/test5.pdf");
         FileOutputStream out = new FileOutputStream("./data/test5_with_mark02.pdf")) {
        pattern.streamToStream(in, out);
        boolean result = converter.convert(pattern.getParameter());
        assertEquals(true, result);
    }

3.6.3 Add a watermark by Base64

    WorkableConverter converter = new WorkableConverter();
    converter.setConverterType(WaterMarkConverterManager.getInstance());
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire();
    waterMarkRequire.setWaterMarkPage(0); // 0 means all pages
    waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));
    boolean result = converter.convert(pattern.getParameter());
    String base64Result = pattern.getBase64Result();
    Base64FileTool.saveBase64File(base64Result, "./data/test5_with_mark03.pdf");
    assertEquals(true, result);

4. To-do list

Conversions from doc, docx, and html to PDF have been tested with all input modes. Unit tests for the other types have not been written yet and will be added later.

Currently only yml configuration is supported. Support for other configuration formats (xml, properties, etc.) will be considered later.

Markdown is very popular, so converting a Markdown string to PDF (markdown -> html -> pdf) is under consideration.

5. Notes

Because LibreOffice is required, running in containers such as Docker is not recommended (there is no stable Docker image of LibreOffice).

If the output contains garbled characters or the conversion takes too long, check whether Chinese fonts are installed on the server.

After the project starts, the first conversion task takes longer because it has to establish a connection to LibreOffice, among other setup steps; subsequent tasks stabilize at under half a second (the exact time depends on the machine configuration).
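A slow first task followed by fast ones is the usual behaviour of a connection that is created lazily and then reused. The sketch below shows that idea with a generic, thread-safe lazy holder (double-checked locking); LazyConnection is a hypothetical class, and the supplier stands in for whatever actually connects to LibreOffice.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: the expensive resource is created once, on first use,
// and every later call reuses it. Double-checked locking with a volatile field.
public class LazyConnection<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    LazyConnection(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get(); // slow path: runs at most once
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger connects = new AtomicInteger();
        LazyConnection<String> conn =
            new LazyConnection<>(() -> "connection#" + connects.incrementAndGet());
        conn.get();
        conn.get();
        System.out.println(conn.get());      // connection#1
        System.out.println(connects.get());  // 1
    }
}
```

Calling get() once at application startup would move the one-time cost out of the first real conversion request.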
