Shulou (Shulou.com), SLTechnology News & Howtos — 2025-03-31 Update
This article explains how compressing 20 MB of files in Java was optimized from 30 seconds down to 1 second. The techniques involved are simple, fast, and practical, so let's walk through the optimization process step by step.
The requirement: the front end uploads 10 photos, and the back end processes them, packs them into a single zip archive, and sends it back over the network. Having never worked with Java file compression before, I found an example online and adapted it. It worked, but as the uploaded images grew larger, the time spent grew sharply; in the end, compressing 20 MB of files took a full 30 seconds. The original code is as follows.
```java
public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

I took a 2 MB image and looped over it ten times for the test. The printed result is below; the time is roughly 30 seconds.

fileSize:20M
consum time:29599

First optimization: from 30 seconds to 2 seconds

The first idea for optimizing is to use a buffer, BufferedInputStream. In FileInputStream, the read() method reads only a single byte at a time, as its source comments explain:

```java
/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or -1 if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;
```

This is a native method that interacts with the underlying operating system to read data from disk. Making one native call per byte is extremely expensive. For example, suppose we have 30,000 bytes of data: with FileInputStream we need 30,000 native calls to fetch them all, whereas with a buffer (assuming the initial buffer is large enough to hold all 30,000 bytes) only one call is needed, because on the first read() the buffer reads a whole chunk from disk into memory and then hands the bytes back one at a time from there.

BufferedInputStream internally wraps a byte array to hold the data; its default size is 8192.

The optimized code is as follows:

```java
public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream =
                     new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

Output:

------Buffer
fileSize:20M
consum time:1808

Compared with the first version using a bare FileInputStream, efficiency has already improved considerably.

Second optimization: from 2 seconds to 1 second

Using a buffer already met my needs, but in the spirit of putting what I've learned to use, I wanted to optimize further with NIO.

Use Channel

Why use a Channel? Because NIO introduced Channel and ByteBuffer, and their structure more closely matches the way operating systems actually perform I/O, they can be significantly faster than traditional I/O. A Channel is like a mine containing coal, and a ByteBuffer is the truck dispatched to the mine; in other words, all of our interaction with the data goes through the ByteBuffer.

Three classes in NIO can produce a FileChannel: FileInputStream, FileOutputStream, and RandomAccessFile, which can both read and write.

The code is as follows:

```java
public static void zipFileChannel() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

Notice that no ByteBuffer is used for the data transfer here; instead, the transferTo method connects the two channels directly. From the source comments:

This method is potentially much more efficient than a simple loop
that reads from this channel and writes to the target channel. Many
operating systems can transfer bytes directly from the filesystem cache
to the target channel without actually copying them.

In other words, transferTo is more efficient than looping to read from one Channel and write to another: the operating system can transfer bytes directly from the filesystem cache to the target Channel without an actual copy phase. The copy phase is the step of moving data from kernel space to user space.

The speed has improved somewhat compared with the buffered version:

------Channel
fileSize:20M
consum time:1416

Kernel space and user space

So why is the transfer from kernel space to user space slow? First we need to understand what kernel space and user space are. To protect the system's core resources, common operating systems divide execution into four rings, with greater privilege the further in you go. Ring 0 is called kernel space and is used to access critical resources; Ring 3 is called user space.

User mode, kernel mode: a thread running in kernel space is in kernel mode; a thread running in user space is in user mode.

What happens when an application (applications all run in user mode) needs to access a core resource? It must go through the interfaces exposed by the kernel, known as system calls. For example, when our application needs to access a file on disk, it invokes the open system call; the kernel then accesses the file on disk and returns its contents to the application. The rough flow is: application → system call → kernel reads the file from disk → the data is copied back into the application's buffer in user space.

Direct buffers and indirect buffers
Since we have to read files from disk, must we really go through all these twists and turns? Is there a simpler way for our application to operate on disk data directly, without the kernel acting as a middleman? Yes: set up a direct buffer.
Indirect buffer: an indirect buffer goes through kernel mode, described above, as a middleman; every operation requires the kernel in the middle.

Direct buffer: a direct buffer does not need kernel space to relay and copy data. Instead, a region of physical memory is requested directly and mapped into both the kernel address space and the user address space; the application and the disk then exchange data through this directly allocated physical memory.
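To make the idea of a direct buffer concrete, here is a minimal, self-contained sketch (the file is a throwaway temp file; all names are illustrative, not from the article's code): a memory-mapped region is a direct buffer, and reading from it does not go through an explicit per-byte kernel-to-user read loop.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedBufferDemo {
    public static void main(String[] args) throws Exception {
        // Create a temporary file with known content.
        Path file = Files.createTempFile("mapped-demo", ".bin");
        Files.write(file, "hello mapped world".getBytes(StandardCharsets.UTF_8));

        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            // Map the whole file into memory: the mapping is backed by the OS
            // page cache, and the resulting buffer is a direct buffer.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(mapped.isDirect()); // prints "true": mapped buffers are direct

            byte[] first5 = new byte[5];
            mapped.get(first5);
            System.out.println(new String(first5, StandardCharsets.UTF_8)); // prints "hello"
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Note that a MappedByteBuffer remains valid until it is garbage-collected, which illustrates the reclamation caveat discussed below.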
If direct buffers are so fast, why don't we use them everywhere? Because direct buffers have the following disadvantages:

1. They are less safe.
2. They consume more resources, because the space is not allocated inside the JVM heap; reclaiming this memory depends entirely on the garbage collection mechanism, and garbage collection is outside our control.
3. Once data is written into the physical-memory buffer, the program loses control over it: when the data is finally written to disk is decided by the operating system alone, and the application can no longer intervene.
To sum up: by using the transferTo method we let the operating system move the bytes along a direct path, skipping the kernel-to-user copy, which is why performance improved so much.
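As a minimal, self-contained illustration of the transferTo path described above (using temp files; names are illustrative): one file's bytes are handed to another channel without an explicit read/write loop running in user space.

```java
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("transfer-src", ".bin");
        Path dst = Files.createTempFile("transfer-dst", ".bin");
        Files.write(src, "zero-copy transfer".getBytes(StandardCharsets.UTF_8));

        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            // The OS can move these bytes from the filesystem cache straight to
            // the target channel; our code never loops over the data itself.
            long transferred = in.transferTo(0, in.size(), out);
            System.out.println("transferred " + transferred + " bytes");
        }
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8)); // prints "zero-copy transfer"

        Files.delete(src);
        Files.delete(dst);
    }
}
```

The same call is what the article's zipFileChannel method uses, with a WritableByteChannel wrapped around the ZipOutputStream as the target.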
Use memory-mapped files
Another new feature of NIO is memory-mapped files. Why are memory-mapped files fast? For the same reason as above: a direct buffer is opened in memory, and we interact with the data directly. The code is as follows.
```java
// Version 4: use a memory-mapped file
public static void zipFileMap() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // the file mapped into memory
            MappedByteBuffer mappedByteBuffer = new RandomAccessFile(JPG_FILE_PATH, "r").getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
            writableByteChannel.write(mappedByteBuffer);
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

Output:

---------Map
fileSize:20M
consum time:1305

The speed is about the same as with the Channel version.

Use Pipe

A Java NIO pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: the source channel is used to read data, and the sink channel is used to write data. As the source comments explain, a thread writing bytes to the pipe may block until another thread reads those bytes, and if no data is available, a reading thread blocks until a writer writes data, until the channel is closed:

Whether or not a thread writing bytes to a pipe will block until another thread reads those bytes

This is exactly what I want. The code is as follows.
```java
// Version 5: use a Pipe
public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // asynchronous task that zips into the pipe's sink
        CompletableFuture.runAsync(() -> runTask(pipe));
        // get the read channel
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate(((int) FILE_SIZE) * 10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    printInfo(beginTime);
}

// asynchronous task
public static void runTask(Pipe pipe) {
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
         WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel();
            jpgChannel.transferTo(0, FILE_SIZE, out);
            jpgChannel.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

At this point, you should have a deeper understanding of how compressing 20 MB of files in Java was optimized from 30 seconds down to 1 second. Try it out in practice yourself, and keep learning!
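As an addendum, the pipe mechanics used in the version above can be exercised on their own with a tiny, self-contained sketch (the message and class names are illustrative): one task writes into the sink channel while the main thread drains the source channel until the sink is closed.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();

        // Writer task: push a message into the sink channel, then close it
        // so the reader sees end-of-stream.
        CompletableFuture.runAsync(() -> {
            try (Pipe.SinkChannel sink = pipe.sink()) {
                sink.write(ByteBuffer.wrap("hello from the sink".getBytes(StandardCharsets.UTF_8)));
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Reader: drain the source channel until the writer closes the sink.
        StringBuilder received = new StringBuilder();
        ByteBuffer buffer = ByteBuffer.allocate(64);
        try (Pipe.SourceChannel source = pipe.source()) {
            while (source.read(buffer) != -1) {
                buffer.flip();
                received.append(StandardCharsets.UTF_8.decode(buffer));
                buffer.clear();
            }
        }
        System.out.println(received); // prints "hello from the sink"
    }
}
```

The article's zipFilePip follows the same shape, with the writer task producing zipped bytes instead of a plain string.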