This article looks at how to optimize the compression algorithm used in build and deployment. The approach described here is simple, fast, and practical, so interested readers may wish to follow along.
Background
Generally speaking, publishing a service through the build-and-deploy platform (image-based deployment aside) goes through a build stage (sync code -> compile -> package -> upload) and a deploy stage (download package -> extract on the target machine -> restart service). Taking Meituan's internal publishing platform Plus as an example, we recently noticed that some release items spend a long time packaging and compressing their build artifacts; the pack step shown in the figure below took a total of 1 minute and 23 seconds.
When answering operations questions for users, we also find that many users habitually put large machine learning or NLP data files into the repository, often hundreds of megabytes or even several gigabytes of disk space, which greatly slows down packaging. The same goes for Java projects: because of the variety of Java service frameworks and dependencies, these services usually exceed 100 MB after packaging and take more than 10 seconds to compress. The figure below shows the median duration of our pack step; most Java and Node.js services need at least 13 seconds to compress and package.
Packing is a required step for almost every service that gets deployed, and its cost is currently second only to compiling and building images, so to improve overall build efficiency we decided to do a round of optimization on the pack (packaging and compression) step.
Scheme comparison and test data preparation
Package size analysis of release items
To simulate the build-and-deploy scenario as closely as possible, we collected and analyzed part of the build-package data from 2020. The compressed package sizes are shown in the figure below: the bell curve shows that package sizes are roughly normally distributed, with an obvious long tail. Compressed sizes are mostly under 200 MB, and sizes before compression are mostly under 516.0 MB.
99% of service packages are within 1 GB, but for the compression step, the larger the project, the longer it takes and the more room there is for optimization. In the targeted comparison tests we therefore chose a build package of about 1 GB, which both covers 99% of the scenarios and makes the differences between compression algorithms clearly visible.
The main reasons for this choice are as follows:
With large inputs, the measurement error is much smaller than with small inputs.
It can cover most application scenarios.
The effect is obvious, making it easy to see whether an algorithm brings a significant improvement.
Note: since the same compression library with the same compression-ratio configuration shows no significant change in compression speed across package sizes, we did not batch-test or collect data for other package volumes.
The test project used in this article is a large C++ project inside Meituan. Its file types are comprehensive: besides C++, Python, and Shell code files, it contains NLP data, tools, and other binary data (excluding the commit history stored in .git).
The directory is 1.2 GB, which is large enough to clearly show the gap between the different schemes.
Gzip
Gzip is based on the DEFLATE algorithm, a combination of LZ77 and Huffman coding. DEFLATE was intended to replace LZW and other patented data compression algorithms which, at the time, limited the usability of compression and other popular archivers (Wikipedia).
We usually package and compress with the tar -czf command, where the z flag means compress with gzip. The DEFLATE standard (RFC 1951) is a widely used lossless data compression standard. Its compressed data format consists of a series of blocks corresponding to blocks of the input data; each block is compressed with the LZ77 algorithm and Huffman coding (which assigns the shortest codes to the most frequent symbols). The LZ77 algorithm reduces data volume by finding repeated strings and replacing them with references.
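For reference, the baseline packaging step looks roughly like the following (directory and archive names are illustrative, not the platform's actual paths):
tar -czf output.tar.gz ./build_output      # package and gzip-compress in one step
tar -xzf output.tar.gz -C /srv/app         # the matching decompression on the deployment side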
File format
A 10-byte header containing a magic number (1f 8b), the compression method (e.g. 08 for DEFLATE), 1 byte of header flags, a 4-byte timestamp, extra compression flags, and the operating system ID.
Optional extra headers, including the original file name, a comment field, an "extra" field, and the lower half of the header's CRC-32 checksum.
The DEFLATE-compressed body.
An 8-byte footer containing the CRC-32 checksum and the length of the original uncompressed data.
We can see that gzip is essentially built on CRC and the LZ77 + Huffman DEFLATE algorithm, which are also the optimization targets of the ISA-L library discussed later.
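As a quick sanity check of this layout, you can dump the first header bytes of any .gz file (file name illustrative) and match them against the fields above:
xxd -l 10 output.tar.gz
# expected start: 1f 8b 08 ... (magic number, DEFLATE method, flags, 4-byte timestamp, extra flags, OS ID)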
Brotli
Alakuijala and Szabadka completed the Brotli specification between 2013 and 2016. It is a data format designed to further improve the compression ratio and is widely used for speeding up websites. The Brotli specification was independently implemented by Mark Adler for formal validation. Brotli is a data-format specification for compressed data streams that combines the general-purpose LZ77 lossless compression algorithm, Huffman coding, and second-order context modeling (2nd order context modelling). You can refer to this paper for its implementation principles.
Because of the statistical nature of language, context-based modeling (Context Modeling) can achieve a better compression ratio, but its performance cost has made it hard to popularize. At present there are two popular breakthrough approaches:
ANS: Zstd, lzfse
Context Modeling: Brotli, bz2
See below for specific test data.
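For reference, a Brotli run over the same kind of archive can be sketched as follows, assuming the brotli command-line tool is installed (the quality setting 5 and file names are illustrative, not the exact parameters of our tests):
tar -cf - ./build_output | brotli -q 5 -c > output.tar.br    # compress the tar stream with Brotli
brotli -d -c output.tar.br | tar -xf -                       # decompress and unpack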
Zstd
Zstd, short for Zstandard, is a fast compression algorithm that also provides a high compression ratio. It is written mainly in C and was released by Facebook's Yann Collet in 2016. Zstd uses a Finite State Entropy (FSE) encoder, a new kind of entropy encoder developed by Jarek Duda based on ANS theory, which offers a very powerful trade-off between compression speed and compression ratio (in effect, you can have your cake and eat it too). At its maximum compression level, Zstd reaches compression ratios close to lzma, lzham, and ppmx, and performs better than lza or bzip2. Zstandard reaches the Pareto frontier (the ideal state of optimal resource allocation), since it decompresses faster than any other currently available algorithm at a similar or better compression ratio.
For small data, it also provides a way to load a preset dictionary to improve efficiency; the dictionary can be generated by training on the target data.
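A sketch of that dictionary workflow, with hypothetical sample paths and dictionary name (our build-package tests did not use a dictionary):
zstd --train ./samples/*.json -o pack_dict     # train a dictionary on representative small files
zstd -D pack_dict small_input.json             # compress a small file with the dictionary
zstd -D pack_dict -d small_input.json.zst      # the same dictionary is needed to decompress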
Comparison of official Benchmark data
The compression level can be specified with --fast, which offers faster compression and decompression at the cost of some compression ratio compared with level 1, as the table above shows. Zstd can trade compression speed for a stronger compression ratio in small configurable increments, and decompression speed stays the same across all settings, a feature shared by most LZ compression algorithms such as zlib or lzma.
We tested Zstd with its default parameters. Compression took 8.471 s, only 11.266% of the original time, a saving of 88.733%.
Decompression took 3.211 s, only 29.83% of the original time, a saving of about 70.169%.
Meanwhile, the compression ratio increased from 2.548 to 2.621.
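The default-parameter test above corresponds roughly to a pipeline like this (archive names illustrative; zstd's default level is 3):
tar -cf - ./build_output | zstd -o output.tar.zst    # compress the tar stream with default settings
zstd -d -c output.tar.zst | tar -xf -                # decompress and unpack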
LZ4
LZ4 is a lossless compression algorithm that compresses at more than 500 MB/s per core (greater than 0.15 bytes/cycle). Its hallmark is very fast decoding, at multiple GB/s per core (about 1 byte/cycle).
The Zstd benchmark above shows that LZ4 performs remarkably well, so we compared it as well. LZ4 focuses on compression and especially decompression speed; compression ratio is not its strong point. It supports compression levels 1-9 by default, and we tested each of them.
With default parameters LZ4 compresses impressively fast, much faster than Zstd, but its compression ratio is low: the output is 206 MB larger than Zstd's, a full 46% more, which means more data-transfer time and more disk space. Even at the maximum level the ratio is not high, rising only from 1.79 to 2.11 while the time grows from 5 s to 51 s. In terms of compression ratio LZ4 is clearly not the best choice, and at a ratio around 2.x it has essentially no time advantage either. Moreover, LZ4 does not officially support multi-core parallel compression, so in the subsequent comparison it loses its compression and decompression speed advantage.
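The LZ4 runs can be reproduced roughly as follows (names illustrative); -1 is the fast default level and -9 the highest level tested above:
tar -cf - ./build_output | lz4 -1 > output.tar.lz4    # default fast level
tar -cf - ./build_output | lz4 -9 > output.tar.lz4    # maximum compression level
lz4 -d -c output.tar.lz4 | tar -xf -                  # decompress and unpack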
Pigz
Mark Adler, the author of Pigz, is a co-author of Info-ZIP's zip and unzip and of the GNU gzip and zlib compression libraries, and participated in developing the PNG image format.
Pigz is short for "parallel implementation of gzip". Its main idea is to exploit multiple processors and cores: it splits the input into 128 KB blocks and compresses each block in parallel, and the individual check values of the blocks are also computed in parallel. The implementation uses the zlib and pthread libraries directly, is easy to read, and, importantly, is compatible with the gzip format. Pigz uses a single thread (the main thread) for decompression, but it can create three additional threads for reading, writing, and check computation, so decompression can still be accelerated in some cases.
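A typical Pigz invocation can be sketched as follows (thread count and names are illustrative; by default pigz uses all online cores):
tar -cf - ./build_output | pigz -p 16 > output.tar.gz    # compress with 16 threads
tar -xzf output.tar.gz                                   # gzip-compatible, so plain tar can unpack it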
Some bloggers testing Pigz on consumer PC platforms such as the i7-4790K did not see very high speeds, but on our real machines the improvement was significant. In our test the compression finished in just 1.768 s of wall-clock time, making full use of our physical machines, while the user time (total CPU time) was 1m30.569s, roughly the same level as the earlier single-threaded gzip run. The compression ratio of 2.5488 is almost identical to that of a normal tar -czf.
ISA-L Acceleration Version
ISA-L (Intelligent Storage Acceleration Library) is an open-source library that accelerates algorithm execution on the IA architecture, aiming at the computing needs of storage systems. ISA-L is released under the BSD-3-Clause License, so it can also be used commercially.
You may also have heard of SPDK (Storage Performance Development Kit) or DPDK (Data Plane Development Kit), both of which use ISA-L: the former uses its CRC part, while the latter uses its compression optimizations. ISA-L uses efficient SIMD (Single Instruction, Multiple Data) and other specialized instructions to exploit the CPU micro-architecture as much as possible and accelerate the computation of storage algorithms. Its low-level functions are written in hand-tuned assembly and optimized in many details (nowadays the ARM platform often has to be considered as well, and some of the instructions mentioned in this article are unavailable or unsupported there).
ISA-L mainly optimizes CRC, DEFLATE, and Huffman coding. Official data show that ISA-L is five times faster than zlib-1.
For example, many low-level storage open-source projects implement CRC with a look-up-table method: the code stores or generates a table of CRC values and looks values up during the computation. ISA-L's assembly code instead uses the carry-less multiplication instruction PCLMULQDQ to multiply two 64-bit numbers without carries, maximizing the throughput of Intel's PCLMULQDQ instruction to optimize CRC performance. Even better, if the CPU supports AVX-512, it can use further instruction sets such as VPCLMULQDQ (the 512-bit, EVEX-encoded version of PCLMULQDQ); see the appendix for how to check whether your CPU supports it.
Note: screenshot from crc32_ieee_by16_10.asm
Usage
ISA-L's compression optimization supports levels 0-3, where 3 gives the largest compression ratio. Considering that we had been using the maximum compression ratio, its ratio of 2.346 is slightly lower than gzip's, but the difference is negligible. In v2.27, released in June 2019, ISA-L added a multithreading feature, so in the subsequent tests we enabled the multithreading parameter, and the effect improved significantly.
Because our build machines run CentOS 7, compiling ISA-L itself requires dependencies such as nasm, so installation and configuration take some extra work. ISA-L also supports several build-time optimization options, such as IGZIP_HIST_SIZE (enlarges the sliding window during compression) and LONGER_HUFFTABLES (a larger Huffman coding table); these compile-time parameters further improve the library.
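With ISA-L built and installed, the compression step we timed can be sketched roughly as follows, assuming an igzip binary from v2.27 or later with the multithreading feature compiled in (level, thread count, and names are illustrative):
tar -cf - ./build_output | igzip -3 -T 8 > output.tar.gz    # level 3 (largest ratio) with 8 threads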
Compression time
real 0m1.372s
user 0m5.512s
sys  0m1.791s
The result is quite striking: it is the fastest of the schemes compared so far, saving 98.09% of the time.
Because the output is compatible with the gzip format, it can also be decompressed with the plain tar -xf command. In the subsequent decompression tests, however, we still used the decompression provided by ISA-L.
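The matching ISA-L decompression can be sketched as (names illustrative):
igzip -d -c output.tar.gz | tar -xf -    # decompress with igzip and unpack the tar stream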
Pzstd
After testing Pigz, we wondered whether an excellent algorithm like Zstd could also be parallelized, and in the official repository we were pleasantly surprised to find a "treasure".
Pzstd is a parallel version of Zstandard implemented in C++11 (Zstd itself has since added multithreading support), similar in spirit to Pigz. It provides compression and decompression compatible with the Zstandard format and can take advantage of multiple CPU cores. It splits the input into equal-sized blocks and compresses each block into an independent Zstandard frame; these frames are then concatenated to produce the final compressed output. Pzstd also supports parallel decompression: when decompressing a file compressed with Zstandard, it performs IO in one thread and decompression in another.
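A Pzstd run can be sketched as follows (thread count and names are illustrative and may need adjusting for your pzstd version):
tar -cf - ./build_output | pzstd -p 16 -c > output.tar.zst    # compress the tar stream with 16 worker threads
pzstd -d -p 16 -c output.tar.zst | tar -xf -                  # parallel-decompress and unpack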
The figure below compares compression and decompression speed with Pigz. According to the official GitHub repository (machine configuration: Intel Core i7 @ 3.1 GHz, 4 threads), the results are even better than Pigz's. For more information, see the comparison data below:
Pied Piper (Middle-out compression)
Middle-out compression was originally a concept from an American TV series, but a real implementation of middle-out now exists. At present the algorithm is used on a small scale to compress time-series data; due to the lack of mature theoretical support and comparison data, it did not formally enter the scheme comparison.
Note: Pied Piper is a fictional company and algorithm in the American TV series Silicon Valley.
Compatibility
All of the schemes compared in this article were tested on machines in the build cluster. Because the build cluster's machine configuration is highly uniform (both hardware platform and system version), compatibility behavior is also highly consistent with the test data.
Scheme selection
In fact, the configurations of some official test machines differ from the physical machines of Meituan's build platform, and the scenarios may differ too. The CPUs, platform architectures, and compilation environments in the results cited above also differ from ours, and most of them are consumer hardware such as an i7-4790K or i9-9900K. That is why we had to test on the build platform's physical machines with our specific build-package compression scenario: only then do we get data closest to our real usage.
Comparative data
The data for the different scenarios are compared in the following table (the timing figures in this article are the medians of multiple runs):
Compression time comparison
As the visualization of post-build package compression time shows, the original gzip compression is quite time-consuming, while Pzstd is the fastest scheme, with ISA-L slightly slower and Pigz slightly slower again. All three can reduce the time from 1m11s to about 1 s, and the fastest saves 98.51% of the compression time.
Decompression time comparison
Decompression times differ less than compression times. For GB-scale projects with compression ratios in the 2.5-2.6 range, every scheme finishes within 11 s. To maximize compatibility with the existing implementation and maintain stability, the decompression scheme prioritizes compatibility with the gzip format, which is least intrusive to the deployment target machines; that is, schemes whose output can be decompressed with plain tar -xf are preferred.
In the later rollout, because deployment requires a stable and reliable environment, we did not make any environmental changes to the deployment machines for the time being.
The following time comparison uses each scheme's own decompression tool:
Pzstd decompresses the fastest, saving 86.241% of the time compared with Gzip.
Zstd is second, saving about 70.169% of the decompression time.
ISA-L saves 61.9658% of the time.
Pigz saves 43.357% of the decompression time.
Brotli saves 29.02% of the time.
Comparison of compression ratio
In the compression-ratio comparison, Zstd and Pzstd have some advantage. Because of the limits of their supported parameters, Brotli and LZ4 are hard to test for speed at the same compression ratio, so we chose parameters with a lower compression ratio for them, yet their efficiency still lags behind Pigz and Pzstd.
ISA-L's implementation sacrifices a little compression ratio, but the difference is not large.
Analysis and summary of advantages and disadvantages
At the beginning of this article we introduced the build-and-deploy process, so the target time for this optimization is roughly calculated as follows:
In the test comparison, the time ranking is Pzstd < ISA-L < Pigz < LZ4 < Zstd < Brotli < Gzip (earlier is better). Compression and decompression account for a large share of the total time, so the candidate strategies are Pzstd, ISA-L, and Pigz.
There is also the question of disk-space cost: a higher compression ratio can save disk space, but at the price of longer build time. Since time is the main goal of this optimization and matters more to us than disk cost or upload bandwidth cost, we use a common or default near-optimal compression ratio, i.e. in the 2.3-2.6 range. In scenarios where storage media are expensive, such as in-memory databases, more factors would need to be weighed.
Scoring strategy
Compared with gzip, the new algorithms are indeed fast, but because their compression formats are not compatible with gzip and need client support on the deployment target machines, rolling them out would take longer.
For deployment, the potential gains are not very large, while some maintenance and operation costs would be added, so for now we do not give the Pzstd scheme a higher priority.
The selection strategy is mainly based on the following principles:
The overall time saving should be the largest; this is the starting point of the whole optimization.
Maximize compatibility: to reduce the change cost for the businesses and platforms connected to the build platform, the solution must stay compatible (prefer the most compatible strategy, i.e. one that remains compatible with the gzip format).
Keep the deployment target environment stable and reliable: choose the scheme that is least intrusive to the deployment machines, so no extra clients or libraries need to be installed.
Test on real machines with the exact compression scenario of Meituan's build platform, so that the comparison data hold on our existing physical machines and our target compression workload.
In fact, a more comprehensive evaluation would cover many more dimensions, such as object-storage disk cost, bandwidth cost, task duration, and even machine cost. To simplify the overall selection we omitted some of these calculations, and for the compression-ratio comparison we stayed within the officially recommended ranges.
Taking all of the above into account, we decided that ISA-L acceleration in the first phase improves the build platform's efficiency most stably while still being very fast; the Pzstd scheme may be adopted later. The following data are the results of the first phase.
Optimization effect
To make the results easier to present, we filtered out the release items with long packaging times (these time-consuming projects usually hurt the user experience the most and account for roughly 10% of the total). For these tasks the optimization saved between 27 s and 72 s. Previously, the larger the project, the longer the compression; now compression stays within 10 s, even when our machines run multiple tasks at the same time.
We then aggregated the monitoring data for the pack step (compression + upload) before and after the optimization and obtained the average improvement, shown below:
In our earlier build-package statistics, most build packages are around 100 MB compressed and about 250 MB before compression; with the gzip algorithm, compression indeed takes about 10 s.
Because a build physical machine may run several tasks at the same time, actual compression takes a little longer than in the test.
Compression saves an average of 90% of the time.
At this point, you should have a deeper understanding of how to optimize the compression algorithm in build and deployment; you might as well try it in practice.