Linux basic command-bzip2

2025-07-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

bzip2

bzip2 compresses files using the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors. The command-line flags are expected to be accompanied by a list of file names. Each file is replaced by a compressed version of itself, named "original_name.bz2". Each compressed file has the same modification date, permissions, and (when possible) ownership as the corresponding original, so these properties can be restored correctly on decompression.

By default, bzip2 and bunzip2 do not overwrite existing files. If you want this to happen, specify the "-f" flag. If no file names are specified, bzip2 compresses from standard input to standard output. In this case, bzip2 declines to write compressed output to a terminal, because it would be entirely incomprehensible and therefore pointless.
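The behaviour described so far can be tried out with a short script. This is only a sketch, assuming bzip2 is installed and on the PATH; the scratch directory and the file name notes.txt are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory and file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello bzip2\n' > notes.txt

bzip2 notes.txt                  # notes.txt is replaced by notes.txt.bz2
ls                               # notes.txt.bz2

# A second notes.txt cannot be compressed while notes.txt.bz2 exists ...
printf 'second version\n' > notes.txt
bzip2 notes.txt 2>/dev/null || echo "refused: notes.txt.bz2 already exists"

# ... unless -f is given, which overwrites the existing output file.
bzip2 -f notes.txt

# With no file names, bzip2 filters standard input to standard output.
roundtrip=$(printf 'piped text\n' | bzip2 | bzip2 -d)
echo "$roundtrip"                # piped text
```

After the script runs, only notes.txt.bz2 remains in the scratch directory, and it holds the second version of the text.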

bunzip2 (or bzip2 -d) decompresses all specified files. Files that were not created by bzip2 are detected and ignored, and a warning is issued. bzip2 attempts to guess the name of the decompressed file from that of the compressed file, as follows:

filename.bz2 becomes filename

filename.bz becomes filename

filename.tbz2 becomes filename.tar

filename.tbz becomes filename.tar

anyothername becomes anyothername.out

If the file name does not end in .bz2, .bz, .tbz2, or .tbz, bzip2 complains that it cannot guess the name of the original file and uses the original name with .out appended. As with compression, supplying no file names causes decompression from standard input to standard output. bunzip2 correctly decompresses a file that is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing (-t) of concatenated compressed files is also supported.
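The renaming rules can be written down as a small shell function. Note that guess_output_name is a hypothetical helper invented for this illustration, not something bzip2 provides; it only mirrors the suffix table given above.

```shell
#!/bin/sh
# guess_output_name is a hypothetical helper, not part of bzip2.
# It mirrors the suffix rules bzip2 is described as applying on decompression.
guess_output_name() {
    case "$1" in
        *.bz2)  printf '%s\n' "${1%.bz2}" ;;
        *.bz)   printf '%s\n' "${1%.bz}" ;;
        *.tbz2) printf '%s\n' "${1%.tbz2}.tar" ;;
        *.tbz)  printf '%s\n' "${1%.tbz}.tar" ;;
        *)      printf '%s\n' "$1.out" ;;
    esac
}

guess_output_name notes.txt.bz2   # notes.txt
guess_output_name archive.tbz2    # archive.tar
guess_output_name data.bin        # data.bin.out
```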

Files can also be compressed or decompressed to standard output by giving the "-c" flag. Multiple files may be compressed and decompressed this way. The resulting outputs are written sequentially to stdout. Compressing multiple files in this manner produces a stream that contains multiple compressed file representations. Such a stream can be decompressed correctly only by bzip2 version 0.9.0 or later. Earlier versions of bzip2 stop after decompressing the first file in the stream.

bzcat (or bzip2 -dc) decompresses all specified files to standard output. bzip2 reads arguments from the environment variables BZIP2 and BZIP, in that order, and processes them before any arguments read from the command line. This gives a convenient way to supply default arguments.
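As a rough illustration of default arguments taken from the environment, the sketch below sets BZIP2 for a single invocation. It assumes bzip2 and bzcat are installed; the file name report.txt is made up.

```shell
#!/bin/sh
# Scratch directory and file name are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text\n' > report.txt

# Flags found in BZIP2 are processed before the command-line arguments,
# so this behaves like "bzip2 -k report.txt": the original file is kept.
BZIP2=-k bzip2 report.txt
ls                               # report.txt and report.txt.bz2

# bzcat is equivalent to bzip2 -dc.
contents=$(bzcat report.txt.bz2)
echo "$contents"                 # some text
```

Setting the variable in a shell profile instead would apply the same defaults to every later call.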

Compression is always performed, even if the compressed file is slightly larger than the original. Files of less than about one hundred bytes tend to get larger, because the compression mechanism has a constant overhead of around 50 bytes. Random data, including the output of most file compressors, is coded at about 8.05 bits per byte, an expansion of about 0.5%.

bzip2 uses 32-bit CRCs to make sure that the decompressed version of a file is identical to the original. This guards against corruption of the compressed data and against undetected bugs in bzip2 (hopefully very unlikely). The chance of data corruption going undetected is microscopic, about one in four billion for each file processed. Be aware, though, that the check occurs on decompression, so it can only tell you that something is wrong. It cannot help you recover the original uncompressed data. You can use bzip2recover to try to recover data from damaged files.

Return values: 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt compressed file, 3 for an internal consistency error (for example, a bug) that caused bzip2 to panic.
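A script can branch on these return values. The following is a sketch only: describe_status is a hypothetical helper, the file names are made up, and bad.bz2 is deliberately filled with text that is not bzip2 data so that the integrity test fails.

```shell
#!/bin/sh
# Scratch files are illustrative; describe_status is a hypothetical helper.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'payload\n' > good.txt
bzip2 -k good.txt                      # creates good.txt.bz2, keeps good.txt
printf 'not bzip2 data\n' > bad.bz2    # deliberately invalid

# Map an exit status to the meaning documented above.
describe_status() {
    case "$1" in
        0) echo "normal exit" ;;
        1) echo "environmental problem" ;;
        2) echo "corrupt compressed file" ;;
        3) echo "internal consistency error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# -t runs the integrity check without writing any output files.
good_status=0
bzip2 -t good.txt.bz2 2>/dev/null || good_status=$?
bad_status=0
bzip2 -t bad.bz2 2>/dev/null || bad_status=$?

echo "good.txt.bz2: $(describe_status "$good_status")"
echo "bad.bz2: $(describe_status "$bad_status")"
```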

The scope of this command: RedHat, RHEL, Ubuntu, CentOS, SUSE, openSUSE, Fedora.

1. Syntax

bzip2 [-cdfkqstvzVL123456789] [filenames...]

2. List of options

Option

Description

-h | --help

Display help information

-V | --version

Display version information

-c | --stdout

Write compressed or decompressed output to standard output

-d | --decompress

Force decompression

-z | --compress

The complement to -d: force compression, regardless of the name the program was invoked under

-t | --test

Check the integrity of the specified files, but do not decompress them. This actually performs a trial decompression and discards the result.

-f | --force

Force overwriting of existing output files

-k | --keep

Keep (do not delete) the input files during compression or decompression

-s | --small

Reduce memory usage for compression, decompression, and testing. Files are decompressed and tested using a modified algorithm that requires only 2.5 bytes per block byte. This means any file can be decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, -s selects a block size of 200k, which limits memory use to around the same figure, at the expense of compression ratio. In short, if your machine is low on memory (8 megabytes or less), use -s for everything. See Memory management below.

-L | --license

Display the software version and the license terms and conditions of bzip2

-q | --quiet

Suppress non-essential warning messages

-v | --verbose

Verbose mode: show the compression ratio for each file processed

-1 to -9

Set the block size to 100k, 200k, ..., 900k when compressing, which in turn determines the compression ratio. These flags have no effect when decompressing. "-1" is equivalent to "--fast" and "-9" is equivalent to "--best".

--

Treat all subsequent arguments as file names, even if they start with a dash. This lets you handle files whose names begin with a dash, for example "bzip2 -- -myfilename".

--repetitive-fast | --repetitive-best

These flags are redundant in versions 0.9.5 and above. They provided some coarse control over the behaviour of the sorting algorithm in earlier versions, which was sometimes useful. Versions 0.9.5 and above have an improved algorithm that renders these flags irrelevant.
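A few of the options above can be combined as follows. This is a minimal sketch that assumes bzip2 is installed; the scratch directory and both file names are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory and file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'awkward name\n' > ./-myfilename
printf 'keep me\n' > keep.txt

# "--" ends option parsing, so -myfilename is treated as a file name.
bzip2 -- -myfilename

# -k keeps keep.txt next to keep.txt.bz2; -v reports the compression ratio.
bzip2 -kv keep.txt

# -t checks integrity only; it is silent and returns 0 for an intact file.
bzip2 -t keep.txt.bz2 && echo "keep.txt.bz2 is intact"
```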

3. Memory management

bzip2 compresses large files in blocks. The block size affects both the compression ratio achieved and the amount of memory needed for compression and decompression. The flags -1 through -9 specify block sizes of 100,000 bytes through 900,000 bytes (the default) respectively. At decompression time, the block size used for compression is read from the header of the compressed file, and bunzip2 then allocates just enough memory to decompress the file. Because the block size is stored in the compressed file, the flags -1 to -9 are irrelevant to decompression and are ignored. The compression and decompression requirements (in bytes) can be estimated as

Compression: 400k + (8 x block size)

Decompression: 100k + (4 x block size), or 100k + (2.5 x block size) with -s
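These estimates are easy to evaluate for a given flag. The function below is a sketch; estimate_memory is a hypothetical helper, and it works in units of k, with the block size taken as 100k times the level.

```shell
#!/bin/sh
# estimate_memory is a hypothetical helper that evaluates the formulas above.
# All figures are in units of k; the block size is level x 100k.
estimate_memory() {
    level=$1                                  # 1..9
    block_k=$((level * 100))
    compress_k=$((400 + 8 * block_k))
    decompress_k=$((100 + 4 * block_k))
    small_k=$((100 + block_k * 5 / 2))        # 2.5 x block size, used with -s
    printf 'level %s: compress %sk, decompress %sk, decompress -s %sk\n' \
        "$level" "$compress_k" "$decompress_k" "$small_k"
}

estimate_memory 1   # level 1: compress 1200k, decompress 500k, decompress -s 350k
estimate_memory 9   # level 9: compress 7600k, decompress 3700k, decompress -s 2350k
```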

Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size, which is worth bearing in mind when using bzip2 on small machines. It is also important to appreciate that the decompression memory requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size, bunzip2 requires about 3700 kbytes to decompress. To support decompression of any file on a 4 megabyte machine, bunzip2 has an option to decompress using approximately half this amount of memory, about 2300 kbytes. Decompression speed is also halved, so you should use this option only where necessary. The relevant flag is -s. In general, try to use the largest block size that memory constraints allow, since that maximizes the compression achieved. Compression and decompression speed are virtually unaffected by block size.

Another significant point applies to files that fit in a single block, which means most files you would encounter when using a large block size. The amount of real memory touched is proportional to the size of the file, since the file is smaller than a block. For example, compressing a file 20,000 bytes long with the flag -9 causes the compressor to allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor allocates 3700k but only touches 100k + 20000 * 4 = 180 kbytes.
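The arithmetic for a file smaller than one block can be sketched in the same way. Here touched_memory is a hypothetical helper; it works in bytes and takes k as 1000 bytes, which matches the figures quoted above.

```shell
#!/bin/sh
# touched_memory is a hypothetical helper. Sizes are in bytes and k is
# taken as 1000 bytes, matching the arithmetic in the text.
touched_memory() {
    file_bytes=$1
    level=$2
    block_bytes=$((level * 100000))
    # At most one block's worth of data is worked on at a time.
    if [ "$file_bytes" -lt "$block_bytes" ]; then
        used=$file_bytes
    else
        used=$block_bytes
    fi
    echo "compress: $((400000 + 8 * used)) bytes, decompress: $((100000 + 4 * used)) bytes"
}

touched_memory 20000 9   # compress: 560000 bytes, decompress: 180000 bytes
```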

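Here is a table that summarizes the maximum memory usage for different block sizes. The "Corpus size" column records the total compressed size of 14 files from the Calgary Text Compression Corpus, totalling 3,141,622 bytes uncompressed, which gives some feel for how compression varies with block size.

Flag   Compress usage   Decompress usage   Decompress -s usage   Corpus size
-1     1200k            500k               350k                  914704
-2     2000k            900k               600k                  877703
-3     2800k            1300k              850k                  860338
-4     3600k            1700k              1100k                 846899
-5     4400k            2100k              1350k                 845160
-6     5200k            2500k              1600k                 838626
-7     6100k            2900k              1850k                 834096
-8     6800k            3300k              2100k                 828642
-9     7600k            3700k              2350k                 828642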

4. Recover data from corrupted files

bzip2 compresses files in blocks, usually 900 kbytes long. Each block is handled independently. If a media or transmission error causes a multi-block .bz2 file to become damaged, it may be possible to recover data from the undamaged blocks in the file. The compressed representation of each block is delimited by a 48-bit pattern, which makes it possible to find the block boundaries with reasonable certainty. Each block also carries its own 32-bit CRC, so damaged blocks can be distinguished from undamaged ones.

bzip2recover is a simple program whose purpose is to search for blocks in .bz2 files and write each block out into its own .bz2 file. You can then use "bzip2 -t" to test the integrity of the resulting files and decompress those that are undamaged.

bzip2recover takes a single argument, the name of the damaged file, and writes a number of files, "rec00001file.bz2", "rec00002file.bz2", and so on, containing the extracted blocks. The output file names are designed so that wildcards in subsequent processing list the files in the correct order. For example, "bzip2 -dc rec*file.bz2 > recovered_data" processes the files in the correct order.

bzip2recover should be of most use when dealing with large .bz2 files, as these contain many blocks. It is clearly futile to use it on damaged single-block files, since a damaged block cannot be recovered. If you wish to minimize any potential data loss through media or transmission errors, you might consider compressing with a smaller block size.
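The recovery workflow can be rehearsed on an intact file before it is needed on a damaged one. This sketch assumes bzip2, bzip2recover, and seq are installed; the file name numbers.txt is made up. Because the archive here is undamaged, every extracted block passes the test and the rejoined data matches the original.

```shell
#!/bin/sh
# Scratch directory and file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a multi-block archive: -1 selects 100k blocks and the input is
# several hundred kilobytes of text.
seq 1 100000 > numbers.txt
bzip2 -1 -k numbers.txt

# Write each block of the archive to its own rec*numbers.txt.bz2 file.
bzip2recover numbers.txt.bz2

# Test every piece and remember the ones that pass, in wildcard order.
good=""
for piece in rec*numbers.txt.bz2; do
    if bzip2 -t "$piece"; then
        good="$good $piece"
    fi
done

# Decompress the intact pieces in order into a single output file.
# $good is left unquoted on purpose so that it splits into file names.
bzip2 -dc $good > recovered_data
cmp numbers.txt recovered_data && echo "recovered data matches the original"
```

With a genuinely damaged archive, the pieces that fail "bzip2 -t" would be left out of the list, and recovered_data would contain only the data from the undamaged blocks.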

5. Implementation description

The sorting phase of compression gathers together similar strings in the file. Because of this, files containing very long runs of repeated symbols, such as "aabaabaabaab ..." (repeated several hundred times), may compress more slowly than normal. Versions 0.9.5 and above fare much better than previous versions in this respect. The ratio between worst-case and average-case compression time is in the region of 10:1. For previous versions, this figure was more like 100:1. If you wish, you can use the -vvvv option to monitor progress in great detail. Decompression speed is unaffected by these phenomena.

bzip2 usually allocates several megabytes of memory to operate in, and then accesses it in a fairly random fashion. This means that performance, both for compressing and decompressing, is largely determined by the speed at which your machine can service cache misses. Because of this, small changes to the code that reduce the miss rate have been observed to give disproportionately large performance improvements. I imagine bzip2 will perform best on machines with very large caches.

6. Examples

1) compress the file

[root@localhost weijie] # bzip2 1.c // compress 1.c; the source file is deleted

[root@localhost weijie] # ls

11.c 1.c.bz2 2.c 3.c 4.c 5.c 6.c ~ bak

[root@localhost weijie] # bzip2 -c 2.c > res.bz2 // compress 2.c to res.bz2 without removing the source file

[root@localhost weijie] # ls

11.c 1.c.bz2 2.c 3.c 4.c 5.c 6.c ~ bak res.bz2

2) decompression

[root@localhost weijie] # bzip2 -d res.bz2 // decompress res.bz2 to res

[root@localhost weijie] # ls

11.c 1.c.bz2 2.c 3.c 4.c 5.c 6.c ~ bak res

3) compress two files into one file

[root@localhost weijie] # cat 1.c 2.c // show the contents of the two files

Hello world

I am david.

I love linux

Love code.

one hundred and twenty three

twenty-three

two hundred and twelve

[root@localhost weijie] # bzip2 -c 1.c > foo.bz2 // compress 1.c into foo.bz2

[root@localhost weijie] # bzip2 -c 2.c >> foo.bz2 // append the compressed 2.c to foo.bz2

[root@localhost weijie] # bzip2 -d foo.bz2 // decompress foo.bz2 to foo

[root@localhost weijie] # cat foo // display the contents of foo

Hello world

I am david.

I love linux

Love code.

one hundred and twenty three

twenty-three

two hundred and twelve
