How to configure compression in Hadoop and HBase

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains how to configure compression in Hadoop and HBase. The material is straightforward and easy to follow; work through the sections below to learn how to enable and tune compression in both systems.

The compression algorithms commonly used with Hadoop are bzip2, gzip, LZO, and Snappy. LZO and Snappy require native libraries to be installed on the operating system.

The table below gives roughly official benchmark figures; different scenarios call for different algorithms. bzip2 and gzip consume the most CPU but achieve the highest compression ratios, and gzip output cannot be processed in parallel. Snappy is similar to LZO, slightly faster, and uses less CPU than gzip.

In general, Snappy and LZO are the most common choices when you want to strike a balance between CPU cost and I/O savings.

Comparison of compression algorithms:

Algorithm   % remaining   Encoding    Decoding
GZIP        13.4%         21 MB/s     118 MB/s
LZO         20.5%         135 MB/s    410 MB/s
Snappy      22.2%         172 MB/s    409 MB/s
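The ratio-versus-speed trade-off in the table can be reproduced on any machine. LZO and Snappy are not in the Python standard library, so the sketch below uses the stdlib gzip, bz2, and zlib codecs as stand-ins to illustrate the same kind of comparison; the sample payload and the measured numbers are illustrative only.

```python
import bz2
import gzip
import time
import zlib

# Sample payload: repetitive text compresses well, like typical log data.
data = b"hadoop hbase compression benchmark line\n" * 50_000

for name, compress in [("gzip", gzip.compress),
                       ("bz2", bz2.compress),
                       ("zlib", zlib.compress)]:
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    # "% remaining" = compressed size as a fraction of the original.
    ratio = len(compressed) / len(data) * 100
    print(f"{name:5s} remaining: {ratio:5.2f}%  encode time: {elapsed:.3f}s")
```

Running this on your own data is the quickest way to decide whether a CPU-heavy codec is worth it for your workload.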

Files in TextFile, SequenceFile, and other user-defined file formats can all be compressed with the algorithms above.

A compressed TextFile cannot be split: when such a file is used as job input, the entire file goes to a single map task. A SequenceFile, however, is itself divided into blocks; combined with the LZO compression format, the file remains splittable. Compression can be applied per record or per block, and block compression is generally more efficient.
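The reason block compression keeps a file splittable can be sketched in a few lines: if each block is compressed independently, a reader (such as one map task) can decompress its own block without touching the rest of the file. This is a simplified illustration using zlib, not the actual SequenceFile on-disk format.

```python
import zlib

def compress_in_blocks(records, block_size=1000):
    """Compress records in independent blocks, mimicking SequenceFile
    block compression: each block decompresses on its own, which is
    what makes the file splittable across map tasks."""
    blocks = []
    for i in range(0, len(records), block_size):
        chunk = "\n".join(records[i:i + block_size]).encode()
        blocks.append(zlib.compress(chunk))
    return blocks

def decompress_block(block):
    # A reader only needs its own block, not the whole file.
    return zlib.decompress(block).decode().split("\n")

records = [f"row-{i}" for i in range(2500)]
blocks = compress_in_blocks(records)
print(len(blocks))                     # 3 blocks of up to 1000 records
print(decompress_block(blocks[1])[0])  # row-1000
```

By contrast, a whole-file gzip stream has no such block boundaries, so decompression must start from the beginning, forcing a single map task.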

1. Setting MapReduce compression parameters in Hadoop (Hive)

1. Compressing intermediate MapReduce output

Method 1:

In Hadoop's mapred-site.xml:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
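All of the XML snippets in this article follow the same Hadoop configuration layout: repeated property elements, each holding a name and a value. A quick sketch of reading such a fragment programmatically (the fragment below is a hypothetical example in that layout, not a file shipped with Hadoop):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the <property>/<name>/<value> layout used by
# mapred-site.xml, hive-site.xml, core-site.xml, etc.
fragment = """
<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
"""

root = ET.fromstring(fragment)
# Collect every property into a plain dict for easy lookup.
settings = {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}
print(settings["mapred.compress.map.output"])  # true
```

Reading a config file this way is handy for verifying that the properties you edited actually parse as well-formed XML.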

Method 2:

In Hive's hive-site.xml:

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.</description>
</property>
<property>
  <name>hive.intermediate.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the map outputs are compressed, how should they be
  compressed?</description>
</property>

Method 3:

In the Hive shell:

set hive.exec.compress.intermediate=true;
set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

2. Compressing final MapReduce output

Configuration in hive-site.xml:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>

Or add the following to hadoop-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.LzoCodec</value>
  <description>A list of the compression codec classes that can be used
  for compression/decompression.</description>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>

2. HBase's support for these three compression formats

In HBase, HFiles can be stored compressed with gzip, LZO, or Snappy.

1. Support for gzip compression

hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'GZ'}

Alternatively, change an existing table with alter 'testtable': disable the table first, apply the compression setting, then enable it again:

disable 'testtable'
alter 'testtable', {NAME => 'colfam1', COMPRESSION => 'GZ'}
enable 'testtable'

2. For LZO support, the system needs the LZO dynamic library and the hadoop-lzo native libraries installed; then copy the native library and jar files into hadoop/lib/native and hbase/lib/native.

Also configure LZO compression in core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

(org.apache.hadoop.io.compress.DefaultCodec is Hadoop's default zlib-based codec.)

hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'LZO'}

3. For Snappy support, install snappy, copy the dynamic and static link libraries from the native directory of hadoop-snappy-0.0.1-SNAPSHOT.tar.gz into the native directories under hadoop lib and hbase lib, and place hadoop-snappy-0.0.1-SNAPSHOT.jar under the lib directories of hadoop and hbase.

Then configure Snappy compression in core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}

Thank you for reading. That covers how to configure compression in Hadoop and HBase; the specific settings should be verified in practice on your own cluster.
