2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to configure compression in Hadoop and HBase. The content is simple and clear, and easy to learn and understand.
The compression algorithms commonly used in Hadoop are bzip2, gzip, LZO, and Snappy; LZO and Snappy require native libraries installed on the operating system.
The table below gives semi-official benchmark figures; different situations call for different algorithms. bzip2 and gzip consume more CPU and achieve the highest compression ratios, but gzip output cannot be split for parallel processing. Snappy is similar to LZO, slightly faster, and both use less CPU than gzip.
In general, Snappy and LZO are the common choices when you want to strike a balance between CPU and I/O.
Comparison of compression algorithms:

Algorithm   % remaining   Encoding    Decoding
GZIP        13.4%         21 MB/s     118 MB/s
LZO         20.5%         135 MB/s    410 MB/s
Snappy      22.2%         172 MB/s    409 MB/s
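The ratio differences can be reproduced in spirit with Python's built-in codecs (gzip and bzip2; Snappy and LZO need third-party bindings, so they are omitted here). The exact numbers depend entirely on the input data, so treat this as an illustration rather than a benchmark:

```python
import bz2
import gzip

# Repetitive, log-like text compresses very well; real ratios vary with the data.
data = b"2014-05-31 INFO compression demo line\n" * 5000

gz = gzip.compress(data)
bz = bz2.compress(data)

print(f"original: {len(data)} bytes")
print(f"gzip:     {len(gz)} bytes ({len(gz) / len(data):.1%} remaining)")
print(f"bzip2:    {len(bz)} bytes ({len(bz) / len(data):.1%} remaining)")

# Round-trip check: decompression recovers the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data
```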
Files in TextFile, SequenceFile, and other user-defined formats can be compressed with any of the algorithms above.
A compressed TextFile cannot be split: the whole compressed file becomes the input of a single map task in the job. A SequenceFile is itself divided into blocks, so combined with the LZO compression format the file can be split; compression can be applied per record or per block, and block compression is generally more efficient.
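The splittability point can be demonstrated directly: a gzip stream has a single header and must be decoded from the start, so no mapper can begin reading in the middle of the file. A small Python sketch:

```python
import gzip
import zlib

data = b"key\tvalue\n" * 20000
blob = gzip.compress(data)

# Decompressing the whole stream from the start works fine:
assert gzip.decompress(blob) == data

# But decoding from an arbitrary byte offset fails, because the gzip
# header sits only at the beginning of the stream. This is why HDFS
# cannot hand the second half of a .gz file to a separate mapper.
try:
    zlib.decompress(blob[len(blob) // 2:], wbits=31)  # wbits=31 expects a gzip header
    splittable = True
except zlib.error:
    splittable = False

print("mid-stream decode possible:", splittable)
```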
1. Setting MapReduce compression parameters in Hadoop (Hive)
1. Compressing the intermediate results of MapReduce
Method 1: in Hadoop's mapred-site.xml:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Method 2: in Hive's hive-site.xml:

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression.</description>
</property>
<property>
  <name>hive.intermediate.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the map outputs are compressed, how should they be compressed?</description>
</property>
Method 3: in the Hive shell:

set hive.exec.compress.intermediate=true;
set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.LzoCodec;
2. Compressing the output of MapReduce

Configuration in hive-site.xml:

<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
Or add to hadoop-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.LzoCodec</value>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
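Hadoop's CompressionCodecFactory selects a codec for job input by file extension, driven by the io.compression.codecs list above. Below is a minimal Python sketch of that lookup; the CODECS table and the read_compressed helper are made up for this example (LZO and Snappy are omitted because they need third-party bindings):

```python
import bz2
import gzip

# Extension -> decompressor mapping, analogous to io.compression.codecs.
CODECS = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
}

def read_compressed(filename: str, blob: bytes) -> bytes:
    """Decompress blob according to the file's extension, loosely like
    Hadoop's CompressionCodecFactory; unknown extensions pass through."""
    for ext, decode in CODECS.items():
        if filename.endswith(ext):
            return decode(blob)
    return blob

payload = b"part-00000 output\n" * 100
assert read_compressed("part-00000.gz", gzip.compress(payload)) == payload
assert read_compressed("part-00000.bz2", bz2.compress(payload)) == payload
assert read_compressed("part-00000", payload) == payload
```

This is why compressed job output can be consumed transparently by a downstream job: the codec is recovered from the file name alone.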
2. HBase support for these three compression formats
In HBase, HFiles can be stored compressed with gzip, LZO, or Snappy.
1. Support for gzip compression
hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'GZ'}

Alternatively, change compression on an existing table with alter: disable the table first, alter the compression, then enable it again:

hbase(main):002:0> disable 'testtable'
hbase(main):003:0> alter 'testtable', {NAME => 'colfam1', COMPRESSION => 'GZ'}
hbase(main):004:0> enable 'testtable'
2. For LZO support, the system needs the LZO dynamic library and the hadoop-lzo native libraries installed; then copy the native library files into hadoop/lib/native and hbase/lib/native.
Also configure LZO compression in core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
org.apache.hadoop.io.compress.DefaultCodec is Hadoop's default zlib compression.
hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'lzo'}
3. For Snappy support, install snappy, copy the dynamic and static link library files from the native directory of hadoop-snappy-0.0.1-SNAPSHOT.tar.gz into the native directories of hadoop and hbase lib, and put hadoop-snappy-0.0.1-SNAPSHOT.jar under the lib directories of hadoop and hbase.
In core-site.xml, configure Snappy compression:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
hbase(main):001:0> create 'testtable', {NAME => 'colfam1', COMPRESSION => 'snappy'}
Thank you for reading. That covers how to configure compression in Hadoop and HBase; the specific settings should be verified in practice on your own cluster.
© 2024 shulou.com SLNews company. All rights reserved.