Shulou (Shulou.com), SLTechnology News & Howtos, updated 2025-02-28
This article explains Hive's data compression options. The methods introduced here are simple, fast, and practical; let's walk through them step by step.
1. Compression codecs supported by MapReduce

| Compression format | Tool | Algorithm | File extension | Splittable |
| --- | --- | --- | --- | --- |
| DEFLATE | None | DEFLATE | .deflate | No |
| Gzip | gzip | DEFLATE | .gz | No |
| Bzip2 | bzip2 | Bzip2 | .bz2 | Yes |
| LZO | lzop | LZO | .lzo | No (splittable after building an index) |
| LZ4 | None | LZ4 | .lz4 | No |
| Snappy | None | Snappy | .snappy | No |
To support multiple compression/decompression algorithms, Hadoop provides codec (encoder/decoder) classes, as shown in the following table:
| Compression format | Codec class |
| --- | --- |
| DEFLATE | org.apache.hadoop.io.compress.DefaultCodec |
| Gzip | org.apache.hadoop.io.compress.GzipCodec |
| Bzip2 | org.apache.hadoop.io.compress.BZip2Codec |
| LZO | com.hadoop.compression.lzo.LzopCodec |
| LZ4 | org.apache.hadoop.io.compress.Lz4Codec |
| Snappy | org.apache.hadoop.io.compress.SnappyCodec |
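As noted in the configuration section below, Hadoop picks an input codec by matching the file extension against the registered codecs. A minimal Python sketch of that lookup, using the extensions and codec class names from the tables above (the function name is illustrative, not a Hadoop API):

```python
# Extension-to-codec mapping taken from the tables above.
CODEC_BY_EXTENSION = {
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".lz4": "org.apache.hadoop.io.compress.Lz4Codec",
    ".snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}

def codec_for(path):
    """Return the codec class name for a file, or None if it looks uncompressed."""
    for ext, codec in CODEC_BY_EXTENSION.items():
        if path.endswith(ext):
            return codec
    return None

print(codec_for("/data/scores.gz"))  # org.apache.hadoop.io.compress.GzipCodec
```

Files with no recognized extension are read as-is, which is why renaming a compressed file can silently break decompression on input.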
Compression performance comparison

| Compression algorithm | Original file size | Compressed file size | Compression speed | Decompression speed |
| --- | --- | --- | --- | --- |
| gzip | 8.3 GB | 1.8 GB | 17.5 MB/s | 58 MB/s |
| bzip2 | 8.3 GB | 1.1 GB | 2.4 MB/s | 9.5 MB/s |
| LZO | 8.3 GB | 2.9 GB | 49.3 MB/s | 74.6 MB/s |
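The ratio-versus-speed trade-off in the table can be reproduced on a small scale with Python's standard library (gzip and bz2 only; Snappy and LZO are not in the stdlib). This is purely illustrative: absolute numbers depend on the data and machine, and the table above was measured on an 8.3 GB corpus.

```python
import bz2
import gzip
import time

# A small, repetitive sample; real ratios vary with the data.
data = b"some repetitive log line about hive compression\n" * 20000

for name, compress in [("gzip", gzip.compress), ("bzip2", bz2.compress)]:
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data)} -> {len(out)} bytes in {elapsed:.4f}s")
```

On typical text both codecs shrink the input substantially, with bzip2 trading speed for a better ratio, which mirrors the table.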
For Snappy, see http://google.github.io/snappy/: on a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/s or more and decompresses at about 500 MB/s or more.
2. Compression configuration parameters
To enable compression in Hadoop, configure the following parameters (in the mapred-site.xml file):
| Parameter | Default value | Stage | Recommendation |
| --- | --- | --- | --- |
| io.compression.codecs (configured in core-site.xml) | org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, org.apache.hadoop.io.compress.Lz4Codec | Input compression | Hadoop uses the file extension to determine whether a codec is supported |
| mapreduce.map.output.compress | false | Mapper output | Set to true to enable compression |
| mapreduce.map.output.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | Mapper output | Use a fast codec such as LZO, LZ4, or Snappy at this stage |
| mapreduce.output.fileoutputformat.compress | false | Reducer output | Set to true to enable compression |
| mapreduce.output.fileoutputformat.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | Reducer output | Use a standard codec such as gzip or bzip2 |
| mapreduce.output.fileoutputformat.compress.type | RECORD | Reducer output | Compression type for SequenceFile output: NONE, RECORD, or BLOCK |
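The map-output settings from the table can be made permanent in mapred-site.xml rather than set per session. A minimal fragment as a sketch (the choice of SnappyCodec here is one common option, not the only one):

```xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```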
3. Enable Map output phase compression
Enabling map output compression reduces the amount of data transferred between the map and reduce tasks of a job. The configuration is as follows.
Case practice:
1) Enable Hive intermediate data compression:
hive (default)> set hive.exec.compress.intermediate=true;
2) Enable map output compression in MapReduce:
hive (default)> set mapreduce.map.output.compress=true;
3) Set the codec for map output data in MapReduce:
hive (default)> set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
4) Run a query:
hive (default)> select count(1) from score;

4. Enable Reduce output phase compression
Hive can also compress its final output when writing to a table. The property hive.exec.compress.output controls this. Keep its default of false if you want the output to be an uncompressed plain-text file; set it to true in a query session or execution script to turn output compression on.
Case practice:
1) Enable Hive final output compression:
hive (default)> set hive.exec.compress.output=true;
2) Enable MapReduce final output compression:
hive (default)> set mapreduce.output.fileoutputformat.compress=true;
3) Set the codec for the final output:
hive (default)> set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
4) Set the final output compression type to block compression:
hive (default)> set mapreduce.output.fileoutputformat.compress.type=BLOCK;
5) Test whether the output is a compressed file:
hive (default)> insert overwrite local directory '/export/servers/snappy' select * from score distribute by s_id sort by s_id desc;

At this point you should have a deeper understanding of Hive's data compression options. The best way to consolidate them is to try the steps above in practice.
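As a quick check of step 5's result, you can list the export directory and confirm the files carry the .snappy extension. A small illustrative Python helper (not part of Hive; the directory path matches the example above):

```python
import os

def compressed_files(directory, ext=".snappy"):
    """List files in `directory` whose names end with the compression extension."""
    return sorted(f for f in os.listdir(directory) if f.endswith(ext))
```

For example, `compressed_files("/export/servers/snappy")` should return the compressed result files if the settings took effect.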