This article explains the basic concepts of Hadoop compression technology. The methods introduced here are simple, fast, and practical, so follow along and try them out.
1 Overview
Compression strategy and principle: compression reduces disk I/O and the amount of data moved across the network, at the cost of extra CPU time. It pays off for I/O-intensive jobs, while for CPU-intensive jobs it can slow things down, so enable it selectively.
2 Compression Encoding
Compression formats supported by MapReduce:

Compression format | Built into Hadoop? | Algorithm | File extension | Splittable? | Changes needed after switching to this format
DEFLATE | Yes, use directly | DEFLATE | .deflate | No | None, handled like plain text
Gzip | Yes, use directly | DEFLATE | .gz | No | None, handled like plain text
bzip2 | Yes, use directly | bzip2 | .bz2 | Yes | None, handled like plain text
LZO | No, must be installed | LZO | .lzo | Yes | An index must be built and the input format specified
Snappy | No, must be installed | Snappy | .snappy | No | None, handled like plain text
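The "splittable" column matters most for large inputs, because a non-splittable file has to be processed by a single map task. As a minimal sketch (the class name CheckSplittable is made up for illustration), split support can be checked against the SplittableCompressionCodec interface, which among the built-in codecs above only bzip2 implements:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class CheckSplittable {
    public static void main(String[] args) {
        // Look up the codec from the file extension, e.g. /data/log.bz2 -> BZip2Codec
        CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
        CompressionCodec codec = factory.getCodec(new Path(args[0]));
        if (codec == null) {
            System.out.println("no codec registered for " + args[0]);
        } else {
            // True only for codecs that implement SplittableCompressionCodec
            System.out.println(codec.getClass().getSimpleName()
                    + " splittable: " + (codec instanceof SplittableCompressionCodec));
        }
    }
}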
To support a variety of compression / decompression algorithms, Hadoop introduces encoders / decoders, as shown in the following table.
Compression format | Encoder/decoder
DEFLATE | org.apache.hadoop.io.compress.DefaultCodec
gzip | org.apache.hadoop.io.compress.GzipCodec
bzip2 | org.apache.hadoop.io.compress.BZip2Codec
LZO | com.hadoop.compression.lzo.LzopCodec
Snappy | org.apache.hadoop.io.compress.SnappyCodec
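Any class name from this table can be turned into a working codec at runtime. A minimal sketch (the class name CodecLookup is made up for illustration; the compress() example in section 6.1 below uses the same ReflectionUtils pattern):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecLookup {
    public static void main(String[] args) throws ClassNotFoundException {
        // Any entry from the table above works here
        String className = "org.apache.hadoop.io.compress.BZip2Codec";
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                Class.forName(className), new Configuration());
        // Each codec knows its default file extension, e.g. ".bz2" here
        System.out.println(className + " -> " + codec.getDefaultExtension());
    }
}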
Comparison of compression performance
Compression algorithm | Original file size | Compressed file size | Compression speed | Decompression speed
gzip | 8.3 GB | 1.8 GB | 17.5 MB/s | 58 MB/s
bzip2 | 8.3 GB | 1.1 GB | 2.4 MB/s | 9.5 MB/s
LZO | 8.3 GB | 2.9 GB | 49.3 MB/s | 74.6 MB/s

3 Choosing a Compression Scheme
3.1 Gzip compression
Built into Hadoop with a good compression ratio and reasonable speed, but gzip files are not splittable, so each file is handled by a single map task.
3.2 Bzip2 compression
The best compression ratio of the group, and the files are splittable, but compression and decompression are by far the slowest.
3.3 LZO compression
Fast compression and decompression with splittable files, but the codec must be installed separately, and splitting requires building an index and specifying the input format.
3.4 Snappy compression
Very fast compression and decompression, but the files are not splittable and the codec must be installed separately.
4 Compression Position Selection
Compression can be enabled at three points in a MapReduce job: on the input (Hadoop picks the codec from the file extension), on the intermediate map output (to shrink shuffle traffic), and on the final reducer output. The parameters in the next section control each stage.
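As a minimal sketch of the three positions (the class name CompressionPositions is made up for illustration; the parameter names are the ones listed in section 5 below, and the driver examples in section 6 set the same values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;

public class CompressionPositions {
    public static Configuration configure() {
        Configuration conf = new Configuration();
        // Input: nothing to set; the codec is chosen from the input file extension.
        // Map output: compress the intermediate data shuffled from map to reduce.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                BZip2Codec.class, CompressionCodec.class);
        // Reduce output: compress the final files written by the reducers.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
        conf.setClass("mapreduce.output.fileoutputformat.compress.codec",
                BZip2Codec.class, CompressionCodec.class);
        return conf;
    }
}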
5 Compression Parameter Configuration

Parameter | Default value | Stage
io.compression.codecs [core-site.xml] | org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec | Input compression
mapreduce.map.output.compress [mapred-site.xml] | false | Mapper output
mapreduce.map.output.compress.codec [mapred-site.xml] | org.apache.hadoop.io.compress.DefaultCodec | Mapper output
mapreduce.output.fileoutputformat.compress [mapred-site.xml] | false | Reducer output
mapreduce.output.fileoutputformat.compress.codec [mapred-site.xml] | org.apache.hadoop.io.compress.DefaultCodec | Reducer output
mapreduce.output.fileoutputformat.compress.type [mapred-site.xml] | RECORD | Reducer output

6 Compression Examples
6.1 Compressing and decompressing a data stream

package com.djm.mapreduce.zip;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

import java.io.*;

public class CompressUtils {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // args[0]: file to compress, args[1]: codec class name from the table in section 2
        String compressed = compress(args[0], args[1]);
        decompress(compressed);
    }

    // Infer the codec from the file extension and decompress to <path>.decoded
    private static void decompress(String path) throws IOException {
        CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
        CompressionCodec codec = factory.getCodec(new Path(path));
        if (codec == null) {
            System.out.println("cannot find codec for file " + path);
            return;
        }
        CompressionInputStream cis = codec.createInputStream(new FileInputStream(new File(path)));
        FileOutputStream fos = new FileOutputStream(new File(path + ".decoded"));
        IOUtils.copyBytes(cis, fos, 1024);
        cis.close();
        fos.close();
    }

    // Compress the file with the given codec class and return the output path
    private static String compress(String path, String method) throws IOException, ClassNotFoundException {
        FileInputStream fis = new FileInputStream(new File(path));
        Class<?> codecClass = Class.forName(method);
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, new Configuration());
        String outPath = path + codec.getDefaultExtension();
        FileOutputStream fos = new FileOutputStream(outPath);
        CompressionOutputStream cos = codec.createOutputStream(fos);
        IOUtils.copyBytes(fis, cos, 1024);
        cos.close();
        fos.close();
        fis.close();
        return outPath;
    }
}
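To try the class above, package it into a jar and run it with, for example, hadoop jar compress.jar com.djm.mapreduce.zip.CompressUtils /tmp/web.log org.apache.hadoop.io.compress.BZip2Codec (the jar name and input path here are only placeholders). This should first write /tmp/web.log.bz2 and then decompress it back to /tmp/web.log.bz2.decoded.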
6.2 Compressing the map output

package com.djm.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WcDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        // Enable map output compression and set the codec
        configuration.setBoolean("mapreduce.map.output.compress", true);
        configuration.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);
        Job job = Job.getInstance(configuration);
        job.setJarByClass(WcDriver.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

6.3 Compressing the reducer output

package com.djm.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WcDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        job.setJarByClass(WcDriver.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Enable reduce output compression
        FileOutputFormat.setCompressOutput(job, true);
        // Set the compression codec
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
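The last parameter in the section 5 table, mapreduce.output.fileoutputformat.compress.type, only takes effect when the reducer writes SequenceFile output. A minimal sketch (the class name SequenceFileCompression is made up for illustration): the RECORD default compresses each record individually, while BLOCK compresses batches of records and usually achieves a better ratio.

import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceFileCompression {
    public static void configure(Job job) {
        // Write SequenceFile output so the compression type applies
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
        // Switch from the RECORD default to BLOCK compression
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
    }
}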
At this point, you should have a solid grasp of the concepts behind Hadoop compression technology. The best way to make them stick is to try the examples above in practice.