How to Use Compression Codecs in Hadoop
In this post I'd like to share how compression codecs are used in Hadoop. Many readers are not familiar with the details, so I hope this article serves as a useful reference and that you learn something from it. Let's get started!
As input
When a compressed file is used as input to a MapReduce job, MapReduce automatically selects the appropriate codec from the file extension and decompresses the data before it reaches the mapper.
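As a minimal sketch of this behavior (the class name and paths below are illustrative, not from the original article), the following job configures nothing codec-specific, yet its gzip-compressed input is read transparently:

package com.hadoop.codecs;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedInputDemo {
    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(CompressedInputDemo.class);
        // Nothing codec-specific is configured: because the input ends in ".gz",
        // GzipCodec is selected by extension and records arrive decompressed.
        FileInputFormat.addInputPath(job, new Path("/data/input/logs.txt.gz"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}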
As output
When the output of a MapReduce job needs to be compressed, set mapred.output.compress to true and mapred.output.compression.codec to the class name of the codec you want to use. You can also set these two properties in code by calling the static methods of FileOutputFormat:
package com.hadoop.codecs;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// MyMapper and MyReducer are assumed to be defined elsewhere in this package.
public class CodecDemo {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(CodecDemo.class);
        job.setJobName("CodecDemo");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Enable compression of the job output.
        FileOutputFormat.setCompressOutput(job, true);
        // Set the compression class: GzipCodec.
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
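For completeness, here is a minimal sketch of the property-based route mentioned above (the class name PropertyCodecDemo is illustrative; the mapred.* names are the old-style properties quoted in the text, and newer Hadoop releases use mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec instead):

package com.hadoop.codecs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class PropertyCodecDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to FileOutputFormat.setCompressOutput(job, true).
        conf.setBoolean("mapred.output.compress", true);
        // Equivalent to FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class).
        conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
        Job job = new Job(conf);
        // ... add input/output paths and mapper/reducer setup as in CodecDemo above.
    }
}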
Compressing and decompressing with CompressionCodec
/*
 * CompressionCodec supports both compression and decompression:
 *   Compression: createOutputStream(OutputStream out) returns a CompressionOutputStream.
 *   Decompression: createInputStream(InputStream in) returns a CompressionInputStream.
 *
 * This program accepts the class name of a CompressionCodec implementation from
 * the command line, instantiates that class with ReflectionUtils, wraps the
 * standard output stream in a compression stream, copies the standard input
 * stream into it with IOUtils.copyBytes, and finally calls finish() on the
 * compression stream to complete the compression.
 */
package com.hadoop.codecs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class Compressors {
    public static void main(String[] args) throws Exception {
        String codecClassName = args[0];
        Class<?> codecClass = Class.forName(codecClassName);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        // Wrap standard output in a compression stream and copy standard input into it.
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();
    }
}
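The comment above also describes the decompression direction via createInputStream. Here is a minimal complementary sketch (the class Decompressors does not appear in the original article): it reverses Compressors, so piping data through Compressors and then Decompressors with the same codec class name should reproduce the original bytes.

package com.hadoop.codecs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class Decompressors {
    public static void main(String[] args) throws Exception {
        String codecClassName = args[0];
        Class<?> codecClass = Class.forName(codecClassName);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        // Wrap standard input in a decompression stream and copy it to standard output.
        CompressionInputStream in = codec.createInputStream(System.in);
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();
    }
}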
Using CompressionCodecFactory to decompress
/*
 * To read a compressed file, you must first determine which codec to use from
 * the file extension. There is an easier way: CompressionCodecFactory does this
 * for you. Pass a Path to its getCodec method and it returns the matching codec.
 * Note the static removeSuffix method, which strips the compression suffix from
 * the file name; we use the resulting path as the output path for the
 * decompressed file.
 *
 * The number of codecs CompressionCodecFactory can find is limited. By default
 * there are only three: org.apache.hadoop.io.compress.GzipCodec,
 * org.apache.hadoop.io.compress.BZip2Codec, and
 * org.apache.hadoop.io.compress.DefaultCodec. If you want to add other codecs,
 * you need to register them through the io.compression.codecs property.
 */
package com.hadoop.codecs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

public class FileDecompressor {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        Path inputPath = new Path(uri);
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(inputPath);
        if (codec == null) {
            System.out.println("No codec found: " + uri);
            System.exit(1);
        }

        // Strip the compression suffix (e.g. ".gz") to form the output path.
        String outputUri = CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());

        InputStream in = null;
        OutputStream out = null;
        try {
            in = codec.createInputStream(fs.open(inputPath));
            out = fs.create(new Path(outputUri));
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
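The comment above mentions registering extra codecs through io.compression.codecs. A minimal sketch of how that registration might look (the class name RegisterCodecDemo is illustrative, and SnappyCodec is only an example; it ships with recent Hadoop releases but may need native libraries at runtime):

package com.hadoop.codecs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class RegisterCodecDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Re-declare the default codecs plus one extra; the factory reads this property.
        conf.set("io.compression.codecs",
                "org.apache.hadoop.io.compress.GzipCodec,"
                + "org.apache.hadoop.io.compress.BZip2Codec,"
                + "org.apache.hadoop.io.compress.DefaultCodec,"
                + "org.apache.hadoop.io.compress.SnappyCodec");
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        // The factory can now resolve ".snappy" files by extension.
        CompressionCodec codec = factory.getCodec(new Path("example.snappy"));
        System.out.println(codec == null ? "not registered" : codec.getClass().getName());
    }
}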
That is all of "How to Use Compression Codecs in Hadoop". Thank you for reading! I hope what we have shared here helps you.