2025-01-15 Update From: SLTechnology News&Howtos
This article introduces how to use the Combiner in Hadoop MapReduce. Many people run into trouble with this in real projects, so the following walks through the relevant concepts and a complete example. I hope you read it carefully and get something out of it!
1. The use of Combiner (merging)

1. Where the combiner runs:
- when kv pairs spill from the buffer to disk, the combiner can be applied (as long as it is set, it is used unconditionally);
- when the spill files of a MapTask are merged on disk, the combiner can be applied again (only when the condition is met: the number of spill files >= 3).
2. Purpose of the Combiner: to merge the output kv pairs in advance inside each MapTask. This reduces the number of kv pairs transferred from map to reduce and the amount of data that reduce finally has to process.
3. Restriction on using a Combiner: it can only be used when it does not change the final business result. Summing word counts is fine, but computing an average, for example, is not (see the short illustration after the WordCount code below).
4. The parent class of the Combiner component is Reducer. The difference is that the Combiner runs on the node where each MapTask runs, while the Reducer receives the output of all Mappers globally.

1. Custom Combiner class

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Combiner function: when a MapTask spills, the output of that MapTask is locally
 * pre-aggregated, which reduces the total amount of data handed to the ReduceTask.
 * Note: a custom Combiner class belongs to the MapTask phase (even though it inherits Reducer).
 */
public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // the partial count must be reset for every key
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
```

2. WordCountMapper

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // get the current input line
        String lineData = value.toString();
        // from prior analysis we know the words are separated by spaces
        String[] splitData = lineData.split(" ");
        // traverse the split data and write out each word
        for (String str : splitData) {
            // note: the data obtained here is a String, which needs to be converted to Text
            outk.set(str);
            context.write(outk, outv);
        }
    }
}
```

3. WordCountReduce

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outv = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // variable that accumulates the total number of occurrences
        int count = 0;
        // iterate over values to read the occurrences of each word recorded in the iterator
        for (IntWritable value : values) {
            // value is an IntWritable and cannot be added directly, so get() converts it to int
            count += value.get();
        }
        // write out the result, converting count back to IntWritable
        outv.set(count);
        context.write(key, outv);
    }
}
```

4. WordCountDriver

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // 1. get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. associate the jar: the driver class used when the program runs
        job.setJarByClass(WordCountDriver.class);
        // 3. associate the mapper and reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);
        // 4. set the key and value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. set the key and value types of the final output of the program:
        //    if there is a reducer, use the reducer's output kv types;
        //    if there is no reducer, use the mapper's output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the custom Combiner class
        job.setCombinerClass(WordCountCombiner.class);
        // job.setCombinerClass(WordCountReduce.class);  // the existing reducer can also be used as the combiner here
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 4194304);
        // 6. set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\io\\hadooptest\\combineinput"));
        // the output path must not exist in advance; the MR program creates it
        FileOutputFormat.setOutputPath(job, new Path("D:\\io\\hadooptest\\combineroutput2"));
        // 7. submit the job
        job.waitForCompletion(true);
    }
}
```
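The restriction in note 3 above is easiest to see with averages: averaging the per-MapTask averages is not the same as averaging all of the original values, whereas partial sums (as in WordCount) can safely be summed again. The following stand-alone snippet only illustrates that point; it uses no Hadoop API, and the class name and sample numbers are made up:

```java
public class CombinerAverageTrap {
    public static void main(String[] args) {
        // Suppose MapTask 1 sees the values 3 and 5, and MapTask 2 sees the value 7.
        double avgTask1 = (3 + 5) / 2.0;            // 4.0
        double avgTask2 = 7 / 1.0;                  // 7.0
        // A combiner that pre-averaged per MapTask would lead the reducer to compute:
        double wrong = (avgTask1 + avgTask2) / 2;   // 5.5
        // The true average over all values is:
        double right = (3 + 5 + 7) / 3.0;           // 5.0
        System.out.println(wrong + " != " + right);
    }
}
```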
2. OutputFormat data output

2.1 OutputFormat introduction

① OutputFormat is an interface with two abstract methods defined internally:
- `RecordWriter getRecordWriter(FileSystem ignored, JobConf job, String name, Progressable progress)`: obtains the RecordWriter object, which is responsible for actually writing the data out.
- `void checkOutputSpecs(FileSystem ignored, JobConf job)`: checks the output path; when the output path configured in the driver already exists, the implementation throws an exception ("Output directory " + outDir + " already exists").

② The implementation classes of this interface can be viewed with Ctrl+H (see the figure below):
- TextOutputFormat: the default output format used by Hadoop; it writes records line by line and internally overrides the getRecordWriter() method.
- SequenceFileOutputFormat: writes the final file in binary format.
- MultipleOutputFormat: an abstract subclass with concrete implementations underneath it.
![OutputFormat implementation Class](https://oscimg.oschina.net/oscnet/up-777fe19a5bf6864396beac3aa83d8350e9e.png "OutputFormat implementation Class")
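Before writing a fully custom format, note that a driver can switch between these built-in implementations with a single call. A minimal sketch, assuming the same `job` object as in the WordCount driver above (the class names are the standard ones under org.apache.hadoop.mapreduce.lib.output):

```java
// TextOutputFormat is the default: each record is written on its own line as the key,
// a tab, then the value, so this explicit call behaves the same as not setting it at all.
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);

// To write the output as a binary SequenceFile instead:
// job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
```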
2.2 Custom output class

1. LogMapper

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // pass each log line through unchanged
        context.write(value, NullWritable.get());
    }
}
```

2. LogReducer

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        // write the key once for every occurrence so duplicate lines are not lost
        for (NullWritable value : values) {
            context.write(key, NullWritable.get());
        }
    }
}
```

3. MyOutPutFormat

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyOutPutFormat extends FileOutputFormat<Text, NullWritable> {
    /**
     * Override getRecordWriter() and return the custom write-out class defined below.
     */
    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        LogRecordWriter rw = new LogRecordWriter(job.getConfiguration());
        return rw;
    }
}
```

4. LogRecordWriter

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/**
 * A custom LogRecordWriter needs to inherit the RecordWriter class.
 *
 * Requirements:
 * - write log data containing "luck" to D:/bigtools/luck.log
 * - write log data that does not contain "luck" to D:/bigtools/other.log
 */
public class LogRecordWriter extends RecordWriter<Text, NullWritable> {
    // file output paths
    private String luckPath = "D:/bigtools/luck.log";
    private String otherPath = "D:/bigtools/other.log";
    private FSDataOutputStream luckOut;
    private FSDataOutputStream otherOut;
    private FileSystem fs;

    /**
     * initialize the two output streams
     * @param conf
     */
    public LogRecordWriter(Configuration conf) {
        try {
            fs = FileSystem.get(conf);
            luckOut = fs.create(new Path(luckPath));
            otherOut = fs.create(new Path(otherPath));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * override the write method
     * @param key
     * @param value
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public void write(Text key, NullWritable value) throws IOException, InterruptedException {
        String log = key.toString();
        if (log.contains("luck")) {
            luckOut.writeBytes(log + "\n");
        } else {
            otherOut.writeBytes(log + "\n");
        }
    }

    /**
     * close the streams
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        IOUtils.closeStream(luckOut);
        IOUtils.closeStream(otherOut);
    }
}
```

5. LogDriver

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(LogDriver.class);
        job.setMapperClass(LogMapper.class);
        job.setReducerClass(LogReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // set the custom output format
        job.setOutputFormatClass(MyOutPutFormat.class);

        FileInputFormat.setInputPaths(job, new Path("D:\\io\\hadooptest\\loginput"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\io\\hadooptest\\logoutput"));

        job.waitForCompletion(true);
    }
}
```
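To make the behaviour concrete, here is a hypothetical run (the input lines are made up purely for illustration): if D:\io\hadooptest\loginput contains the three lines `www.luck.com`, `www.other.com` and `luck user login`, the first and third contain "luck" and are appended by LogRecordWriter to D:/bigtools/luck.log, while the second goes to D:/bigtools/other.log. The directory passed to FileOutputFormat.setOutputPath() is still required even though the data is written elsewhere: it must not exist beforehand, and on success it typically only receives the job's _SUCCESS marker file.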
This is the end of the content of "how to use Combiner". Thank you for reading. If you want to learn more about the topic, you can follow this site, where more high-quality practical articles will be published!