
How Hadoop merges small files into a SequenceFile and reads it in map

2025-01-22 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how Hadoop merges small files into a single SequenceFile and then reads it back in a map task. Many people run into this situation in practice, so let the editor walk you through how to handle it. I hope you read it carefully and come away having learned something!

package hgs.sequencefile;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Merge small files into one SequenceFile
public class SequenceMain {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.6.129:9000"), conf);
        // Get all the files under this folder
        FileStatus[] fstats = fs.listStatus(new Path("/words"));
        // System.out.println(fstats.length);
        Text key = new Text();
        Text value = new Text();
        @SuppressWarnings("deprecation")
        // Create a SequenceFile.Writer; merge.seq is the output file name
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                new Path("/sequence/merge.seq"), key.getClass(), value.getClass());
        // Loop through each file
        for (FileStatus fis : fstats) {
            // Write each file into the SequenceFile as a key/value pair
            FSDataInputStream finput = fs.open(fis.getPath());
            byte[] buffer = new byte[(int) fis.getLen()];
            IOUtils.readFully(finput, buffer, 0, buffer.length);
            // The file name is the key, the file content is the value
            key.set(fis.getPath().getName());
            value.set(buffer);
            writer.append(key, value);
            finput.close();
        }
        writer.close();
        fs.close();
    }
}

package hgs.sequencefile;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SequnceMapper extends Mapper<Text, Text, Text, Text> {
    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value);
    }
}

package hgs.sequencefile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SequenceDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "read_sequence_file");
        job.setJarByClass(hgs.sequencefile.SequenceDriver.class);
        // Specify a mapper
        job.setMapperClass(SequnceMapper.class);
        // No custom reducer is needed here
        // job.setReducerClass(Reducer.class);
        // Specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Set the InputFormat that reads the SequenceFile: SequenceFileAsTextInputFormat
        // reads both key and value as Text (String), while SequenceFileAsBinaryInputFormat
        // would read them as raw bytes instead
        job.setInputFormatClass(SequenceFileAsTextInputFormat.class);
        // Specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.6.129:9000/sequence"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.6.129:9000/seqresult"));
        if (!job.waitForCompletion(true)) return;
    }
}

This is the end of the introduction to "how Hadoop merges small files into a SequenceFile and reads it in map". Thank you for reading. If you want to know more about the industry, you can follow the website; the editor will keep publishing more high-quality practical articles for you!
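To try the two programs above end to end, the commands might look like the following sketch. It assumes the classes have been packaged into a jar (the name sequencefile.jar is hypothetical) and that the cluster at hdfs://192.168.6.129:9000 from the code is reachable; only standard Hadoop CLI commands are used.

```shell
# Step 1: merge the small files under /words into /sequence/merge.seq
# (sequencefile.jar is a hypothetical name for the packaged classes above)
hadoop jar sequencefile.jar hgs.sequencefile.SequenceMain

# Inspect the merged file; 'hadoop fs -text' decodes SequenceFile records,
# printing each key (file name) followed by its value (file content)
hadoop fs -text /sequence/merge.seq

# Step 2: run the MapReduce job that reads the SequenceFile back as text
hadoop jar sequencefile.jar hgs.sequencefile.SequenceDriver

# The job output lands under /seqresult as part-* files
hadoop fs -ls /seqresult
hadoop fs -cat /seqresult/part-*
```

Because the driver does not set the number of reduce tasks to zero, the default identity reducer still runs, so the output appears as part-r-* files rather than map output files.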
