Shulou (Shulou.com) 06/01 Report --
This article explains how to read whole files in Hadoop. The content is simple and easy to follow; please read along with the editor step by step to study the whole-file reading method in Hadoop.
When writing Hadoop programs, you sometimes need to read an entire file as a single record rather than reading it in splits. Splitting is the default behavior, so you have to write your own whole-file input classes.
Two classes need to be written:
A WholeInputFormat class, which extends the FileInputFormat class
A WholeRecordReader class, which extends the RecordReader class
The actual reading is done by the WholeRecordReader class. The code below uses Text as the key type and BytesWritable as the value type, because most file formats (jpg, sdf, png, and so on) have no corresponding type in Hadoop. They can, however, be converted to a byte[] stream, wrapped in a BytesWritable, and then converted back to the appropriate type inside Map or Reduce.
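To illustrate that last step, here is a minimal Mapper sketch that is not part of the original article: it shows how the BytesWritable value produced by the classes below can be turned back into a plain byte[] inside map(). The class name WholeFileMapper and the pass-through output are illustrative assumptions only.

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical Mapper that receives one whole file per map() call: the key is the
// file name, the value is the raw file contents wrapped in a BytesWritable.
public class WholeFileMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {

    @Override
    protected void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // getBytes() may return a buffer longer than the actual data,
        // so copy only getLength() bytes
        byte[] fileContent = new byte[value.getLength()];
        System.arraycopy(value.getBytes(), 0, fileContent, 0, value.getLength());

        // fileContent now holds the raw bytes of, say, a jpg or png file and can be
        // decoded with any ordinary Java library; here the record is passed through unchanged.
        context.write(key, value);
    }
}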
The WholeInputFormat code is as follows, with explanations in the comments:
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WholeInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new WholeRecordReader();
    }

    // Determines whether the file may be split: false means no splitting, true means splitting.
    // Strictly speaking this override is optional, because WholeRecordReader reads the
    // whole file in one go anyway.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }
}
Here is the WholeRecordReader class:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeRecordReader extends RecordReader<Text, BytesWritable> {

    // The split (a single file) that this reader handles
    private FileSplit fileSplit;
    private FSDataInputStream in = null;
    private BytesWritable value = null;
    private Text key = null;
    // Used to determine whether the file has already been read.
    // Because of this flag, overriding isSplitable in WholeInputFormat is not strictly necessary.
    private boolean processed = false;

    @Override
    public void close() throws IOException {
        // do nothing
    }

    @Override
    public Text getCurrentKey() throws IOException, InterruptedException {
        return this.key;
    }

    @Override
    public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return this.value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        // Either nothing has been read yet, or the whole file has been read
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Open an input stream for the file
        fileSplit = (FileSplit) split;
        Configuration job = context.getConfiguration();
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(job);
        in = fs.open(file);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (key == null) {
            key = new Text();
        }
        if (value == null) {
            value = new BytesWritable();
        }
        if (!processed) {
            // Allocate a byte array to hold the contents to be read from the file
            byte[] content = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            // Use the file name as the key passed to the Map function
            key.set(file.getName());
            try {
                // Read the contents of the file
                IOUtils.readFully(in, content, 0, content.length);
                // Store the byte[] contents in value
                value.set(new BytesWritable(content));
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // Close the input stream
                IOUtils.closeStream(in);
            }
            // Mark the file as read so it will not be read again
            processed = true;
            return true;
        }
        return false;
    }
}
After you have written these classes, set the job's input format to WholeInputFormat in the main() function or run() function, as follows:
job.setInputFormatClass(WholeInputFormat.class);
The key and value types can be changed to whatever types you need. However, when no matching type exists in Hadoop, it is recommended to use BytesWritable, with byte[] as the intermediate type for converting to a type that Java can handle.
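For completeness, here is a minimal driver sketch assuming the hypothetical WholeFileMapper shown earlier and a map-only job. Apart from the setInputFormatClass line from this article, the class names, job name, and argument handling are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WholeFileDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "whole file read");
        job.setJarByClass(WholeFileDriver.class);

        // The key step from this article: use the custom input format so that
        // each input file reaches the Mapper as a single record
        job.setInputFormatClass(WholeInputFormat.class);

        job.setMapperClass(WholeFileMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setNumReduceTasks(0); // map-only, for illustration

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}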
Thank you for reading. That covers the whole-file reading method in Hadoop. After studying this article, you should have a deeper understanding of how to read whole files in Hadoop; the details still need to be verified in practice. The editor will push more articles on related knowledge points for you, welcome to follow!