2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to implement multi-file output with the new MapReduce API. Many people run into this problem in practice, so the following walks through how to handle it.
1. Consider the call MultipleOutputs.addNamedOutput(job, "errorlog", TextOutputFormat.class, Text.class, NullWritable.class); in the job configuration code. The second parameter is not actually meant to be used this way. Look at the code below:
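    // Must be created in setup() and closed in cleanup().
    private MultipleOutputs<NullWritable, Text> multipleOutputs = null;

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text val : values) {
            // Four-argument form: the last argument is the base output path.
            multipleOutputs.write("KeySplit", NullWritable.get(), val, key.toString() + "/");
            // Three-argument form: named output, key, value.
            multipleOutputs.write("AllData", NullWritable.get(), val);
        }
    }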
The write method has many overloads. The three-parameter form (named output, key, value) is the one we used before, but it writes all of the reduce output into a single folder.
In that case we register several named outputs by passing different values as the second argument of MultipleOutputs.addNamedOutput(), and the result looks like this:
-rw-r--r-- 2 hadoop supergroup 10569073 2014-06-06 11:50 /test/aa/fileRequest-m-00063.lzo
-rw-r--r-- 2 hadoop supergroup 10512656 2014-06-06 11:50 /test/aa/fileRequest-m-00064.lzo
-rw-r--r-- 2 hadoop supergroup 68780 2014-06-06 11:51 /test/aa/firstIntoTime-m-00000.lzo
-rw-r--r-- 2 hadoop supergroup 67901 2014-06-06 11:51 /test/aa/firstIntoTime-m-00001.lzo
The files of every named output end up in the same folder, and many useless empty files are produced as well.
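The file names in the listing above appear to follow one pattern: base name, task type (m for map), zero-padded partition number, then extension. As a minimal sketch, the plain-Java model below reproduces that pattern. It does not call Hadoop, and the class and method names are hypothetical, introduced only for illustration. The pattern itself is inferred from the listing.

```java
import java.util.Locale;

/** Plain-Java model (no Hadoop dependency) of the file naming seen in the listing. */
class NamedOutputFileName {

    /**
     * Composes "<baseName>-<taskType>-<partition padded to 5 digits><extension>".
     * taskType is assumed to be 'm' for a map task and 'r' for a reduce task.
     */
    static String compose(String baseName, char taskType, int partition, String extension) {
        return String.format(Locale.ROOT, "%s-%c-%05d%s", baseName, taskType, partition, extension);
    }

    public static void main(String[] args) {
        // Rebuild two of the names from the listing.
        System.out.println(compose("fileRequest", 'm', 63, ".lzo"));
        System.out.println(compose("firstIntoTime", 'm', 0, ".lzo"));
    }
}
```

Because only the base name differs between named outputs, all of their files share one directory unless a base output path containing a folder is supplied.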
The write method also has a four-parameter form. Its last parameter is an output path, which lets the data produced by reduce be written to different folders according to different logic. Take the statement multipleOutputs.write("KeySplit", NullWritable.get(), val, key.toString() + "/") from the first code sample: the last argument uses the key as a folder name, so records with the same key are written to the same folder. The trailing "/" marks the key as a directory. That directory is not relative to the project's current directory; it is relative to the output directory passed when running hadoop jar, for example: hadoop jar test.jar com.TestJob /input /output
Suppose the data looks like this:
1 Limei
1 Xiaohui
2 Xiao Hong
3 Dahua
Then three output folders are produced:
/output/1
/output/2
/output/3
The file in the /output/1 folder contains:
1 Limei
1 Xiaohui
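To make the routing concrete, here is a minimal plain-Java sketch that models how records with the same key end up in the same folder when key + "/" is used as the last argument of write. It does not call Hadoop. The class KeySplitModel and the method route are hypothetical names, and the sketch assumes each record is a line whose first whitespace-separated field is the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (no Hadoop dependency) of routing records into per-key folders. */
class KeySplitModel {

    /** Groups "key value" lines by key, one folder per key under outputDir. */
    static Map<String, List<String>> route(String outputDir, List<String> lines) {
        Map<String, List<String>> folders = new LinkedHashMap<>();
        for (String line : lines) {
            String key = line.split("\\s+", 2)[0];  // first field is the key
            String folder = outputDir + "/" + key;   // e.g. /output/1
            folders.computeIfAbsent(folder, f -> new ArrayList<>()).add(line);
        }
        return folders;
    }

    public static void main(String[] args) {
        // The sample data from the article.
        List<String> data = Arrays.asList("1 Limei", "1 Xiaohui", "2 Xiao Hong", "3 Dahua");
        route("/output", data).forEach((folder, records) ->
                System.out.println(folder + " -> " + records));
    }
}
```

In the real job this grouping is done by the framework: reduce receives all values for one key, and the path argument decides where they are written.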
The write method has other overloads that I have not studied yet, and I have not looked into the role of its first parameter either. When I have time, I will write up multi-file output in more detail.
Note: when configuring the job, this code:
MultipleOutputs.addNamedOutput(job, "errorlog", TextOutputFormat.class, Text.class, NullWritable.class);