2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains what hadoop-streaming is. It is a practical tool, so the overview below is offered as a reference.
Hadoop Streaming is a programming tool provided by Hadoop that lets users run any executable or script file as the Mapper and Reducer. For example:
Ordinary shell commands can serve as the mapper and reducer (cat as the mapper, wc as the reducer):
bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar -input input -output output -mapper /bin/cat -reducer /usr/bin/wc
The mapper and reducer read user data from standard input, process it line by line, and write the results to standard output. The Streaming tool creates a MapReduce job, sends it to the tasktrackers, and monitors the execution of the entire job.
If a file (executable or script) is specified as the mapper, each mapper task starts that file as a separate process when the mapper is initialized. While the mapper task runs, it splits its input into lines and feeds each line to the standard input of the executable's process. At the same time, the mapper collects the standard output of that process and converts each line it receives into a key/value pair, which becomes the mapper's output. By default, the part of a line before the first tab is the key, and the rest of the line (excluding the tab) is the value. If a line contains no tab, the whole line is used as the key and the value is null.
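The default key/value rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Hadoop's own code: the function name `split_key_value` is hypothetical, and the "null" value is modeled here as `None`.

```python
def split_key_value(line, separator="\t"):
    """Split one output line into (key, value) using the default rule.

    Hypothetical helper for illustration only; a null value is modeled as None.
    """
    line = line.rstrip("\n")
    # partition() splits at the FIRST separator only, so later tabs stay in the value
    key, found, value = line.partition(separator)
    if not found:
        # No tab on the line: the whole line is the key and the value is null
        return line, None
    return key, value
```

For instance, `split_key_value("apple\t1")` yields the key `apple` and the value `1`, while a line with no tab comes back whole as the key.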
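The reducer works in a similar way: each reducer task starts the file as a separate process, feeds its input key/value pairs to the process's standard input as lines, and converts each line of the process's standard output back into a key/value pair.
The above is the basic communication protocol between the Map/Reduce framework and the streaming mapper/reducer.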
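To make the protocol concrete, here is a minimal word-count sketch in Python. It is an assumption-laden illustration rather than anything from Hadoop itself: the names `map_lines`, `reduce_lines`, and `run_locally` are hypothetical, and `run_locally` only imitates in one process the sort-by-key step that the framework performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper side: emit one "word<TAB>1" line per word of input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer side: sum the counts of consecutive lines that share a key.

    Assumes the lines arrive sorted by key, as reducer input does.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Imitate map -> sort by key -> reduce inside a single process."""
    mapped = sorted(map_lines(lines), key=lambda l: l.split("\t", 1)[0])
    return list(reduce_lines(mapped))
```

In a real streaming job the two halves would live in separate scripts that read `sys.stdin` and print to standard output, and those scripts would be passed to -mapper and -reducer.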
Hadoop Streaming usage
Usage: $HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar [options]
Options:
(1) -input: input file path
(2) -output: output file path
(3) -mapper: the user-written mapper program, which can be an executable file or a script
(4) -reducer: the user-written reducer program, which can be an executable file or a script
(5) -file: files to package with the submitted job, such as configuration files or dictionaries that the mapper or reducer reads as input
(6) -partitioner: user-defined partitioner program
(7) -combiner: user-defined combiner program (must be implemented in Java)
(8) -D: job properties (formerly -jobconf), specifically:
1) mapred.map.tasks: number of map tasks
2) mapred.reduce.tasks: number of reduce tasks
3) stream.map.input.field.separator / stream.map.output.field.separator: delimiter for map task input / output data. The default is \t.
4) stream.num.map.output.key.fields: number of fields that make up the key in a map task output record
5) stream.reduce.input.field.separator / stream.reduce.output.field.separator: delimiter for reduce task input / output data. The default is \t.
6) stream.num.reduce.output.key.fields: number of fields that make up the key in a reduce task output record
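How the separator and key-field-count properties interact can be sketched as follows. This is a simplified model under stated assumptions, not Hadoop's implementation: `split_record` is a hypothetical helper, `separator` stands in for a `*.field.separator` property, `num_key_fields` stands in for a `stream.num.*.output.key.fields` property, and a record with too few separators is treated as all key with a null (`None`) value.

```python
def split_record(line, separator="\t", num_key_fields=1):
    """Split a record into (key, value), where the key spans num_key_fields fields.

    Hypothetical helper for illustration; the defaults mirror a tab separator
    and a one-field key.
    """
    line = line.rstrip("\n")
    # Split at most num_key_fields times; the last part is the untouched value
    parts = line.split(separator, num_key_fields)
    if len(parts) <= num_key_fields:
        # Fewer separators than key fields: the whole line is the key
        return line, None
    key = separator.join(parts[:num_key_fields])
    return key, parts[num_key_fields]
```

With `separator="."` and `num_key_fields=2`, the record `a.b.c.d` splits into the key `a.b` and the value `c.d`.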
In addition, Hadoop itself ships with some useful Mappers and Reducers.