2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article shares the characteristics of MapReduce. The editor finds the material very practical and offers it here as a reference.
Characteristics of MapReduce
Easy to program (it is quick to learn, since a job consists mostly of two parts, map and reduce; Hive and Pig make MapReduce easier still)
Good scalability (capacity can be increased simply by adding machines)
High fault tolerance (tasks within a job that fail can be re-executed)
Suitable for offline processing of massive data sets at PB scale and above
What MapReduce is not good at
Real-time computing: it cannot return results in milliseconds or seconds the way MySQL does (consider Spark or HBase instead; HBase handles random reads and writes well but is weak at statistical aggregation)
Streaming computing: the input data set of a MapReduce job is static and cannot change while the job runs; this follows from MapReduce's design (consider Storm)
DAG computing: multiple applications depend on one another, with the output of one application serving as the input of the next (consider Tez)
MapReduce divides the entire run of a job into two phases: the Map phase and the Reduce phase
The Map phase consists of a certain number of Map Tasks
Input data format parsing: InputFormat
Input data processing: Mapper
Data partitioning: Partitioner
The Reduce phase consists of a certain number of Reduce Tasks
Remote copying of the Map output
Sorting of the data by key
Data processing: Reducer
Data output format: OutputFormat
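The two phases listed above can be sketched as a small in-memory word count. This is only an illustration of the data flow, not the Hadoop API: the names `mapper`, `partition`, `reducer` and `run_job` are hypothetical, the "offset" is just a line index, and the remote copy between phases is replaced by plain Python lists.

```python
from collections import defaultdict

def mapper(offset, line):
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def partition(key, num_reducers):
    """Partition step: choose a Reduce Task for a key (toy, stable hash)."""
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """Reduce step: sum all the counts collected for one key."""
    return key, sum(values)

def run_job(lines, num_reducers=2):
    # Map phase: every Map output record is routed to one partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for offset, line in enumerate(lines):  # offset is a line index here
        for key, value in mapper(offset, line):
            partitions[partition(key, num_reducers)][key].append(value)
    # Reduce phase: each Reduce Task sorts its keys, then reduces them.
    output = []
    for part in partitions:
        for key in sorted(part):
            output.append(reducer(key, part[key]))
    return output

result = run_job(["the quick brown fox", "the lazy dog", "The fox"])
counts = dict(result)
print(counts)
```

Each word ends up in exactly one partition, so every key is reduced exactly once regardless of how many Reduce Tasks are simulated.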
By default, TextInputFormat divides the input files into Splits and provides a RecordReader that turns each Split into key/value pairs
TextInputFormat: the key is the offset of the line in the file, and the value is the line content. If a line is cut off at the end of a Split, the reader continues into the first few characters of the next block to complete it.
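The offset-as-key behavior and the handling of a line cut by a Split boundary can be sketched as follows. This is a simplified model under stated assumptions, not Hadoop's actual reader: `read_split` is a hypothetical helper, the file is a byte string in memory, and lines end with `\n`. A reader that does not start at offset 0 skips its first (possibly partial) line, because the previous reader is assumed to have read past its own end to finish it.

```python
import io

def read_split(data: bytes, start: int, length: int):
    """Yield (byte_offset, line) records for the split [start, start + length)."""
    end = start + length
    stream = io.BytesIO(data)
    stream.seek(start)
    pos = start
    if start != 0:
        # Skip the first line: the previous split's reader already consumed it.
        pos += len(stream.readline())
    # Read while a line *starts* at or before the end of the split, so a line
    # that crosses the boundary is completed from the following bytes.
    while pos <= end:
        line = stream.readline()
        if not line:
            break  # end of file
        yield pos, line.rstrip(b"\n").decode("utf-8")
        pos += len(line)

data = b"alpha\nbeta\ngamma\ndelta\n"
split_length = 8
records = []
for start in range(0, len(data), split_length):
    length = min(split_length, len(data) - start)
    records.extend(read_split(data, start, length))
print(records)
```

Even though the 8-byte boundaries fall in the middle of "beta" and at the end of "gamma", every line is produced exactly once, keyed by the byte offset where it starts.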
Related concepts
Block
The smallest unit of data storage in HDFS; 64MB by default (128MB by default since Hadoop 2.x)
Split
The smallest unit of computation in MapReduce; by default one Split corresponds to one Block
Block and Split
The correspondence between Split and Block is arbitrary and can be controlled by the user
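One way to picture how the user controls this correspondence is the rule that Hadoop's FileInputFormat is generally described as using: split size = max(minSize, min(maxSize, blockSize)). The sketch below applies that rule to a 200MB file with the 64MB block size mentioned above. The helpers `compute_split_size` and `plan_splits` are hypothetical, and the sketch ignores details such as Hadoop's tolerance for a slightly oversized last split.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size derived from the block size and the user's min/max limits."""
    return max(min_size, min(max_size, block_size))

def plan_splits(file_size, split_size):
    """Return (start, length) pairs that cover the whole file."""
    splits = []
    start = 0
    while start < file_size:
        length = min(split_size, file_size - start)
        splits.append((start, length))
        start += length
    return splits

block = 64 * MB
file_size = 200 * MB

# Default limits: one Split per Block.
default = plan_splits(file_size, compute_split_size(block))
# Raising the minimum makes each Split span several Blocks.
larger = plan_splits(file_size, compute_split_size(block, min_size=128 * MB))
# Lowering the maximum cuts each Block into several Splits.
smaller = plan_splits(file_size, compute_split_size(block, max_size=32 * MB))

print(len(default), len(larger), len(smaller))
```

Since each Split is handled by one Map Task, changing these limits is effectively a way of choosing the number of Map Tasks.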
Map stage
InputFormat (default TextInputFormat)
Mapper
Partitioner
Sort (optional)
Combiner (local reducer) (optional)
Reduce stage
Sort
Reducer
OutputFormat (default TextOutputFormat)
Combiner
A Combiner can be regarded as a local Reducer: it merges the values that share a key within a Map Task's output (as in the wordcount example) and usually has the same logic as the Reducer
Benefit: reduces the amount of data output by each Map Task (disk I/O)
Benefit: reduces the amount of data transferred over the network between Map and Reduce (network I/O)
A Combiner applies only when partial results can be combined
Sum (YES!), Average (NO!)
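A small numeric check shows why Sum works with a Combiner and Average does not. The data is made up for illustration: `map_outputs` holds the values for a single key as produced by two hypothetical Map Tasks. The last part shows a common workaround (carrying sum and count together), which is an addition here and not something the article itself describes.

```python
def mean(values):
    return sum(values) / len(values)

# Values for one key, as emitted by two separate Map Tasks.
map_outputs = [[1, 2, 3], [10]]
all_values = [v for task in map_outputs for v in task]

# Sum: combining locally first gives the same answer as reducing everything.
total_direct = sum(all_values)
total_combined = sum(sum(task) for task in map_outputs)

# Average: averaging local averages gives a different (wrong) answer,
# because the two Map Tasks contributed different numbers of values.
avg_direct = mean(all_values)
avg_of_avgs = mean([mean(task) for task in map_outputs])

# Workaround: the Combiner emits (sum, count); the Reducer divides at the end.
partials = [(sum(task), len(task)) for task in map_outputs]
avg_fixed = sum(s for s, _ in partials) / sum(c for _, c in partials)

print(total_direct, total_combined)
print(avg_direct, avg_of_avgs, avg_fixed)
```

The sum is 16 either way, while the true average is 4.0 and the average of the local averages is 6.0.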
Partitioner
The Partitioner determines which Reduce Task processes each record output by a Map Task. The default is hash(key) mod R, where R is the number of Reduce Tasks
Users can supply their own; in many cases a custom Partitioner is required
For example, "hash(hostname(URL)) mod R" ensures that web pages with the same domain name are handed to the same Reduce Task
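The two partitioning rules above can be compared in a short sketch. It is not Hadoop's Partitioner interface: `default_partition` and `host_partition` are hypothetical functions, the URLs are invented, and `zlib.crc32` stands in for the hash function only because it gives the same value on every run.

```python
import zlib
from urllib.parse import urlparse

def default_partition(key: str, num_reducers: int) -> int:
    """hash(key) mod R, applied to the whole key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def host_partition(url: str, num_reducers: int) -> int:
    """hash(hostname(URL)) mod R, so one host always maps to one Reduce Task."""
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_reducers

R = 4
urls = [
    "http://example.com/index.html",
    "http://example.com/about.html",
    "http://example.org/index.html",
]
for url in urls:
    print(url, default_partition(url, R), host_partition(url, R))
```

With the default rule the two example.com pages may or may not land on the same Reduce Task; with the host-based rule they always do, whatever the value of R.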
Thank you for reading! This is the end of this article on "what are the characteristics of MapReduce?". I hope the content above is of some help to you. If you think the article is good, you can share it for more people to see!