2025-04-02 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report --
Implementation Goals and Core Concerns
The MapReduce interface can be implemented in many different ways; the right choice depends on the target computing environment. Take the computing environment common at Google as an example:
1. Machine configuration: dual-processor x86 machines running Linux, with 2-4 GB of memory per machine.
2. Network hardware: commodity networking, on the order of 100 Mbps to 1 Gbps at the machine level, with average aggregate bisection bandwidth considerably lower.
3. A cluster consists of hundreds or thousands of machines, so machine failures are common.
4. Storage is provided by inexpensive IDE disks attached directly to the individual machines. An internally developed distributed file system manages the data on these disks, using replication to provide availability and reliability on top of unreliable hardware.
5. Users submit jobs to a scheduling system. Each job consists of a set of tasks, and the scheduler assigns each task to an available machine in the cluster for execution.
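To illustrate point 5, here is a minimal sketch of a job as a set of tasks assigned round-robin to available machines. All names (`Task`, `Job`, `schedule`, the host names) are hypothetical illustrations, not Google's actual scheduler API:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Task:
    task_id: int
    kind: str  # "map" or "reduce"

@dataclass
class Job:
    job_id: int
    tasks: list

def schedule(job, machines):
    """Assign each task of a job to an available machine, round-robin."""
    assignment = {}
    pool = deque(machines)
    for task in job.tasks:
        assignment[task.task_id] = pool[0]
        pool.rotate(-1)  # move to the next available machine
    return assignment

job = Job(job_id=1, tasks=[Task(i, "map") for i in range(4)])
print(schedule(job, ["host-a", "host-b"]))
# → {0: 'host-a', 1: 'host-b', 2: 'host-a', 3: 'host-b'}
```

A real scheduler would also track machine health and re-assign tasks from failed machines, which matters given point 3 above.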
MapReduce implementation process
When a user program calls the MapReduce function, the following sequence of actions occurs:
1. Split Input Files: the MapReduce library splits the input files into M pieces of size S (typically 16-64 MB per piece, controllable by the user via an optional parameter), then starts many copies of the program on the cluster.
2. Assign Tasks: the master picks idle worker nodes and assigns each one a map task or a reduce task.
3. Parse K/V Pairs: a worker assigned a map task reads the contents of the corresponding input split, parses key/value pairs out of the input, and passes each pair to the user-defined Map function; the intermediate key/value pairs produced by Map are buffered in memory.
4. Write to Local Disk: periodically, the buffered intermediate key/value pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these pairs on local disk are passed back to the master, which forwards them to the reduce workers.
5. RPC Read and Sort: when the master notifies a reduce worker of these locations, the reduce worker uses remote procedure calls to read the buffered data from the map workers' local disks. After reading all the intermediate data, it sorts the data by intermediate key; if the intermediate data is too large to fit in memory, an external sort is used.
6. Iterate and Append: the reduce worker iterates over the sorted intermediate data and, for each unique intermediate key, passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to the final output file for this reduce partition.
7. Completion: when all map and reduce tasks have completed, the master wakes up the user program.
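The seven steps above can be sketched as a single-process simulation, using word count as the user-supplied Map and Reduce functions. This is an illustrative toy (all helper names are my own): a real MapReduce runs the map and reduce tasks on separate machines and materializes the R regions as files on local disk.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(key, value):
    # User-defined Map: emit (word, 1) for each word in the line.
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # User-defined Reduce: sum the counts for one word.
    yield key, sum(values)

def mapreduce(inputs, map_fn, reduce_fn, M=2, R=2):
    # Step 1: split the input into M pieces (here: round-robin over records).
    splits = [inputs[i::M] for i in range(M)]
    # Steps 2-4: each "map worker" parses key/value pairs from its split,
    # applies Map, and partitions the intermediate pairs into R regions
    # by hash(key) % R (standing in for the partitioning function).
    regions = [[] for _ in range(R)]
    for split in splits:
        for key, value in split:
            for ikey, ivalue in map_fn(key, value):
                regions[hash(ikey) % R].append((ikey, ivalue))
    # Steps 5-6: each "reduce worker" sorts its region by key, then calls
    # Reduce once per unique key and appends the result to its output.
    outputs = []
    for region in regions:
        region.sort(key=itemgetter(0))
        for ikey, group in groupby(region, key=itemgetter(0)):
            outputs.extend(reduce_fn(ikey, (v for _, v in group)))
    # Step 7: all tasks done; return control to the user program.
    return dict(outputs)

lines = [("doc1", "the quick fox"), ("doc2", "the lazy dog")]
print(mapreduce(lines, map_fn, reduce_fn))
```

Note that "the" occurs in both splits but hashes to a single region, so exactly one Reduce call sees both counts; this grouping-by-partition is what makes the final per-key totals correct.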
-- Edited by Cloud Era Operation and Maintenance Development (clouddevops)
© 2024 shulou.com SLNews company. All rights reserved.