1. What problems should be considered when designing a parallel computing framework?
The first question: parallel computing necessarily runs on multiple machines, so how are tasks divided among them?
There must be a module that distributes tasks, acting as the master (the "boss"); it maintains the tasks and the resources.
In Hadoop 1.x the MapReduce master is the JobTracker; in Hadoop 2.x it is managed through YARN, whose ResourceManager manages the other nodes and decides how tasks are distributed.
The workers are the TaskTrackers in Hadoop 1.x and the NodeManagers in Hadoop 2.x; each NodeManager starts a YarnChild process to run tasks and process the data being computed.
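As a rough illustration of this master/worker split, a YARN client can ask the ResourceManager which NodeManagers it currently manages. This is a minimal sketch assuming a Hadoop 2.x client classpath and a yarn-site.xml that points at the ResourceManager; the class name is made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Sketch: ask the ResourceManager (the "boss") for the NodeManagers it manages.
// Assumes yarn-site.xml is on the classpath so Configuration can find the RM.
public class ListNodeManagers {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();
        try {
            // One NodeReport per running NodeManager known to the ResourceManager
            for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
                System.out.println(node.getNodeId() + " running containers: "
                        + node.getNumContainers());
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```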
The second question: where does the data needed for the parallel computation come from?
A job can be very large; keeping everything on the master's side would put it under enormous pressure. So the framework relies on HDFS, which already exists to store data. After the client gets the ResourceManager's approval for the job, it uploads all the required jar packages and dependencies to HDFS, so each node can fetch what it needs by itself; the master only has to tell them an identifier for the job.
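A minimal sketch of that staging step using the standard FileSystem API; the class name and both paths are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: stage a job jar on HDFS so every worker node can fetch it itself.
public class StageJobJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path localJar = new Path("/tmp/wordcount.jar");          // hypothetical local jar
        Path staged   = new Path("/jobs/job_001/wordcount.jar"); // hypothetical HDFS location
        fs.copyFromLocalFile(localJar, staged);
        System.out.println("Staged at: " + fs.getFileStatus(staged).getPath());
    }
}
```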
The third question: how are the results of the parallel computation gathered?
The output of the computation is ultimately written to HDFS. It cannot be written back to the master, which may be continuously accepting new jobs, and it cannot simply be left on each compute node, because the data would be too scattered. So the results, too, go to HDFS, as multiple files or a single file as needed.
The fourth question: some tasks will fail along the way; how is that made up for?
The nodes communicate with the master over RPC (the so-called heartbeat mechanism: each worker reports back to the master periodically). When a worker stops reporting, the master asks other NodeManagers to redo its tasks, making up for the lost computation.
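Hadoop's actual heartbeat protocol is internal to YARN, but the idea can be sketched as a worker that reports to the master on a fixed schedule. Everything below (the Master interface, the worker id, the 3-second interval) is an illustrative toy, not Hadoop's real RPC interface.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Toy heartbeat loop: a worker reports to the master every 3 seconds.
// Master.report(...) is a made-up stand-in for Hadoop's internal RPC call.
public class HeartbeatWorker {
    interface Master { void report(String workerId, long timestamp); }

    public static void main(String[] args) {
        Master master = (id, ts) ->
                System.out.println(id + " alive at " + ts); // fake master endpoint
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // If heartbeats stop arriving, the real master marks the node dead
        // and reschedules its tasks on other NodeManagers.
        scheduler.scheduleAtFixedRate(
                () -> master.report("worker-1", System.currentTimeMillis()),
                0, 3, TimeUnit.SECONDS);
    }
}
```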
2. What is the running process of MapReduce?
Client
JobTracker
InputSplit -> mapper()
InputSplit -> mapper()
map output -> shuffle -> reducer() -> output
InputSplit: each InputSplit is fed to one map function; the mapper processes one line at a time.
Mapper output: [hello, 1] [zhangsan, 1]
Shuffle: groups the intermediate results by key. For example, all the hello pairs are grouped into [hello, (1, 1, 1, 1, 1)].
Reducer: sums each group and outputs [hello, 5], [zhangsan, 1].
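Putting the three steps together, here is the familiar WordCount mapper and reducer pair, written against the standard Hadoop 2.x org.apache.hadoop.mapreduce API (the class names here are our own):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map(): called once per line of the InputSplit; emits [word, 1] pairs.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // e.g. [hello, 1]
        }
    }
}

// reduce(): receives one key with all its grouped values after the shuffle,
// e.g. [hello, (1, 1, 1, 1, 1)], and sums them to [hello, 5].
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}
```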
Notes on serialization:
Serialization writes an in-memory object to a file or a database. For example, if you serialize an object and save it as a file, you can restore the original object to memory later by deserializing the data read from the file. Objects can also be serialized into stream data for transmission; that is what classes like ObjectInputStream are for. Objects, files, and data come in many different formats, which makes it hard to transfer and store them uniformly.
After serialization there is just a byte stream: whatever it was, it becomes the same kind of thing, which can be transferred or saved in a common format. When the transfer is finished and you want to use it again, you deserialize to restore it, so an object is still an object and a file is still a file.
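A minimal round trip with Java's own ObjectOutputStream/ObjectInputStream, the mechanism described above; the Person class is a made-up example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Round trip: object -> byte stream -> object, using plain Java serialization.
public class SerializationDemo {
    // Hypothetical example class; Serializable is the marker Java requires.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Person("zhangsan")); // serialize: object -> bytes
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Person restored = (Person) in.readObject(); // deserialize: bytes -> object
            System.out.println(restored.name);          // prints "zhangsan"
        }
    }
}
```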
To transfer data between nodes over the network, Hadoop requires serialization (in-memory objects are written to other nodes in the form of streams).
Hadoop uses its own, more efficient serialization mechanism in place of Java's built-in one (in Java, String, Long, and the other standard types implement Serializable).
Hadoop's serialization mechanism requires classes to implement the Writable interface.
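For example, a custom type can be made serializable for Hadoop by implementing Writable. This is a minimal sketch with made-up class and field names; note that map output keys additionally need WritableComparable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Minimal custom Writable: Hadoop calls write()/readFields() to move the
// object across the network or to disk as a compact byte stream.
public class WordStat implements Writable {
    private String word;   // hypothetical fields
    private long count;

    public WordStat() { }  // Hadoop needs a no-arg constructor for deserialization

    public WordStat(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readLong();
    }
}
```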