This article introduces the common fault-tolerance scenarios in Hadoop MapReduce. Many people have questions about how MapReduce copes with hung tasks, dead nodes, and corrupted map output, so the following walks through the typical cases and the mechanisms Hadoop uses to handle them.
The first scenario: a task in a job hangs and does not release its resources for a long time. How is this handled?
This is usually caused by a software bug, peculiarities in the input data, and so on, which block the program and stall the task. From the outside the task simply looks hung. This happens frequently: the task occupies resources for a long time without using them (and if nothing is done it may never use them, producing a "resource leak"), which lowers resource utilization and is bad for the system. How does Hadoop MapReduce deal with this situation?
Each task periodically reports its latest progress to its TaskTracker (no report is sent if the progress has not changed), and the TaskTracker in turn reports it to the JobTracker. When a task hangs, its progress stalls and it stops reporting to the TaskTracker; once the timeout limit is reached, the TaskTracker kills the task and reports its state (KILLED) to the JobTracker, which then reschedules the task.
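The timeout itself is configurable. The sketch below assumes the MRv1 property name mapred.task.timeout (renamed mapreduce.task.timeout in later versions) and its default of 600000 ms, i.e. 10 minutes; the driver class and job name are illustrative only.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TimeoutTuningDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TimeoutTuningDriver.class);
        conf.setJobName("timeout-tuning-example");

        // How long a task may go without reporting progress before the
        // TaskTracker kills it (assumed default: 600000 ms = 10 minutes).
        // Raised to 30 minutes here for a job with legitimately slow phases.
        conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);

        // Mapper/reducer classes and input/output paths omitted for brevity.
        JobClient.runJob(conf);
    }
}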
In real applications there are perfectly normal jobs whose tasks may not read input or produce output for long stretches, for example a Map Task that reads from a database, or a task that has to connect to some other external system. For such applications, when writing the Mapper or Reducer you should start an extra thread that periodically reports a heartbeat to the TaskTracker through the Reporter component (in effect telling the TaskTracker "I am still alive, don't kill me"). A sketch of such a mapper is shown below.
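The following is a minimal sketch of that pattern using the old org.apache.hadoop.mapred API, since the text refers to the Reporter component. The class name ExternalSystemMapper, the 60-second interval, and the queryExternalSystem placeholder are illustrative assumptions, not part of Hadoop itself.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ExternalSystemMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private volatile Reporter currentReporter;  // shared with the heartbeat thread
    private volatile boolean done = false;
    private Thread heartbeat;

    @Override
    public void configure(JobConf job) {
        // Background thread: ping the TaskTracker every 60 s so that a long,
        // blocking external call is not mistaken for a hung task.
        heartbeat = new Thread(new Runnable() {
            public void run() {
                while (!done) {
                    Reporter r = currentReporter;
                    if (r != null) {
                        r.progress();  // "I'm still alive, don't kill me"
                    }
                    try {
                        Thread.sleep(60 * 1000L);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        currentReporter = reporter;
        // Stand-in for a slow database query or call to an external system.
        String result = queryExternalSystem(value.toString());
        output.collect(new Text(value), new Text(result));
    }

    @Override
    public void close() {
        done = true;
        if (heartbeat != null) {
            heartbeat.interrupt();
        }
    }

    // Hypothetical placeholder for the slow external call.
    private String queryExternalSystem(String record) {
        return record;
    }
}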
The second scenario: all of a job's Map Tasks have finished, and while the Reduce Tasks are running, a node that ran some of those Map Tasks dies, or the disk storing the map output is damaged. What happens then?
This scenario is more involved and has to be discussed case by case.
If the node dies, the JobTracker learns through the heartbeat mechanism that the TaskTracker is gone, and it reschedules both the tasks that were running on that node and the already-completed Map Tasks of jobs that are still running (their output lived on the dead node's local disk and can no longer be fetched).
If the node itself is still up, but the disk holding the Map Task output is damaged, there are two cases:
(1) All Reduce Tasks have already completed the shuffle phase.
(2) Some Reduce Tasks have not completed the shuffle phase and still need to read that Map Task's output.
In the first case, if all the Reduce Tasks then run to completion, nothing needs to be done about the Map Task that has already finished. If some Reduce Task fails after a while, it is handled the same way as the second case.
In the second case, when a Reduce Task remotely reads the output of the completed Map Task (whose result is now corrupted), it retries the read several times. If the number of attempts exceeds a certain limit, it notifies the TaskTracker over RPC that the Map Task output is corrupted; the TaskTracker in turn notifies the JobTracker over RPC, and on receiving the message the JobTracker reschedules the Map Task so that its output is recomputed.
It should be emphasized that in the current Hadoop MapReduce implementation, the interval between a Reduce Task's retries to read a Map Task's output grows exponentially. The delay is computed as 10000 * 1.3 ^ noFailedFetches milliseconds, and the number of failed fetches tolerated before reporting is MAX{10, numMaps/30}. In other words, with 300 Map Tasks it takes 10 failed attempts to conclude that the Map Task output is corrupted, so detection takes a long time, and the more Map Tasks there are, the slower the detection. This is a spot that usually needs tuning, because jobs with many tasks are the ones most likely to hit the problem.
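The sketch below simply restates that arithmetic; the class and method names (FetchRetryBackoff, backoffMs, maxFailedFetches) are illustrative and do not correspond to Hadoop's internal classes.

public class FetchRetryBackoff {
    // Constants as described above: 10 000 ms initial delay, growth factor 1.3.
    private static final long INITIAL_BACKOFF_MS = 10000L;
    private static final double GROWTH_RATE = 1.3;

    // Delay before the next fetch attempt after noFailedFetches failures.
    static long backoffMs(int noFailedFetches) {
        return (long) (INITIAL_BACKOFF_MS * Math.pow(GROWTH_RATE, noFailedFetches));
    }

    // Failed fetches tolerated before the corruption is reported: MAX{10, numMaps/30}.
    static int maxFailedFetches(int numMaps) {
        return Math.max(10, numMaps / 30);
    }

    public static void main(String[] args) {
        int numMaps = 300;
        int limit = maxFailedFetches(numMaps);  // 10 for 300 maps
        long totalMs = 0;
        for (int i = 0; i < limit; i++) {
            totalMs += backoffMs(i);
        }
        // With 300 maps the corrupted output is only reported after roughly
        // 7 minutes of cumulative backoff; with more maps it takes even longer.
        System.out.printf("%d maps -> %d failed fetches tolerated, ~%.0f s of backoff%n",
                numMaps, limit, totalMs / 1000.0);
    }
}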
That concludes this look at the common fault-tolerance scenarios in Hadoop MapReduce. Hopefully it has answered your questions; pairing the theory with hands-on practice is the best way to learn, so go and try it. If you want to keep learning about related topics, keep following this site for more practical articles.