2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
YARN is the resource management framework behind MapReduce V2. It has many advantages over MapReduce V1:
1. It splits up the JobTracker's responsibilities. The ResourceManager is responsible for resource management, while starting, running and monitoring jobs is the responsibility of the ApplicationMasters distributed across the cluster nodes. This greatly reduces the single-point bottleneck and single-point-of-failure risk of the JobTracker in MapReduce V1, and improves the scalability and availability of the cluster.
2. The ApplicationMaster is a customizable part of MapReduce V2, so users can write their own ApplicationMaster programs for their own programming model. This greatly extends the range of applications MapReduce V2 can support.
3. ZooKeeper is used to provide failover for the ResourceManager. When the active ResourceManager fails, the standby ResourceManager starts quickly based on the cluster state saved in ZooKeeper. MapReduce V2 also lets applications specify checkpoints, so that after a failure the ApplicationMaster can be restarted quickly from the state saved on HDFS. These two measures greatly improve the availability of MapReduce V2.
4. Cluster resources are uniformly organized into resource containers, rather than being divided into map slots and reduce slots as in MapReduce V1. Whenever a task requests resources, the scheduler can allocate any available resources in the cluster to it, regardless of the task type. This greatly improves resource utilization.
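To make point 4 concrete, here is a minimal sketch of why a single pool of containers can keep more of the cluster busy than separate map and reduce slots. The function names, the cluster size and the task counts are all made up for illustration; this is a toy model, not how either scheduler is actually implemented.

```python
def busy_with_slots(pending_maps, pending_reduces, map_slots, reduce_slots):
    """MapReduce V1 style (simplified): map tasks can only use map slots,
    reduce tasks can only use reduce slots."""
    running_maps = min(pending_maps, map_slots)
    running_reduces = min(pending_reduces, reduce_slots)
    return running_maps + running_reduces


def busy_with_containers(pending_maps, pending_reduces, total_containers):
    """YARN style (simplified): any free container can host any task."""
    return min(pending_maps + pending_reduces, total_containers)


# Hypothetical cluster with 10 units of capacity.
# 20 map tasks are waiting and no reduce task is ready yet.
slots_busy = busy_with_slots(20, 0, map_slots=6, reduce_slots=4)
containers_busy = busy_with_containers(20, 0, total_containers=10)

print(slots_busy, containers_busy)
```

In this made-up scenario the four reduce slots sit idle during the map phase, while the container model can hand all ten units to map tasks.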
YARN has other advantages as well, but I won't enumerate them all here. The rest of this article mainly covers the workflow of YARN.
What are the specific components of YARN?
YARN consists of the ResourceManager, NodeManager, JobHistoryServer, Containers, ApplicationMaster, Job, Task and Client.
> ResourceManager: there is only one active ResourceManager per cluster. It is responsible for resource scheduling, resource allocation and related tasks.
> JobHistoryServer: responsible for serving queries about job progress and history, and for managing job metadata.
> NodeManager: runs on the DataNode nodes and is responsible for launching application containers and managing the resources of its node.
> Containers: a Container is allocated by the ResourceManager. It bundles the CPU, memory and other resources that a task may use.
> ApplicationMaster: roughly speaking, the ApplicationMaster is the contractor and the ResourceManager is the manager. The ResourceManager first hands the job to the ApplicationMaster, and the ApplicationMaster then directs each NodeManager (the equivalent of the workers) to do the work with the resources the ResourceManager has granted. There is only one ApplicationMaster per application. It runs in a container on a NodeManager node, and that container is allocated by the ResourceManager.
> Job: a mapper, a reducer, and a list of inputs to process. A job can also be called an application.
> Task: an independent unit of work that actually performs the map or reduce processing. A task runs in a Container on a NodeManager.
> Client: the program that submits an application to the ResourceManager.
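The relationship between the ResourceManager, the NodeManagers and Containers can be sketched as a small data model. This is only an illustration: the class names are prefixed with `Toy` because they are not Hadoop classes, the first-fit placement rule is an assumption chosen for brevity, and the node names and sizes are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyContainer:
    container_id: int
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ToyNodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores


@dataclass
class ToyResourceManager:
    nodes: List[ToyNodeManager]
    next_id: int = 1

    def allocate(self, memory_mb: int, vcores: int) -> Optional[ToyContainer]:
        """Place the request on the first node with enough free resources
        (first-fit is an assumption of this sketch)."""
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                container = ToyContainer(self.next_id, node.name, memory_mb, vcores)
                self.next_id += 1
                return container
        return None  # nothing fits right now


rm = ToyResourceManager([
    ToyNodeManager("node1", free_memory_mb=2048, free_vcores=2),
    ToyNodeManager("node2", free_memory_mb=4096, free_vcores=4),
])
am_container = rm.allocate(memory_mb=1024, vcores=1)    # fits on node1
task_container = rm.allocate(memory_mb=2048, vcores=2)  # node1 is too full, goes to node2
too_big = rm.allocate(memory_mb=8192, vcores=1)         # no node can host this
```

The point of the sketch is that a container is just a grant of memory and CPU on one particular node, handed out centrally by the ResourceManager.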
Now that you know which components YARN consists of, let's talk about the overall flow of how a job is handled.
The user submits a program (job) to YARN. The submission includes the ApplicationMaster program, the command to start the ApplicationMaster, the user program, and so on. The ResourceManager allocates the first Container to the job and communicates with the corresponding NodeManager, asking it to start the job's ApplicationMaster in this Container. The ApplicationMaster first registers with the ResourceManager, so that the user can query the running status of the job directly through the ResourceManager. It then requests resources for each task and monitors the tasks until the job finishes. The ApplicationMaster requests and receives resources from the ResourceManager over RPC.
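The submission sequence described above can be sketched as a small simulation. Everything here is hypothetical: the `Toy` classes, the method names and the application id are invented for illustration and plain method calls stand in for the RPC exchange; none of it is the Hadoop client API.

```python
class ToyResourceManager:
    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.app_status = {}  # app_id -> status that clients can query
        self.log = []         # ordered record of what happened

    def submit(self, app_id, num_tasks):
        """Client entry point: reserve the first container for the ApplicationMaster."""
        if self.free_containers < 1:
            raise RuntimeError("no container available for the ApplicationMaster")
        self.free_containers -= 1
        self.log.append(f"RM: allocated first container for {app_id}")
        self.log.append(f"NM: started ApplicationMaster for {app_id}")
        am = ToyApplicationMaster(app_id, num_tasks, self)
        am.register()
        return am

    def register(self, app_id):
        self.app_status[app_id] = "RUNNING"
        self.log.append(f"RM: registered ApplicationMaster for {app_id}")

    def request_containers(self, app_id, wanted):
        """Grant as many containers as are currently free, up to the request."""
        granted = min(wanted, self.free_containers)
        self.free_containers -= granted
        self.log.append(f"RM: granted {granted} of {wanted} containers to {app_id}")
        return granted

    def status(self, app_id):
        return self.app_status.get(app_id, "UNKNOWN")


class ToyApplicationMaster:
    def __init__(self, app_id, num_tasks, rm):
        self.app_id = app_id
        self.num_tasks = num_tasks
        self.rm = rm

    def register(self):
        self.rm.register(self.app_id)

    def request_task_containers(self):
        # One container per task in this simplified model.
        return self.rm.request_containers(self.app_id, self.num_tasks)


rm = ToyResourceManager(free_containers=5)
am = rm.submit("app_1", num_tasks=3)
granted = am.request_task_containers()

for line in rm.log:
    print(line)
print(rm.status("app_1"))
```

Printing the log shows the order of events: first container, ApplicationMaster start, registration, then the resource request for the tasks.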
The ApplicationMaster then asks the specified NodeManager nodes to start the tasks.
Once a container has started, it runs the map task assigned to it.
When a map task is done, it notifies the ApplicationMaster, which in turn reports to the ResourceManager. The ResourceManager then allocates new resources to the ApplicationMaster for the next stage of work.
Next, the ApplicationMaster tells the NodeManager to start a new Container for that work. The input to the reduce task is the output of the map tasks.
The reduce tasks start running.
When the reduce tasks on each node have finished, the results from the participating NodeManagers are brought together to complete the final reduce step.
After all the calculations are finished, the final result is written to HDFS, and the job is complete.
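The map-then-reduce data flow in these steps can be sketched with a word count in plain Python. This is a single-process illustration under obvious assumptions: the function names and input strings are invented, each call stands in for a task that would run in its own container, and the returned dictionary stands in for the output written to HDFS.

```python
from collections import defaultdict


def map_task(split):
    """One map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for word in split.split()]


def reduce_task(key, values):
    """One reduce task: sum all the counts collected for a key."""
    return key, sum(values)


def run_job(splits):
    # Map phase: one map task per input split.
    map_outputs = [map_task(split) for split in splits]

    # The reduce phase takes the map output as its input,
    # grouped by key so each key is reduced once.
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)

    # Reduce phase: the combined result is the job's final output.
    return dict(reduce_task(key, values) for key, values in grouped.items())


result = run_job(["yarn runs map tasks", "yarn runs reduce tasks"])
print(result)
```

The reduce phase cannot start on a key until every map task that might emit that key has finished, which is why the ApplicationMaster tracks map completion before asking for the reduce containers.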
The illustration makes the overall YARN workflow easier to understand.