How to use uber, the JVM reuse feature in Yarn

2025-02-28 Update From: SLTechnology News&Howtos

This article explains how uber, the JVM reuse feature in Yarn, works and how to enable it.

Understanding the Container: a container encapsulates the resources of a machine, such as memory, CPU, disk and network. Each task is assigned a container, runs only inside that container, and can use only the resources the container encapsulates.

The containers an application needs fall into two main categories:

(1) The container that runs the ApplicationMaster: the ResourceManager requests it from its internal resource scheduler and starts it. When users submit an application, they can specify the resources required by its single ApplicationMaster.

(2) The containers that run the tasks: the ApplicationMaster requests these from the ResourceManager, and they are started through communication between the ApplicationMaster and the NodeManager.

Both types of container may be placed on any node, and their locations are usually random, so the ApplicationMaster may end up on the same node as the tasks it manages.

JVM reuse:

First, a brief review of the JVM reuse feature in Hadoop 1.x: users can configure the maximum number of tasks that a TaskTracker runs within the same JVM. The benefit is fewer JVM startups and exits, which improves task execution efficiency. Configuration is simple: set the parameter mapred.job.reuse.jvm.num.tasks in mapred-site.xml. The default value is 1, which means the TaskTracker starts a JVM for each Map or Reduce task and exits the JVM when the task finishes. Likewise, if the value is set to 3, the TaskTracker runs at most 3 tasks in the same JVM before exiting it.
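The effect of mapred.job.reuse.jvm.num.tasks on the number of JVM startups can be sketched with a little arithmetic. This is only an illustrative model, not Hadoop code: the function name jvm_launches is hypothetical, and it assumes all tasks belong to the same job and run back to back in a single slot. The value -1 (no limit) comes from the Hadoop 1.x parameter documentation rather than from the text above.

```python
import math


def jvm_launches(num_tasks: int, reuse_limit: int = 1) -> int:
    """Estimate JVM startups for tasks run sequentially in one slot.

    reuse_limit models mapred.job.reuse.jvm.num.tasks:
    1 means no reuse, -1 means one JVM serves every task.
    """
    if num_tasks <= 0:
        return 0
    if reuse_limit == -1:
        return 1
    if reuse_limit < 1:
        raise ValueError("reuse_limit must be -1 or a positive integer")
    # Each JVM handles up to reuse_limit tasks before it exits.
    return math.ceil(num_tasks / reuse_limit)


print(jvm_launches(9, 1))    # 9
print(jvm_launches(9, 3))    # 3
print(jvm_launches(10, 3))   # 4
print(jvm_launches(9, -1))   # 1
```

With the default of 1, nine tasks cost nine JVM startups; raising the value to 3 cuts that to three under the assumptions above.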

Yarn (Hadoop MapReduce v2) has no mapred.job.reuse.jvm.num.tasks parameter, but it offers a similar feature called uber. According to Arun, enabling it can make some jobs two to three times faster ("we've observed 2x-3x speedup for some jobs"). However, because the architecture of Yarn is very different from the JobTracker/TaskTracker architecture of MapReduce v1, the principle and configuration of uber differ considerably from the earlier JVM reuse mechanism.

1) The principle of uber:

By default, Yarn disables uber, so JVMs are not reused. Let's first look at how Yarn executes a MapReduce job in this case. First, the ApplicationsManager in the ResourceManager obtains a container on a NodeManager for each application (such as a user-submitted MapReduce job) and launches an ApplicationMaster in that container. A container is the unit in which Yarn allocates resources (memory, CPU, disk, etc.), and starting a container starts a JVM. The ApplicationMaster then requests a container from the ResourceManager for each task (Map or Reduce) in the application, one after another. Once a container is granted, the ApplicationMaster asks the NodeManager that owns it to start the container, and the corresponding task runs inside it. When the task finishes, the NodeManager reclaims the container and the container's JVM exits. In this mode each JVM executes exactly one task, and no JVM is reused.

Users can enable uber to allow JVM reuse, that is, to execute multiple tasks one after another within the same container. To enable it, change the following parameters in mapred-site.xml (they are MapReduce job properties and can also be set per job):

Parameter | Default value | Description

mapreduce.job.ubertask.enable | false | Whether to enable uber. When enabled, all tasks of a "small application" run in the same JVM, which achieves JVM reuse. That JVM is the one used by the application's ApplicationMaster, running in its container. The following parameters define what counts as a "small application".

mapreduce.job.ubertask.maxmaps | 9 | Threshold for the number of map tasks. An application qualifies as small only if its number of map tasks does not exceed this value.

mapreduce.job.ubertask.maxreduces | 1 | Threshold for the number of reduce tasks. An application qualifies as small only if its number of reduce tasks does not exceed this value. Values greater than 1 are not supported ("CURRENTLY THE CODE CANNOT SUPPORT MORE THAN ONE REDUCE").

mapreduce.job.ubertask.maxbytes | (unset) | Threshold for the input size of the application. If unset, the value of dfs.block.size is used. An application qualifies as small only if its actual input size does not exceed this value.
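The way these four parameters combine can be sketched as a small decision function. This is a simplified model, not the MapReduce implementation: the names UberConfig and is_small_job are hypothetical, the 128 MB block size is an assumed stand-in for dfs.block.size, and the thresholds are treated as inclusive. The real ApplicationMaster applies further checks (for example on task memory) that are omitted here.

```python
from dataclasses import dataclass

# Assumed HDFS block size, standing in for dfs.block.size; clusters differ.
ASSUMED_BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class UberConfig:
    enable: bool = False                  # mapreduce.job.ubertask.enable
    max_maps: int = 9                     # mapreduce.job.ubertask.maxmaps
    max_reduces: int = 1                  # mapreduce.job.ubertask.maxreduces
    max_bytes: int = ASSUMED_BLOCK_SIZE   # mapreduce.job.ubertask.maxbytes


def is_small_job(num_maps: int, num_reduces: int, input_bytes: int,
                 conf: UberConfig) -> bool:
    """Return True if the job would qualify for uber mode in this model."""
    # More than one reduce is not supported, so larger settings are capped.
    effective_max_reduces = min(conf.max_reduces, 1)
    return (
        conf.enable
        and num_maps <= conf.max_maps
        and num_reduces <= effective_max_reduces
        and input_bytes <= conf.max_bytes
    )


conf = UberConfig(enable=True)
print(is_small_job(4, 1, 64 * 1024 * 1024, conf))    # True
print(is_small_job(10, 1, 64 * 1024 * 1024, conf))   # False: too many maps
print(is_small_job(4, 1, 256 * 1024 * 1024, conf))   # False: input too large
print(is_small_job(4, 1, 64 * 1024 * 1024, UberConfig()))  # False: disabled
```

All conditions must hold at once; a job that passes the map and reduce thresholds but reads more input than the byte threshold still runs in the normal, one-container-per-task mode.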

Finally, let's look at how Yarn executes an application when uber is enabled. As before, the ApplicationsManager in the ResourceManager obtains a container on a NodeManager for each application and launches an ApplicationMaster in it, and starting the container starts a JVM. If uber is enabled and the application qualifies as a "small application", the ApplicationMaster then executes every task of the application sequentially in the JVM of its own container until all tasks are done ("With 'uber' mode enabled, you'll run everything within the container of the AM itself"). The ApplicationMaster no longer has to request a separate container from the ResourceManager for each task, which achieves JVM reuse (resource reuse).
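The difference between the two execution modes can be summarized by counting the containers (and therefore JVMs) a job needs. The sketch below is a hypothetical model with an invented function name, containers_needed; it assumes one container per task in normal mode and that the job already qualifies as a small application when uber is on.

```python
def containers_needed(num_maps: int, num_reduces: int, uber: bool) -> int:
    """Count the containers, and hence JVMs, started for one job."""
    am_container = 1  # the ApplicationMaster always gets its own container
    if uber:
        # All tasks run sequentially inside the ApplicationMaster's JVM.
        return am_container
    # Normal mode: one extra container per map task and per reduce task.
    return am_container + num_maps + num_reduces


print(containers_needed(9, 1, uber=False))  # 11
print(containers_needed(9, 1, uber=True))   # 1
```

For a job at the default thresholds (9 maps, 1 reduce), this model goes from eleven JVM startups to one, at the cost of running the tasks one after another instead of in parallel.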
