2025-04-10 Update From: SLTechnology News&Howtos
Over the past week I have been working on resource isolation in YARN, and I re-read my earlier books on YARN along the way. This post is a summary of that work.
Resource isolation is needed because many teams within the company submit all kinds of jobs to YARN: Hive MapReduce jobs, Spark on YARN, Sqoop, and so on. To prevent a single job from consuming so many resources that other jobs in the cluster cannot run, YARN's resource isolation has to be used.
YARN offers two schedulers, the Capacity Scheduler and the Fair Scheduler, to arbitrate resources between jobs (queues, to be precise). For resource isolation there is actually little difference between the two: both can limit the maximum and minimum resources of each queue, the available resources can be divided among different queues, and each application is submitted to one specific queue. For scheduling jobs within a queue, the Fair Scheduler additionally supports a FIFO policy. Since the company's Hadoop uses the Fair Scheduler by default, I went with the Fair policy in the end. Here is how it was done.
Step1: create a separate user for each development team. For example, create the users group1 and group2 for two teams:
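As a sketch (the user names group1 and group2 come from the example above; these commands must be run as root on the cluster nodes):

```shell
# Create one Linux account per team; YARN identifies the
# submitting user, so each team gets its own identity.
useradd group1
useradd group2

# Verify the accounts exist
id group1
id group2
```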
Step2: modify permissions
The permissions involved here include both Hadoop permissions and HDFS permissions; what exactly to set depends on your actual situation.
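For instance, each team typically gets its own HDFS home directory that only it can write to. A sketch (the paths are illustrative, not from the original post):

```shell
# Create per-team home directories on HDFS and restrict access
hdfs dfs -mkdir -p /user/group1 /user/group2
hdfs dfs -chown group1:group1 /user/group1
hdfs dfs -chown group2:group2 /user/group2
hdfs dfs -chmod 700 /user/group1 /user/group2
```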
Step3: modifying yarn-site.xml
The main parameters to modify are:
yarn.resourcemanager.scheduler.class: which scheduler to use. Set it to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler, since we are using the Fair Scheduler.
yarn.scheduler.fair.allocation.file: the path to the allocation policy file.
yarn.scheduler.fair.preemption: whether preemption of resources between queues is allowed. Even when set to false, busy queues will still encroach a little on idle queues' resources.
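Put together, the yarn-site.xml fragment for the three properties above might look like this (the allocation-file path is illustrative):

```xml
<!-- yarn-site.xml (fragment) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <!-- Path of the Fair Scheduler allocation file written in Step4 -->
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
  <!-- Disable cross-queue preemption -->
  <name>yarn.scheduler.fair.preemption</name>
  <value>false</value>
</property>
```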
There are many more yarn-site configuration parameters, which you can look up in the official documentation.
Step4: write the allocation file for the Fair Scheduler (the file pointed to by yarn.scheduler.fair.allocation.file). The configuration I used while testing is described below.
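The original screenshot of the file is not reproduced here; a fair-scheduler.xml sketch consistent with the rules the post goes on to list (queue names, limits, and ACL users all taken from those rules) might look like:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: a sketch, not the author's exact file -->
<allocations>
  <queue name="group1">
    <minResources>10240 mb,10 vcores</minResources>
    <maxResources>15360 mb,15 vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <schedulingPolicy>fair</schedulingPolicy>
    <!-- who may submit applications to this queue -->
    <aclSubmitApps>group1,hadoop</aclSubmitApps>
    <!-- who may administer (e.g. kill) applications in this queue -->
    <aclAdministerApps>hadoop</aclAdministerApps>
  </queue>
  <queue name="group2">
    <!-- analogous settings for the second team -->
  </queue>
</allocations>
```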
First, a brief introduction: resources in YARN are allocated by queue (also called resource pool). Queues can have child queues, and the root queue is the parent of all queues, much like the Object class in Java; if nothing is configured, all applications are submitted to the root.default queue. The configuration file in Step4 defines the following rules:
1. group1 and group2 are two child queues of the root queue (here the queue names match the user names).
2. The group1 queue is guaranteed at least 10 GB of memory and 10 CPU cores, and may use at most 15 GB of memory and 15 CPU cores.
3. Applications within the queue are scheduled with the fair policy, and at most 50 applications may run in group1 at the same time.
4. Only the group1 and hadoop users may submit applications to the group1 queue.
5. Only the hadoop user may administer the group1 queue (that is, only hadoop may kill the applications above).
Step5: restart YARN
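Assuming the scripts shipped in Hadoop's sbin directory are on the PATH, restarting might look like the following (the Fair Scheduler reloads its allocation file periodically on its own, but changes to yarn-site.xml require a ResourceManager restart):

```shell
# Restart YARN so the scheduler and allocation-file settings take effect
stop-yarn.sh
start-yarn.sh

# Quick sanity check that the ResourceManager is back up
yarn node -list
```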
Finally, testing with spark-shell --master yarn --num-executors <n> --executor-memory <mem> --queue <queue> showed that resource isolation works, with the following results:
1. group1 users can only submit applications to the group1 queue, and even when the requested resources exceed the queue maximum, actual usage only slightly exceeds the cap rather than overshooting by a large margin (interested readers can try this themselves).
2. group1 users cannot submit tasks to the group2 queue.
3. The hadoop user can still submit tasks to the group1, group2, and default queues, which gives the system a role similar to a root user.
However, I found that the ACL part did not take effect: any user who can run the yarn command, whether hadoop, group1, group2, or anyone else, can kill any application on any queue with yarn application -kill. I consulted books and the official website many times without success, so in the end I could only set the permissions of the yarn command itself to 700, allowing only the hadoop user to execute it. I hope to find a better solution in the future.
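For what it's worth, queue-level ACLs in YARN are only enforced when ACL checking is globally enabled; whether this was the cause in this particular cluster is not verified, but the relevant yarn-site.xml switches are:

```xml
<property>
  <!-- Queue-level aclSubmitApps/aclAdministerApps are ignored
       unless ACL checking is enabled globally (default: false) -->
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <!-- Cluster administrators; the default is "*" (everyone),
       and admins bypass queue ACLs, so keep this narrow -->
  <name>yarn.admin.acl</name>
  <value>hadoop</value>
</property>
```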
January 7, 2017