
How to allocate resources dynamically in Spark


This article introduces how to allocate resources dynamically in Spark. Many people run into this problem in real-world operation, so let's walk through how to handle these situations. I hope you read it carefully and come away with something useful!

1. Operation scenario

For Spark applications, resources are a key factor in execution efficiency. When a long-running service holds many Executors but has no tasks to run on them, while other applications are starved of resources, the result is significant waste and unreasonable scheduling.

Dynamic resource scheduling is designed for exactly this scenario: based on the current task load of an application, it increases or decreases the number of Executors in real time, achieving dynamic allocation of resources and keeping the whole Spark system healthier.

2. Dynamic resource strategy

1. Resource allocation strategy

When the dynamic allocation policy is enabled, an application dynamically requests resources whenever tasks are left pending because resources are insufficient, that is, when the application's existing executors cannot run all pending tasks in parallel. Spark requests resources in rounds: once tasks have been pending for spark.dynamicAllocation.schedulerBacklogTimeout (default 1s), the first request is issued; after that, another request is made every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (default 1s) until enough resources have been obtained. The number of executors requested grows exponentially from round to round: 1, 2, 4, 8, and so on.

Exponential growth is adopted for two reasons: first, an application should start with a small request, in case a few extra executors turn out to be enough; second, the request size should ramp up multiplicatively, so that an application that really does need a lot of resources can obtain them within a small number of rounds.
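A concrete illustration, assuming the default timeouts of 1s and an application whose backlog of pending tasks persists: about one second after tasks start queuing, Spark requests 1 additional executor; one second later it requests 2 more, then 4, then 8, and so on, stopping once the backlog clears or spark.dynamicAllocation.maxExecutors is reached.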

2. Resource recovery strategy

When an executor of an application has been idle for longer than spark.dynamicAllocation.executorIdleTimeout (default 60s), it is reclaimed.
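As a sketch of how the recovery side can be tuned in spark-defaults.conf (the values below are illustrative, not recommendations; executors holding cached data are governed by a separate timeout that is infinite by default, so they are never reclaimed unless it is set):

# reclaim an executor after it has been idle for 2 minutes
spark.dynamicAllocation.executorIdleTimeout 120s
# also allow executors that hold cached blocks to be reclaimed, after 10 minutes idle
spark.dynamicAllocation.cachedExecutorIdleTimeout 600s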

3. Operation steps

1. YARN configuration

First, you need to configure YARN to support Spark's Shuffle Service.

Modify yarn-site.xml on each node in the cluster:

- Modify yarn.nodemanager.aux-services: set its value to mapreduce_shuffle,spark_shuffle (add spark_shuffle alongside the existing mapreduce_shuffle)
- Add yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
- Add spark.shuffle.service.port: 7337
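For reference, a sketch of the corresponding yarn-site.xml entries, using exactly the property names and values listed above:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>spark.shuffle.service.port</name>
  <value>7337</value>
</property>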

Copy $SPARK_HOME/lib/spark-X.X.X-yarn-shuffle.jar to ${HADOOP_HOME}/share/hadoop/yarn/lib/ on each NodeManager, then restart the NodeManager on every node whose configuration was modified.
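The exact jar name and restart commands depend on your Spark and Hadoop versions; the following is a minimal sketch assuming a Hadoop 2.x-style layout (on Hadoop 3.x the restart is typically done with "yarn --daemon stop/start nodemanager"):

# run on every NodeManager node; replace X.X.X with the actual Spark version
cp $SPARK_HOME/lib/spark-X.X.X-yarn-shuffle.jar ${HADOOP_HOME}/share/hadoop/yarn/lib/
# restart the NodeManager so it picks up the spark_shuffle auxiliary service
${HADOOP_HOME}/sbin/yarn-daemon.sh stop nodemanager
${HADOOP_HOME}/sbin/yarn-daemon.sh start nodemanager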

2. Spark configuration

Configure $SPARK_HOME/conf/spark-defaults.conf by adding the following parameters:

spark.shuffle.service.enabled true    // enable the External Shuffle Service
spark.shuffle.service.port 7337    // Shuffle Service port; must match the port configured in yarn-site.xml
spark.dynamicAllocation.enabled true    // enable dynamic resource allocation
spark.dynamicAllocation.minExecutors 1    // minimum number of executors allocated per application
spark.dynamicAllocation.maxExecutors 30    // maximum number of executors allocated concurrently per application
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s

4. Start

The following uses Spark SQL on YARN to execute SQL with dynamic resource allocation. Start the ThriftServer in yarn-client mode:

cd $SPARK_HOME/sbin
./start-thriftserver.sh \
  --master yarn-client \
  --conf spark.driver.memory=10G \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=300 \
  --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s

After startup, the ThriftServer runs on YARN as a long-running service.
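One way to verify the behavior (a sketch; the host name and table below are placeholders) is to connect to the ThriftServer with beeline, run a query large enough to create a task backlog, and watch the executor count grow and later shrink in the YARN ResourceManager UI or the Spark UI:

$SPARK_HOME/bin/beeline -u jdbc:hive2://thriftserver-host:10000
-- inside beeline: any sufficiently large query will trigger new executor requests
SELECT count(*) FROM some_large_table;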

This concludes "How to allocate resources dynamically in Spark". Thank you for reading. If you would like to learn more about the topic, keep following the site for more practical articles!
