Spark Optimization (1): rational allocation of Resources

2025-03-30 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report --

Allocate more resources: the king of performance tuning is to allocate more resources, and the resulting improvement in performance and speed is obvious. Within a certain range, the increase in resources is roughly proportional to the improvement in performance. After writing a complex Spark job, the first step in performance tuning is, in my view, to adjust resource allocation to the optimum. Only once your Spark job has been allocated as many resources as your environment allows (no more can be allocated; the company's resources are limited) does it make sense to consider the later performance-tuning points.

Question:

1. What resources are allocated?

2. Where to allocate these resources?

3. Why is the performance improved after more resources are allocated?

Answer:

1. What resources are allocated? The number of executors, CPU cores per executor, memory per executor, and driver memory.

2. Where do we allocate these resources? When we submit a Spark job in a production environment, we use the spark-submit shell script and adjust the corresponding parameters:

/usr/local/spark/bin/spark-submit \
  --class cn.spark.sparktest.core.WordCountCluster \
  --num-executors 3 \        (number of executors)
  --driver-memory 100m \     (driver memory; usually little impact)
  --executor-memory 100m \   (memory per executor)
  --executor-cores 3 \       (CPU cores per executor)
  /usr/local/SparkTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar

3. How large should these values be set? What counts as the maximum?

First, Spark Standalone: if a Spark cluster has been set up on the company's machines, you should know how much memory and how many CPU cores each machine can still give you, and adjust each Spark job's resource allocation to that actual situation. For example, if each of 10 machines can give you 8 GB of memory and 4 CPU cores, you could set the number of executors to 20, allocating on average 4 GB of memory and 2 CPU cores per executor.

Second, YARN, with resource queues and resource scheduling: check how many resources are available in the queue your Spark job will be submitted to. For example, with 500 GB of memory and 100 CPU cores in the queue, you could set 50 executors, allocating on average 10 GB of memory and 2 CPU cores per executor.

As a rule of thumb, adjust each quantity to the maximum you are allowed to use (the number of executors may range from dozens to hundreds; scale executor memory and executor CPU cores likewise).
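The two sizing examples above follow the same arithmetic: divide the total memory and cores you can use by the chosen number of executors. A minimal sketch (the helper name `size_executors` is hypothetical, introduced here for illustration):

```python
# Hypothetical sizing helper: split the total memory and cores you can use
# evenly across the chosen number of executors.
def size_executors(total_mem_gb, total_cores, num_executors):
    return total_mem_gb // num_executors, total_cores // num_executors

# Standalone example from the text: 10 machines x (8 GB, 4 cores), 20 executors.
print(size_executors(10 * 8, 10 * 4, 20))  # → (4, 2): 4 GB and 2 cores each

# YARN example from the text: a queue with 500 GB and 100 cores, 50 executors.
print(size_executors(500, 100, 50))        # → (10, 2): 10 GB and 2 cores each
```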

4. Why does performance improve after allocating more resources?

4.1. Adding executors:

If the number of executors is relatively small, the number of tasks that can run in parallel is also small, which means our application's capacity for parallel execution is weak.

For example, with 3 executors and 2 CPU cores per executor, 6 tasks can run in parallel at the same time. Once those 6 finish, the next batch of 6 tasks takes their place.

Increasing the number of executors means more tasks can run in parallel: where it used to be 6, it might now be 10, 20, or even 100. Parallel capacity is then several times, even tens of times, higher than before.

Accordingly, performance (execution speed) can also improve several-fold to dozens-fold.
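The parallelism arithmetic in 4.1 reduces to one multiplication: each CPU core runs one task at a time, so the number of simultaneously running tasks is executors times cores per executor. A sketch (the function name is hypothetical):

```python
def parallel_task_slots(num_executors, cores_per_executor):
    # Each CPU core runs one task at a time, so the number of tasks
    # running in parallel is executors multiplied by cores per executor.
    return num_executors * cores_per_executor

# The example from the text: 3 executors x 2 cores each.
print(parallel_task_slots(3, 2))  # → 6
```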

4.2. Adding CPU cores per executor also increases execution parallelism. Suppose there were originally 20 executors, each with only 2 CPU cores: 40 tasks could run in parallel.

Increase the cores per executor to 5, and 100 tasks can run in parallel.

In the ideal case, execution speed increases by a factor of 2.5.
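The 2.5x figure above is just the ratio of parallel task slots before and after, assuming execution speed scales with parallelism:

```python
# Parallel task slots before and after raising cores per executor,
# using the numbers from the text.
before = 20 * 2   # 20 executors x 2 cores = 40 parallel tasks
after = 20 * 5    # 20 executors x 5 cores = 100 parallel tasks

# Idealized speedup: assumes speed scales linearly with parallelism.
speedup = after / before
print(speedup)  # → 2.5
```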

4.3. Increasing the memory per executor. With more memory, performance improves in three ways:

If you need to cache RDDs, more memory lets you cache more data, write less data to disk, or even avoid writing to disk entirely, reducing disk I/O.

For shuffle operations, the reduce side needs memory to hold and aggregate the data it pulls. If memory is insufficient, that data is also written to disk. Allocating more memory to executors means less data (possibly none) needs to be written to disk, reducing disk I/O and improving performance.

During task execution, many objects may be created. If memory is small, the JVM heap frequently fills up, triggering frequent garbage collection (minor GC and full GC), which is very slow. With more memory, GC runs less often, avoiding those slowdowns and making execution faster.
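The shuffle-spill point above can be sketched as a simple budget check: if the data a reduce task pulls exceeds the memory reserved for shuffle aggregation, the excess spills to disk. This is an illustrative model only; the function name and the 0.2 shuffle fraction are assumptions for the example, not Spark's actual accounting:

```python
# Illustrative model of shuffle spill: data pulled beyond the shuffle
# memory budget is written to disk (extra disk I/O).
def shuffle_spill_mb(executor_mem_mb, shuffle_fraction, pulled_data_mb):
    budget = executor_mem_mb * shuffle_fraction  # memory reserved for shuffle
    return max(0, pulled_data_mb - budget)

# Small executor (100 MB): most of the pulled data spills to disk.
print(shuffle_spill_mb(100, 0.2, 50))   # → 30.0 (MB spilled)

# Larger executor (1000 MB): everything fits in memory, no spill.
print(shuffle_spill_mb(1000, 0.2, 50))  # → 0
```

This is why the text recommends raising executor memory before touching anything else: the same job, pulling the same data, simply stops paying the disk I/O cost.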
