What are the core tuning parameters of Spark

2025-01-16 Update From: SLTechnology News&Howtos


Many newcomers are unclear about which parameters matter most when tuning Spark. To help with this, the following sections explain Spark's core tuning parameters in detail; I hope readers with this need find it useful.


num-executors:

This parameter sets the total number of Executor processes used to run the Spark job. When the Driver requests resources from the YARN cluster manager, YARN tries to start that many Executor processes across the worker nodes of the cluster. This parameter is very important: if it is not set, only a small number of Executor processes are started by default (2 on YARN), and the Spark job can run very slowly. (About 50 to 100 Executor processes are generally recommended.)
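As a minimal sketch, assuming a YARN cluster and a hypothetical application script `my_job.py`, the snippet below builds the `spark-submit` arguments that request a given number of executors. It also shows the equivalent `--conf spark.executor.instances=N` form, since `--num-executors` sets that property. The helper names and the warning for values outside 50 to 100 are illustrative only.

```python
from __future__ import annotations

import shlex

# The article's guideline: roughly 50 to 100 executors per job.
RECOMMENDED_RANGE = (50, 100)


def num_executor_args(n: int) -> list[str]:
    """Return spark-submit arguments that request n executors on YARN (hypothetical helper)."""
    if n < 1:
        raise ValueError("num-executors must be at least 1")
    low, high = RECOMMENDED_RANGE
    if not low <= n <= high:
        print(f"note: {n} executors is outside the suggested {low}-{high} range")
    return ["--num-executors", str(n)]


def num_executor_conf(n: int) -> list[str]:
    """Equivalent form that sets the underlying property directly."""
    return ["--conf", f"spark.executor.instances={n}"]


# my_job.py is a hypothetical application script.
cmd = ["spark-submit", "--master", "yarn", *num_executor_args(60), "my_job.py"]
print(shlex.join(cmd))
# prints: spark-submit --master yarn --num-executors 60 my_job.py
```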

executor-memory:

This parameter sets the memory of each Executor process. The Executor memory size often directly determines the performance of a Spark job, and it is also directly related to the common JVM OOM (OutOfMemoryError) exceptions. (Depending on the size of the job, 4G to 8G per Executor is recommended; num-executors multiplied by executor-memory must not exceed the maximum memory of the queue.)
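The queue rule above is simple arithmetic, so here is a small sketch that checks it. The parser handles memory strings with a unit suffix such as `4g` or `512m` (binary units, so `4g` is 4096 MiB); `to_mb` and `fits_queue` are hypothetical helpers, and the queue sizes are made-up examples. The check deliberately ignores the per-executor memory overhead that YARN reserves on top of `executor-memory`, so a real job needs some extra headroom.

```python
from __future__ import annotations

# Size of each unit in MiB; a unit suffix is required ("4g" = 4096 MiB).
UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def to_mb(size: str) -> float:
    """Convert a memory string such as '4g', '512m' or '8GB' to MiB."""
    size = size.strip().lower()
    if size.endswith("b"):  # tolerate a trailing 'b' as in '8gb'
        size = size[:-1]
    unit = size[-1:]
    if unit not in UNITS_MB:
        raise ValueError(f"unrecognised memory unit in {size!r}")
    return float(size[:-1]) * UNITS_MB[unit]


def fits_queue(num_executors: int, executor_memory: str, queue_max: str) -> bool:
    """True if num-executors * executor-memory stays within the queue's maximum."""
    return num_executors * to_mb(executor_memory) <= to_mb(queue_max)


print(fits_queue(50, "4g", "300g"))   # True: 200 GiB requested from a 300 GiB queue
print(fits_queue(100, "8g", "500g"))  # False: 800 GiB requested from a 500 GiB queue
```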

executor-cores:

This parameter sets the number of CPU cores for each Executor process, which determines how many task threads each Executor can run in parallel. Because each CPU core executes only one task thread at a time, the more CPU cores an Executor has, the faster it can finish all the tasks assigned to it. (2 to 4 cores per Executor is recommended, and num-executors * executor-cores should not exceed 1/3 to 1/2 of the queue's total CPU cores.)
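The core budget can be worked out the same way. This sketch, using hypothetical helper names and an example 400-core queue, computes the suggested 1/3 to 1/2 range for `num-executors * executor-cores` and the largest executor count that stays within the upper bound. The product `num_executors * executor_cores` is also the number of tasks that can run concurrently.

```python
from __future__ import annotations


def core_budget(queue_cores: int) -> tuple[int, int]:
    """Suggested range for num-executors * executor-cores: 1/3 to 1/2 of the queue."""
    return queue_cores // 3, queue_cores // 2


def max_executors(queue_cores: int, executor_cores: int) -> int:
    """Largest num-executors that keeps the total within half of the queue's cores."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    if not 2 <= executor_cores <= 4:
        print(f"note: {executor_cores} cores per executor is outside the suggested 2-4")
    return core_budget(queue_cores)[1] // executor_cores


print(core_budget(400))       # (133, 200)
print(max_executors(400, 4))  # 50 executors x 4 cores = 200 concurrent tasks
```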

driver-memory:

This parameter sets the memory of the Driver process (512m to 1G is recommended).
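Putting the four resource parameters together, here is a minimal sketch of a full `spark-submit` command for YARN. The application name `my_job.py` and the `build_submit` helper are hypothetical, and the values simply follow the ranges suggested in this article rather than any universal setting.

```python
from __future__ import annotations

import shlex


def build_submit(app: str, num_executors: int, executor_memory: str,
                 executor_cores: int, driver_memory: str = "1g") -> list[str]:
    """Assemble a spark-submit command from the four resource flags (hypothetical helper)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        "--driver-memory", driver_memory,
        app,
    ]


# my_job.py is a hypothetical application script.
cmd = build_submit("my_job.py", 50, "4g", 4, driver_memory="1g")
print(shlex.join(cmd))
```

Passing driver memory on the command line matters in client mode: the driver JVM has already started by the time application code runs, so setting it through `SparkConf` inside the application has no effect there.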

spark.default.parallelism:

This parameter sets the default number of tasks per stage. It is extremely important: leaving it unset can directly hurt the performance of your Spark job. (About 50 to 500 is recommended. If it is not set, Spark derives the number of tasks from the number of blocks in the underlying HDFS input, one task per HDFS block. The official Spark documentation suggests setting this parameter to 2 to 3 times num-executors * executor-cores.)
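The "2 to 3 times the total cores" guideline is easy to compute. The sketch below, with hypothetical helper names, derives a value and formats it as a `--conf` argument; with 50 executors of 4 cores each it gives 400 (2 tasks per core) or 600 (3 tasks per core).

```python
from __future__ import annotations


def suggested_parallelism(num_executors: int, executor_cores: int,
                          tasks_per_core: int = 2) -> int:
    """Default tasks per stage as a multiple (2-3x) of the total executor cores."""
    if tasks_per_core < 1:
        raise ValueError("tasks_per_core must be positive")
    return num_executors * executor_cores * tasks_per_core


def parallelism_conf(num_executors: int, executor_cores: int,
                     tasks_per_core: int = 2) -> list[str]:
    """spark-submit arguments that set spark.default.parallelism."""
    value = suggested_parallelism(num_executors, executor_cores, tasks_per_core)
    return ["--conf", f"spark.default.parallelism={value}"]


print(suggested_parallelism(50, 4))     # 400
print(suggested_parallelism(50, 4, 3))  # 600
print(parallelism_conf(50, 4))          # ['--conf', 'spark.default.parallelism=400']
```

This property governs RDD operations; shuffles in DataFrame and Spark SQL jobs take their partition count from `spark.sql.shuffle.partitions` instead.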

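spark.storage.memoryFraction:

This parameter sets the fraction of Executor memory that can be used for persisted (cached) RDD data; the default is 0.6. (In principle, keep as much persisted data in memory as possible, but if the job runs into frequent GC, consider lowering it.) Note that this parameter and spark.shuffle.memoryFraction belong to the legacy static memory manager: since Spark 1.6 they are honored only when spark.memory.useLegacyMode is enabled (unified memory management via spark.memory.fraction is the default), and legacy mode was removed in Spark 3.0.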
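To make the fraction concrete, this sketch estimates the nominal heap share available for cached RDD blocks under the legacy memory manager. It assumes executor memory is given in MiB and uses a hypothetical helper name; the legacy manager also applied an internal safety margin, so the real usable amount is somewhat lower than this figure.

```python
from __future__ import annotations

DEFAULT_STORAGE_FRACTION = 0.6  # legacy default quoted in the article


def storage_budget_mb(executor_memory_mb: int,
                      storage_fraction: float = DEFAULT_STORAGE_FRACTION) -> float:
    """Nominal share of the executor heap for cached RDD blocks (legacy manager).

    The legacy manager also applied an internal safety margin, so usable memory is lower.
    """
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")
    return executor_memory_mb * storage_fraction


# 4 GiB executor: the default fraction vs. a lowered one for a GC-heavy job.
print(storage_budget_mb(4096))       # about 2457.6 MiB
print(storage_budget_mb(4096, 0.4))  # about 1638.4 MiB
```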

spark.shuffle.memoryFraction:

This parameter sets the fraction of Executor memory that a task can use for aggregation after it pulls the output of the previous stage's tasks during a shuffle. The default is 0.2, meaning that by default only 20% of Executor memory is available for this operation. If the data being aggregated exceeds this 20% limit, the excess is spilled to disk files, which greatly degrades performance. (When a job performs many shuffle operations, it is recommended to reduce the memory share for persistence and increase the share for shuffle, so that large shuffles do not run out of memory and spill to disk.)
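The rebalancing advice above can be sketched as moving part of the storage fraction over to shuffle. This is a minimal illustration for the legacy memory manager only; the helper names are hypothetical, and keeping the sum of the two fractions constant (0.8 with the defaults) is an assumption of the sketch, not a rule stated by Spark.

```python
from __future__ import annotations


def rebalance(storage: float = 0.6, shuffle: float = 0.2,
              shift: float = 0.2) -> dict[str, str]:
    """Move `shift` of the heap from RDD caching to shuffle aggregation.

    Assumption of this sketch: storage + shuffle stays constant.
    """
    if not 0.0 <= shift <= storage or shuffle + shift > 1.0:
        raise ValueError("shift must fit within the storage fraction and keep shuffle <= 1")
    return {
        "spark.storage.memoryFraction": str(round(storage - shift, 2)),
        "spark.shuffle.memoryFraction": str(round(shuffle + shift, 2)),
    }


def shuffle_budget_mb(executor_memory_mb: int, shuffle_fraction: float) -> float:
    """Nominal heap share for shuffle aggregation before data spills to disk."""
    return executor_memory_mb * shuffle_fraction


for key, value in rebalance().items():
    print(f"--conf {key}={value}")
# prints:
# --conf spark.storage.memoryFraction=0.4
# --conf spark.shuffle.memoryFraction=0.4
```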


© 2024 shulou.com SLNews company. All rights reserved.
