
How to adjust the off-heap memory of Spark executors


This article explains how to adjust the off-heap memory of Spark executors. The content is simple and clear and easy to learn; please follow along to understand when and how to tune this setting.

When do I need to adjust the off-heap memory size of an executor?

When exceptions such as the following occur:

shuffle file cannot find, executor lost, task lost, out of memory

There are roughly two situations in which this problem occurs:

The executor has died, and with it the BlockManager on that executor. The corresponding shuffle map output files can no longer be located, so the reducer cannot pull the data.

The executor did not die, but pulling the data from it failed.

In either case, you can consider increasing the executor's off-heap memory; this may make the errors go away, and when the off-heap memory is raised substantially, it sometimes brings a performance improvement as well. The failure mode is this: an executor is running, suddenly exhausts its off-heap memory, hits an OOM, and dies. Its BlockManager is gone, and the shuffle data it held is lost.

If, at that point, an executor from stage 0 has died and its BlockManager is gone with it, then a task in a stage 1 executor will still obtain the address of the data it needs from the driver's MapOutputTracker, but when it actually goes to the other executor's BlockManager to fetch the data, it cannot get it.

At this point, on the machine where the job (jar) was submitted with spark-submit (the client, in standalone client or yarn-client mode), the log will print "shuffle output file not found", and the DAGScheduler and TaskScheduler will keep resubmitting the failed stage and tasks. After failing and reporting the error several times over, the whole Spark job collapses.

In the spark-submit script, add the configuration with the --conf flag: --conf spark.yarn.executor.memoryOverhead=2048. Be sure to pay attention here! Remember: do not set this in your Spark job code with new SparkConf().set(); setting it that way is useless! It must be set in the spark-submit script.

spark.yarn.executor.memoryOverhead (as the name implies, this is for the yarn-based submit modes): by default, this off-heap memory allowance is 10% of each executor's memory. Later, when we actually process big data in a project, this is often where problems arise, causing the Spark job to crash repeatedly and fail to run. At that point, this parameter is raised to at least 1G (1024m), and sometimes to 2G or 4G. Usually, after this parameter is adjusted, various JVM OOM failures are avoided, and at the same time the overall performance of the Spark job often improves considerably.
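As a concrete illustration, here is a minimal sketch of such a spark-submit invocation. The class name, jar path, and resource sizes are placeholders of my own, not values from the article; note also that in the Spark versions this parameter belongs to, the effective default is max(384m, 10% of executor memory), so an 8g executor would get roughly 819m of overhead by default.

    # Sketch only: class name, jar, and sizes are hypothetical.
    # With --executor-memory 8g the default overhead is
    # max(384m, 10% of 8g) ~= 819m; here it is raised to 2048m.
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --executor-memory 8g \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      --class com.example.MySparkJob \
      my-spark-job.jar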

Adjust the connection wait timeout

When an executor needs a given block of data, it first tries to obtain it from its own locally associated BlockManager. If the local BlockManager does not have it, the TransferService is used to connect remotely to the BlockManager of an executor on another node and pull the data from there. Meanwhile, the tasks on that remote executor may be creating particularly large objects, frequently overflowing the JVM heap and triggering garbage collection; the fetch can happen to arrive exactly while that executor's JVM is in the middle of a garbage collection.

JVM tuning: garbage collection

During garbage collection, all worker threads stop; this means that once an executor enters garbage collection, it stops working and cannot provide a response, so a network connection to it cannot be established and simply hangs. Spark's default network connection timeout is 60s; if the connection hangs for 60s, it is declared failed. So occasionally, with no obvious pattern, you encounter errors like: some file, named by a long uuid string (dsfsfd-2342vs--sdf--sdfsd), not found, file lost. In this situation, it is very likely that the executor holding the data is in the middle of a JVM GC, so no connection can be established when the data is pulled; once the default 60s is exceeded, failure is declared outright. If the error is reported several times and the data repeatedly fails to be pulled, the Spark job may crash. It may also cause the DAGScheduler to resubmit the stage several times and the TaskScheduler to resubmit the tasks several times, greatly extending the running time of the Spark job.
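Before raising any timeouts, it can help to confirm that long GC pauses on the executors really are the culprit. One way to do that, sketched here under my own assumptions, is to turn on GC logging for the executors via Spark's spark.executor.extraJavaOptions setting; the flags shown are standard JVM 8 options, and the class name and jar are placeholders.

    # Sketch: enable executor GC logs (JVM 8 flags) to look for long pauses.
    spark-submit \
      --master yarn \
      --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
      --class com.example.MySparkJob \
      my-spark-job.jar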

You can consider increasing the connection timeout: --conf spark.core.connection.ack.wait.timeout=300

Again, set this in the spark-submit script; remember, it is not set with new SparkConf().set(). spark.core.connection.ack.wait.timeout (spark core, connection, ack, wait timeout: how long to wait for an acknowledgement before the connection is declared failed). After raising this value, generally speaking, the occasional file-pull failures and file lost errors can be avoided.
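Putting the two parameters together, a spark-submit call for a heavy job might look like the following sketch (the class name, jar, and resource sizes are illustrative, not from the article):

    spark-submit \
      --master yarn \
      --executor-memory 8g \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      --conf spark.core.connection.ack.wait.timeout=300 \
      --class com.example.MySparkJob \
      my-spark-job.jar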

Why are these two parameters mentioned here?

Because they are practical. When really dealing with big data (not millions or tens of millions of rows, but hundreds of millions, billions, or tens of billions), it is easy to run into problems with executor off-heap memory and with connection timeouts caused by GC:

file not found, executor lost, task lost.

Adjusting the two parameters above helps with these.

Thank you for reading. The above is the content of "how to adjust the off-heap memory of Spark executors". After studying this article, I believe you have a deeper understanding of the topic; the specific usage still needs to be verified in practice.
