

How to implement Spark Executor memory Management

2025-01-19 Update, from SLTechnology News & Howtos


Shulou (Shulou.com) 06/01 Report

This article explains how Spark Executor memory management is implemented. The content is concise and easy to follow, and I hope the detailed introduction below gives you something to take away.

Preface

Memory management plays an important role in Spark as an in-memory distributed engine. Only by understanding its mechanisms and principles can we tune it effectively.

1. Static memory management (policy prior to Spark 1.6.x)

Static memory management diagram (on-heap)

Unroll source code reference: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L249

Static memory management diagram (off-heap)

2. Unified memory management (policy since Spark 1.6.x)

Unified memory management diagram (on-heap)

Unified memory management diagram (off-heap)

The most important optimization is the dynamic occupancy mechanism. The rules are as follows:

1. A baseline split between storage memory and execution memory is set by the spark.memory.storageFraction parameter, which determines the range of space each side owns. When neither side has free space, data spills to the hard disk. When one side is short of space and the other side is free, it may borrow the other side's space. ("Insufficient storage space" means there is not enough room to hold a complete Block.)
2. When execution memory has been occupied by storage, execution can force the other side to spill the occupied portion to the hard disk and "return" the borrowed space.
3. When storage memory has been occupied by execution, it cannot force the other side to "return" it, because many factors in the Shuffle process would need to be considered, so that would be much more complicated to implement.
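To make rules 1-3 concrete, here is a toy model of the dynamic occupancy mechanism. This is purely illustrative (Spark's real logic lives in UnifiedMemoryManager.scala); the class and method names are invented for the example, and sizes are arbitrary units rather than bytes.

```python
# Toy model of Spark's dynamic occupancy mechanism (illustration only,
# not the real UnifiedMemoryManager). Sizes are in arbitrary units.

class ToyUnifiedMemory:
    def __init__(self, total, storage_fraction=0.5):
        self.total = total
        self.storage_base = int(total * storage_fraction)  # baseline split
        self.storage_used = 0
        self.execution_used = 0

    def free(self):
        return self.total - self.storage_used - self.execution_used

    def acquire_execution(self, amount):
        # Rule 2: execution may evict storage blocks that borrowed its side,
        # i.e. anything storage holds beyond its baseline region.
        shortfall = amount - self.free()
        if shortfall > 0:
            evictable = max(0, self.storage_used - self.storage_base)
            evicted = min(shortfall, evictable)
            self.storage_used -= evicted  # borrowed blocks spill to disk
        if amount <= self.free():
            self.execution_used += amount
            return True
        return False  # still not enough: caller must spill its own data

    def acquire_storage(self, amount):
        # Rules 1 and 3: storage may borrow free space, but it can never
        # force execution to give occupied space back.
        if amount <= self.free():
            self.storage_used += amount
            return True
        return False  # not enough room for a complete block
```

For example, with 100 units and the default 0.5 split, storage can grow to 80 while execution is idle; a later execution request for 40 evicts the 20 borrowed units (rule 2) but no more.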

Dynamic occupancy mechanism diagram

With the unified memory management mechanism, Spark improves the utilization of on-heap and off-heap memory resources to a certain extent and reduces the difficulty of tuning Spark memory for developers, but that does not mean developers can rest easy. For example, if the storage memory region is too large, or too much data is cached, frequent full garbage collection will occur and degrade task execution performance, because cached RDD data usually resides in memory for a long time. Therefore, to get the full performance out of Spark, developers need to understand further how storage memory and execution memory are each managed and implemented.

The unified memory layout can also be described in simplified terms:

(1) Reserved Memory

It is not recommended to change this value (leave the default).

This is memory reserved by the system. Starting with Spark 1.6.0, its size is 300MB, and it cannot be changed without recompiling Spark or setting spark.testing.reservedMemory; since that is a testing parameter, it is not intended for production use.

If the Spark executor heap is smaller than 1.5 × reserved memory = 450MB, spark-submit will fail with a **"please use larger heap size"** error message.
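The minimum-heap rule above is simple arithmetic; the sketch below mirrors it (the `check_heap` function is invented for illustration, not Spark's actual validation code):

```python
# Minimum executor heap check mirroring the Spark 1.6 rule described above.
# Sketch only; Spark performs this check internally in Scala.
RESERVED_MB = 300
MIN_HEAP_MB = int(RESERVED_MB * 1.5)  # 1.5 x reserved memory = 450MB

def check_heap(heap_mb):
    """Raise if the requested executor heap is below Spark's minimum."""
    if heap_mb < MIN_HEAP_MB:
        raise ValueError("please use larger heap size")
    return heap_mb
```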

(2) User Memory

The remaining memory pool after allocating Spark Memory

Users can store their own data structures, and Spark metadata used in RDD transformations, here; so they must pay attention to how data structures are defined and used in their code.

The size of this memory pool can be calculated as ("Java heap" − "reserved memory") × (1.0 − spark.memory.fraction), which by default equals ("Java heap" − 300MB) × 0.25.
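Plugging a hypothetical 4 GB executor heap into that formula with the Spark 1.6.0 defaults quoted in this article (the heap size is an assumption for illustration):

```python
# User Memory for a hypothetical 4 GB executor heap, using the
# Spark 1.6.0 defaults described above. Illustrative arithmetic only.
HEAP_MB = 4096          # assumed executor heap (-Xmx) for this example
RESERVED_MB = 300
MEMORY_FRACTION = 0.75  # spark.memory.fraction default in Spark 1.6.0

user_memory_mb = (HEAP_MB - RESERVED_MB) * (1.0 - MEMORY_FRACTION)
# (4096 - 300) * 0.25 = 949.0 MB available for user data structures
```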

(3) Spark Memory

The memory pool managed by Apache Spark itself.

It is calculated as ("Java heap" − "reserved memory") × spark.memory.fraction; with the Spark 1.6.0 defaults this gives ("Java heap" − 300MB) × 0.75.

The boundary of Execution Memory and Storage Memory is set by spark.memory.storageFraction, which defaults to 0.5.

Storage Memory: this memory is used to cache data that will be reused later, such as broadcast variables and persisted (cached) RDDs.

Execution Memory: this memory is used to hold intermediate objects needed during Spark shuffles, joins, sorts, and aggregations.
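Putting all three pools together for the same hypothetical 4 GB heap (an assumed value, with the Spark 1.6.0 defaults given above; remember the storage/execution split is only a baseline that the dynamic occupancy mechanism can shift):

```python
# Full on-heap breakdown for a hypothetical 4 GB executor heap with
# Spark 1.6.0 defaults: spark.memory.fraction = 0.75,
# spark.memory.storageFraction = 0.5. Illustrative arithmetic only.
HEAP_MB = 4096
RESERVED_MB = 300
MEMORY_FRACTION = 0.75
STORAGE_FRACTION = 0.5

usable_mb = HEAP_MB - RESERVED_MB                   # 3796 after Reserved Memory
spark_mb = usable_mb * MEMORY_FRACTION              # Spark Memory: 2847.0
user_mb = usable_mb * (1.0 - MEMORY_FRACTION)       # User Memory: 949.0
storage_mb = spark_mb * STORAGE_FRACTION            # Storage baseline: 1423.5
execution_mb = spark_mb * (1.0 - STORAGE_FRACTION)  # Execution baseline: 1423.5
```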

The above covers how Spark Executor memory management is implemented. Have you picked up any new knowledge or skills? If you want to learn more, you are welcome to follow the industry information channel.



