This article walks through how Spark Executor memory management works. The content is concise and easy to follow, and I hope you take something useful away from it.
Preface
Memory management plays an important role in Spark, an in-memory distributed engine. Only by understanding its mechanisms and principles can we tune it well.
1. Static memory management (the policy before Spark 1.6.x)
[Figure: static memory management, on-heap]
Unroll source code reference: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L249
[Figure: static memory management, off-heap]
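To make the static on-heap layout concrete, here is a minimal sketch (not Spark's actual StaticMemoryManager) that computes the regions from the legacy parameters and their documented 1.x defaults: spark.storage.memoryFraction = 0.6 with spark.storage.safetyFraction = 0.9, spark.shuffle.memoryFraction = 0.2 with spark.shuffle.safetyFraction = 0.8, and spark.storage.unrollFraction = 0.2. The 4GB heap is an arbitrary example.

// Illustrative sketch of the static (legacy) on-heap layout, Spark <= 1.5.x.
// Parameter names and defaults follow the Spark 1.x configuration docs;
// the object itself is ours, not Spark code.
object StaticMemoryLayout {
  def main(args: Array[String]): Unit = {
    val systemMaxMemory = 4L * 1024 * 1024 * 1024 // assumed 4GB executor heap

    // Storage region: spark.storage.memoryFraction * spark.storage.safetyFraction
    val storageMemory = (systemMaxMemory * 0.6 * 0.9).toLong
    // Unroll space is carved out of storage: spark.storage.unrollFraction
    val unrollMemory = (storageMemory * 0.2).toLong

    // Execution (shuffle) region: spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
    val executionMemory = (systemMaxMemory * 0.2 * 0.8).toLong

    val mb = 1024 * 1024
    println(s"storage   : ${storageMemory / mb} MB (of which unroll ${unrollMemory / mb} MB)")
    println(s"execution : ${executionMemory / mb} MB")
    println(s"other     : ${(systemMaxMemory - storageMemory - executionMemory) / mb} MB")
  }
}

Note that the safety fractions exist because Spark cannot track every byte precisely; the remainder is left as a cushion plus space for user code and Spark internals.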
2. Unified memory management (the policy from Spark 1.6.x onward)
[Figure: unified memory management, on-heap]
[Figure: unified memory management, off-heap]
The most important optimization is the dynamic occupancy mechanism. The rules are as follows (a toy model illustrating them appears after the diagram below):
1. A base region is set for each of storage memory and execution memory (via the spark.memory.storageFraction parameter), which determines the range of space each side owns outright. When both sides run out of space, data spills to disk; if one side runs short while the other has free space, it may borrow the other side's space. ("Insufficient storage space" means there is not enough room to hold a complete Block.)
2. When execution memory has been borrowed by the storage side, the storage side can be forced to flush the occupied portion to disk and "return" the borrowed space.
3. When storage memory has been borrowed by the execution side, the execution side cannot be made to "return" the space, because too many factors in the Shuffle process would have to be accounted for, making such an implementation complicated.
[Figure: dynamic occupancy mechanism]
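The toy model below captures only the asymmetry in the rules above. The class and method names are invented for illustration; this is not Spark's actual UnifiedMemoryManager, which additionally tracks memory per task and per block.

// Toy model of the dynamic occupancy rules: execution can evict storage's
// borrowed blocks (rule 2), but storage can never evict execution (rule 3).
class ToyUnifiedPool(total: Long, storageFraction: Double = 0.5) {
  private val storageRegion = (total * storageFraction).toLong // soft boundary (rule 1)
  private var storageUsed   = 0L
  private var executionUsed = 0L

  private def free: Long = total - storageUsed - executionUsed

  // Rule 2: execution may take free space, and may also reclaim whatever
  // storage has borrowed beyond its base region, evicting those blocks to disk.
  def acquireExecution(request: Long): Long = {
    val reclaimable = math.max(0L, storageUsed - storageRegion)
    val granted = math.min(request, free + reclaimable)
    val evicted = math.max(0L, granted - free)
    storageUsed -= evicted
    executionUsed += granted
    granted
  }

  // Rule 3: storage may only use what is currently free; it cannot force
  // execution to hand memory back.
  def acquireStorage(request: Long): Long = {
    val granted = math.min(request, free)
    storageUsed += granted
    granted
  }
}

For example, with total = 1000 bytes, storage can cache 800 while execution is idle (borrowing 300 beyond its 500-byte region); a later acquireExecution(400) is granted in full, evicting the 200 bytes of cached data that exceed the free space.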
With the unified memory management mechanism, Spark improves the utilization of on-heap and off-heap memory to a certain extent and lowers the burden of memory maintenance on developers, but that does not mean developers can rest easy. For example, if the storage memory region is too large or too much data is cached, it leads to frequent full garbage collection and degrades task execution performance, because cached RDD data usually resides in memory for a long time. Therefore, to get the full performance out of Spark, developers need to further understand how storage memory and execution memory are each managed and implemented.
Unified memory management can also be summarized more simply as the following regions:
(1) Reserved Memory
Changing this value is not recommended; leave it at the default.
This is the memory reserved by the system. As of Spark 1.6.0 its size is 300MB, and it cannot be changed without recompiling Spark or setting spark.testing.reservedMemory, which, being a testing parameter, is not recommended for production use.
If the executor heap does not provide at least 1.5 × reserved memory = 450MB, spark-submit will fail with a "please use larger heap size" error message.
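As a quick illustration of that minimum-heap rule, here is a small sketch using the Spark 1.6.0 numbers quoted above; the object and method names are ours, not Spark's.

// Minimum-heap rule: reserved memory is 300MB, so the smallest workable
// executor heap is 1.5 * 300MB = 450MB.
object ReservedMemoryCheck {
  val reservedBytes: Long = 300L * 1024 * 1024
  val minHeapBytes: Long  = (reservedBytes * 1.5).toLong // 450MB

  def validate(heapBytes: Long): Unit =
    require(heapBytes >= minHeapBytes,
      s"Executor heap ($heapBytes bytes) is below $minHeapBytes bytes; please use larger heap size")
}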
(2) User Memory
This is the memory pool remaining after Spark Memory has been allocated.
Users can store there the data structures used in RDD transformations as well as Spark metadata, so care must be taken with how data structures are defined and used in code.
The size of this pool is computed as ("Java heap" − "Reserved Memory") × (1.0 − spark.memory.fraction), which with the defaults equals ("Java heap" − 300MB) × 0.25.
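A quick worked example of that formula, assuming an arbitrary 4GB executor heap and the Spark 1.6.0 default spark.memory.fraction = 0.75:

// User Memory for a 4GB heap with the 1.6.0 defaults.
val mb = 1024L * 1024
val heap = 4096 * mb
val reserved = 300 * mb
val userMemory = ((heap - reserved) * (1.0 - 0.75)).toLong
println(s"User Memory = ${userMemory / mb} MB") // (4096 - 300) * 0.25 = 949 MB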
(3) Spark Memory
This is the memory pool managed by Apache Spark.
It is computed as ("Java heap" − "Reserved Memory") × spark.memory.fraction; with the Spark 1.6.0 defaults this gives ("Java heap" − 300MB) × 0.75.
The boundary between Execution Memory and Storage Memory is set by spark.memory.storageFraction, which defaults to 0.5.
Storage Memory: this memory caches data for later reuse, such as broadcast variables and data cached via persist.
Execution Memory: this memory stores the objects needed during Spark shuffles, joins, sorts, and aggregations.
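Putting the whole breakdown together, here is an end-to-end sketch for an assumed 4GB executor heap, using the Spark 1.6.0 defaults quoted in this article (spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5); remember the Storage/Execution split is only a soft boundary under the dynamic occupancy mechanism.

// Full unified on-heap breakdown for a 4GB executor heap (illustrative only).
object UnifiedMemoryLayout {
  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    val heap     = 4096 * mb
    val reserved = 300 * mb

    val usable      = heap - reserved
    val sparkMemory = (usable * 0.75).toLong     // shared Storage + Execution pool
    val userMemory  = usable - sparkMemory
    val storage     = (sparkMemory * 0.5).toLong // soft boundary only
    val execution   = sparkMemory - storage

    println(s"Reserved  : ${reserved / mb} MB")            // 300 MB
    println(s"User      : ${userMemory / mb} MB")          // 949 MB
    println(s"Spark     : ${sparkMemory / mb} MB")         // 2847 MB
    println(s"Storage   : ${storage / mb} MB (soft half)")
    println(s"Execution : ${execution / mb} MB (soft half)")
  }
}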
The above is how Spark Executor memory management is implemented. I hope it has given you some knowledge or skills to take away.