2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shares how to use jvm-profiler to monitor Spark memory usage. It is quite practical, and I hope you get something out of it.
jvm-profiler
Generally speaking, there are two ways to monitor Spark memory:
Through the Spark ListenerBus, you can access the executor's internal memory usage, although the information available this way is limited. After https://github.com/apache/spark/pull/21221 was merged, the usage of each logical region of executor memory can be collected.
Through Spark Metrics, JVM information is sent to a specified sink; users can also implement a custom Sink, for example one that sends metrics to Kafka or Redis.
Uber recently open-sourced jvm-profiler, which collects information from distributed JVM applications and can be used to debug CPU, memory, and I/O usage or to time method calls. For example, it can help tune Spark JVM memory sizes, monitor HDFS NameNode RPC latency, and analyze data lineage.
It is easy to apply to a Spark job. In the setup below, JVM information is collected every 5 seconds and sent to the Kafka topic profiler_CpuAndMemory:
hdfs dfs -put jvm-profiler-0.0.9.jar hdfs://hdfs_url/lib/jvm-profiler-0.0.9.jar

Then attach the agent to the executors through spark-submit options:

--conf spark.jars=hdfs://hdfs_url/lib/jvm-profiler-0.0.9.jar
--conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-0.0.9.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,metricInterval=5000,brokerList=brokerhost:9092,topicPrefix=profiler_
The messages are then consumed from Kafka and stored in HDFS for analysis.
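Each record published to profiler_CpuAndMemory is a JSON document. The sketch below parses one sample and computes heap utilization; the field names follow jvm-profiler's CpuAndMemory reporter, but the exact schema should be verified against the version you deploy, and the values here are made up:

```python
import json

# Hypothetical profiler_CpuAndMemory message (illustrative values; check the
# actual schema emitted by your jvm-profiler version).
raw = json.dumps({
    "epochMillis": 1584057600000,
    "appId": "application_1234_0001",
    "processUuid": "uuid-1",
    "heapMemoryMax": 4 * 1024**3,        # configured -Xmx, in bytes
    "heapMemoryTotalUsed": 3 * 1024**3,  # heap currently in use, in bytes
})

def heap_utilization(message: str) -> float:
    """Return the used/max heap ratio for one CpuAndMemory sample."""
    m = json.loads(message)
    return m["heapMemoryTotalUsed"] / m["heapMemoryMax"]

print(round(heap_utilization(raw), 2))  # 0.75
```

Aggregating the maximum of this ratio per application over time gives the peak utilization figures used in the analysis below.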
Analysis
Hive table structure
Analyzing tasks with user-defined memory

For scheduled tasks with user-defined memory, 75% of tasks have memory utilization below 80%, so they can be optimized.

For development tasks with user-defined memory, 45% of tasks use less than 20% of their allocated memory, which reflects poor sizing habits.
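The analysis above boils down to comparing each task's peak usage with its configured memory. A minimal sketch, with illustrative task names and numbers and an assumed 80% threshold:

```python
# Peak heap used vs. configured executor memory, in GB (illustrative data,
# not taken from the article's dataset).
tasks = {
    "etl_daily":  {"configured": 8.0,  "peak_used": 2.5},
    "report_job": {"configured": 16.0, "peak_used": 14.0},
    "adhoc_dev":  {"configured": 10.0, "peak_used": 1.5},
}

def utilization(t: dict) -> float:
    """Peak utilization: highest observed usage over the configured limit."""
    return t["peak_used"] / t["configured"]

# Tasks whose peak utilization stays below 80% are candidates for downsizing.
candidates = sorted(name for name, t in tasks.items() if utilization(t) < 0.8)
print(candidates)  # ['adhoc_dev', 'etl_daily']
```

Running the same classification over the collected metrics yields the percentages quoted above.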
Summary
By collecting each JVM's peak memory usage alongside its configured limit, the following problems can be addressed:
Memory over-allocation
Monitoring application memory usage trends, to prevent out-of-memory failures as data volume grows
Unreasonable default memory settings for Spark executors
Projecting memory reductions based on actual application usage
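One way to turn the collected peaks into new settings is to add a safety headroom over the observed maximum and round up to a whole GB. The 20% headroom and 1 GB floor below are illustrative policy choices, not values prescribed by jvm-profiler:

```python
import math

def recommend_memory_gb(peak_used_gb: float, headroom: float = 0.2,
                        minimum_gb: int = 1) -> int:
    """Suggest an executor memory setting from observed peak usage.

    Adds a safety headroom on top of the observed peak and rounds up to a
    whole GB, never going below a configured minimum.
    """
    return max(minimum_gb, math.ceil(peak_used_gb * (1 + headroom)))

print(recommend_memory_gb(2.5))   # 3
print(recommend_memory_gb(14.0))  # 17
```

A task that peaked at 2.5 GB but was configured with 8 GB would be resized to 3 GB, reclaiming 5 GB; summed across a fleet, this is where the savings below come from.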
Reducing the executor default memory by 10% freed an average of 60 GB of memory per task.
Raising the utilization of scheduled tasks with user-defined memory to 70% freed an average of 450 GB per task.
Raising the utilization of development tasks with user-defined memory to 70% freed an average of 550 GB per task.