
Example Analysis of Memory Leaks, Performance Optimization, and Downtime/Deadlocks in Java


This article walks through concrete examples of analyzing memory leaks, performance problems, and downtime and deadlocks in Java. It is meant to be easy to follow; I hope it helps resolve your doubts about these topics.

Memory leak

Troubleshooting memory leaks in C++ is straightforward: hook the memory allocation and release functions malloc and free, count which malloc calls have no matching free, and you can find the source of the leak. In Java the problem is much more complicated, mainly because Java puts the JVM between the program and its memory.

The JVM first requests a large block of memory from the operating system and then manages it itself. So there are two kinds of memory leaks in Java programs: on-heap leaks and off-heap leaks. Off-heap leaks in turn fall into two categories: leaks caused by Java code using off-heap memory directly, and leaks caused by C++ code called from Java.

To analyze a memory leak, first confirm whether it is on-heap or off-heap. You can confirm this with jmap -heap {pid}. As shown in the following figure, the utilization of the old generation (PS Old Generation) is 99.99%. Combined with the GC log, if the old generation cannot be reclaimed, an on-heap memory leak is basically confirmed, unless the process genuinely needs that much memory, so the heap must be analyzed. A telltale sign of an off-heap leak is that the physical memory reported by the top command is significantly larger than the maximum heap configured with -Xmx.

On-heap memory leak

On-heap memory leaks are the most common: both the object references and the memory they point to live on the JVM heap, but after the objects have been used their references are still held by long-lived objects, so they can never be released from the heap. First, use jdk/bin/jmap -dump:live,format=b,file=heap.hprof {pid} to dump all live objects in the heap, then analyze heap.hprof with a tool.

There are two mainstream tools for analyzing on-heap memory leaks: jvisualvm.exe in the bin directory of the JDK, and Eclipse MemoryAnalyzer. MemoryAnalyzer is more powerful and can automatically point out suspected leaks. When using MemoryAnalyzer, configure its maximum memory with the -Xmx parameter in MemoryAnalyzer.ini, otherwise it will not be able to open the heap. Several examples of on-heap memory leaks follow.

Object is referenced by a static object

Using MemoryAnalyzer to automatically analyze the leak produces the report below: RaftServerMetrics accounts for 44.68% of memory, its instances total about 98 MB, and all RaftServerMetrics instances are referenced by a single ConcurrentHashMap.

Filtering for RaftServerMetrics in the histogram shows a total of 2065 instances.

Then right-click RaftServerMetrics -> Merge shortest path to GC Roots -> with all references to find every place that refers to RaftServerMetrics. The result is as follows: all RaftServerMetrics instances are referenced by the variable metricsMap. The problem is that after a RaftServerMetrics is used, it is never removed from the static variable metricsMap.
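
A minimal sketch of this leak pattern follows; the class and method names are illustrative, not the project's actual code. The point is that a static map keeps every value reachable for the lifetime of the JVM unless entries are explicitly removed.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the "referenced by a static map" leak.
public class MetricsRegistry {
    // Static map: entries stay reachable for the life of the JVM.
    private static final Map<String, Object> metricsMap = new ConcurrentHashMap<>();

    public static void register(String serverId, Object metrics) {
        metricsMap.put(serverId, metrics);   // reference held forever...
    }

    public static void unregister(String serverId) {
        metricsMap.remove(serverId);         // ...unless removed when the server is closed
    }
}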

The RPC connection is not closed after use

Sometimes MemoryAnalyzer's automatic analysis cannot pinpoint the leak, and you have to look at which objects occupy too much memory. The following figure shows the heap opened with jvisualvm.exe. Looking for classes with abnormal object counts or memory usage, one class has 111,580 instances, and the last column shows a large memory footprint. Judging by their packages, the objects are all netty- and client-related, which basically ties them to the leak. Further code analysis shows that a large number of RPC connections are not closed after use.
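
A minimal sketch of the fix follows, assuming the client type implements AutoCloseable; RpcClient and its methods are hypothetical names, not the project's actual API. Try-with-resources guarantees the connection is closed even if the call throws.

// Hypothetical client type used only for illustration.
public class RpcExample {
    public static String fetch(String address) throws Exception {
        try (RpcClient client = RpcClient.connect(address)) {  // hypothetical API
            return client.call("getServiceList");
        }  // close() always runs here, releasing the channel and its buffers
    }
}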

Out-of-heap memory leak

Java uses out-of-heap memory

The JDK provides interfaces to request memory directly from the operating system, bypassing the JVM heap. For example, memory can be allocated and freed directly through allocateMemory and freeMemory of the Unsafe class. The references to such memory live on the heap, but the memory itself is off-heap. To troubleshoot this kind of leak, first turn on:

-XX:NativeMemoryTracking=detail

Then run jcmd {pid} VM.native_memory detail to print the memory allocation information; note that the memory shown by NativeMemoryTracking does not include memory allocated by C++ code. There are two things to check. First, whether the committed value of the Total line equals the physical memory occupied by the process; if not, there is memory allocated by native code such as C++, and you should refer to the section on Java calling a C++ component. Second, whether the committed value of Native Memory Tracking is too large; if so, Unsafe.allocateMemory has allocated too much memory.

There are two usage scenarios for Unsafe.allocateMemory: first, it is wrapped inside DirectByteBuffer; second, the business code calls Unsafe.allocateMemory directly.
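
A minimal sketch of direct off-heap allocation follows, in JDK 8 style; obtaining Unsafe via reflection and the 1 MB size are illustrative choices, not taken from the article's code. Forgetting the freeMemory call is exactly the kind of leak NativeMemoryTracking exposes.

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapExample {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor; grab the singleton via reflection (JDK 8).
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long address = unsafe.allocateMemory(1024 * 1024);  // 1 MB outside the JVM heap
        try {
            unsafe.putLong(address, 42L);                    // this memory is invisible to GC
        } finally {
            unsafe.freeMemory(address);                      // omitting this line is the leak
        }
    }
}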

DirectByteBuffer is often used in communication frameworks such as netty, both to reduce GC pressure and to avoid copying objects from the heap to off-heap memory during IO. To quickly verify whether DirectByteBuffer is causing the leak, use the parameter -XX:MaxDirectMemorySize to limit the amount of off-heap memory DirectByteBuffer may allocate. If the off-heap memory still grows beyond MaxDirectMemorySize, a leak caused by DirectByteBuffer can basically be ruled out.

To analyze DirectByteBuffer memory, you can first use Java Mission Control: bind to the process and check the memory occupied by DirectByteBuffer, for example 2.24 GB. You can also open the dumped heap directly with MemoryAnalyzer, sum the capacities of all DirectByteBuffer instances, and compute the off-heap memory requested through DirectByteBuffer.

Then use the command jdk/bin/jmap -dump:live,format=b,file=heap.hprof {pid} to dump all live objects in the heap, open the dump with MemoryAnalyzer, and analyze all DirectByteBuffer instances via Merge shortest path to GC Roots -> with all references.

If DirectByteBuffer is ruled out, the memory was allocated by the application calling Unsafe.allocateMemory directly, such as Spark's off-heap memory [1]. In that case, audit every Unsafe.allocateMemory call in the code.

Java calls a C++ component

For example, RocksDB is implemented in C++ and exposes an API to Java through JNI. When Java creates a RocksDB instance through JNI, RocksDB starts a number of background threads that request and release memory. This memory is invisible to Java and cannot be analyzed by dumping the JVM heap if it leaks.

The analysis tool can be Google's gperftools or jemalloc; this article uses jemalloc. First build and install jemalloc to /usr/local/lib/libjemalloc.so:

git clone https://github.com/jemalloc/jemalloc.git
git checkout 5.2.1
./configure --enable-prof --enable-stats --enable-debug --enable-fill
make && make install

Then add the following commands to the process startup script. LD_PRELOAD means that when the JVM requests memory it no longer uses glibc's ptmalloc but jemalloc instead. MALLOC_CONF's lg_prof_interval:30 means a heap profile file is dumped every time 2^30 bytes have been allocated.

export LD_PRELOAD=/usr/local/lib/libjemalloc.so
export MALLOC_CONF=prof:true,lg_prof_interval:30

Also add the parameter -XX:+PreserveFramePointer to the process startup command. After the process starts, many dump files are generated as memory is continuously requested, and all of them can be analyzed together with the command jeprof --show_bytes --pdf jdk/bin/java *.heap > leak.pdf.

As shown below, you can see the paths of all requested memory. The process requested a total of 88 GB, of which RocksDB requested 74.2%, which is basically abnormal behavior: a total of 1024 RocksDB instances had been created and were all running. The optimization is to merge the RocksDB instances.
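
A minimal sketch of the "merge instances" idea follows, using the RocksDB Java API (org.rocksdb). Whether one shared database fits the workload, and the lazy-singleton shape, are assumptions for illustration, not the project's actual code.

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Open one shared RocksDB instead of one per component, so only one set of
// background threads and caches exists.
public class SharedRocksDb {
    private static volatile RocksDB shared;

    public static RocksDB get(String path) throws RocksDBException {
        if (shared == null) {
            synchronized (SharedRocksDb.class) {
                if (shared == null) {
                    RocksDB.loadLibrary();
                    Options options = new Options().setCreateIfMissing(true);
                    shared = RocksDB.open(options, path);  // one instance, shared by all callers
                }
            }
        }
        return shared;
    }
}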

Note that the 88 GB covers all memory ever requested, including memory that has since been freed, so in most cases this identifies the source of the leak but is not perfectly accurate. The accurate method is to hook malloc and free in the C++ code and record which allocations are never released.

Performance optimization

arthas

perf is the most common performance analysis tool. For Java, Alibaba's tool arthas can run perf-style sampling and generate flame graphs. It works inside docker containers, where the system perf command is subject to many restrictions.

Download arthas-bin.zip [2], run ./as.sh, bind to the target process, and start profiling with profiler start; after sampling for a while, stop it with profiler stop. In the result shown below, getServiceList consumes 63.75% of the CPU.

Some other common optimization tips: avoid lambda expressions such as stream.collect in hot functions, avoid regular expressions in hot functions, avoid converting UUID to String during protocol transmission, and so on.

Jaeger

perf is suitable for finding hot functions across the whole program, but not for analyzing the time distribution of a single RPC call; for that, jaeger is needed.

Jaeger is Uber's open-source distributed tracing system, written in Go. The basic principle is that users instrument their own code and report spans to jaeger, which aggregates them and displays them in its UI. In a non-production environment you can install jaeger-all-in-one [3]; its data lives in memory, so there is a risk of memory overflow. In the startup script of each service to be traced, export JAEGER_AGENT_HOST={host} pointing at the machine where the jaeger service runs.
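
A minimal sketch of this kind of instrumentation follows, assuming the OpenTracing-based jaeger Java client (io.jaegertracing:jaeger-client); the service and span names are illustrative, not the project's actual code.

import io.jaegertracing.Configuration;
import io.opentracing.Scope;
import io.opentracing.Span;
import io.opentracing.Tracer;

public class TracedCall {
    // Reads JAEGER_AGENT_HOST and related settings from the environment.
    private static final Tracer tracer =
            Configuration.fromEnv("container-service").getTracer();

    public static void createContainer() {
        Span span = tracer.buildSpan("leader create container").start();
        try (Scope scope = tracer.activateSpan(span)) {
            // ... the code being timed ...
        } finally {
            span.finish();  // the span's duration is what shows up in the jaeger UI
        }
    }
}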

The following figure shows the jaeger UI for one complete operation: the span names on the left, and the time consumed by each instrumented code segment on the right. The most time-consuming parts are "including leader create container" and "including follower create container". The semantics here are that after the leader creates a container, the two followers start creating theirs, and creating a container is very time-consuming. If the leader and the two followers create containers at the same time instead, the total time is roughly halved.

Tcpdump

tcpdump is usually used for packet-capture analysis, but it can also be used for performance optimization. In our scenario, an Ozone cluster (a next-generation distributed object storage system) was deployed and data was read from it. We found that the larger the file, the slower the read: a 1 GB file read at only 2.2 MB per second, and perf gave no clue.

With the command tcpdump -i eth0 -s 0 -A 'tcp dst port 9878 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420' -w read.cap, all GET requests issued while reading a 200 MB file are exported to read.cap. Then open read.cap with wireshark and filter for the HTTP protocol: most packets are plain TCP carrying data, while the HTTP packets mark the start and end of each request.

From the wireshark view in the figure below, reading the 200 MB file issued 10 GET requests (GET /goofys-bucket/test.dbf HTTP/1.1), each reading 20 MB, and each GET request received the reply HTTP/1.1 200 OK. The first GET request reaches the S3 gateway at 0.2287 seconds and the 10th reaches the Ozone cluster at 1.026458 seconds; the first GET request completes at 1.869579 seconds, while the 10th completes at 23.640925 seconds.

So all 10 GET requests reach the Ozone cluster within about one second, yet each request takes longer and longer to finish. The question is therefore what extra work the later GET requests do compared with the earlier ones, even though they read blocks of the same size.

Finally, by analyzing the logs and reading the code, it turned out that the third-party library commons-io used by Ozone implements skip by reading. For example, the 10th GET request only needs to read [180 MB, 200 MB), but when commons-io skips the first 180 MB it actually reads them, so the 10th GET request ends up reading the full [0 MB, 200 MB). That is why the GET requests get slower and slower. After optimization, performance improved a hundredfold.
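
A minimal sketch of the difference follows; the buffer size, file path, and method names are illustrative, and the real Ozone/commons-io code paths are more involved.

import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

public class SkipExample {

    // Skip-by-read: every "skipped" byte is still read and copied, so skipping 180 MB
    // costs as much as reading 180 MB -- this is the pattern that made later GETs slow.
    static void skipByRead(InputStream in, long bytesToSkip) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = bytesToSkip;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) break;
            remaining -= n;
        }
    }

    // Positioned read: jump straight to the wanted offset without touching earlier bytes.
    static void seekInstead(String path, long offset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);  // no data transferred for the skipped range
            // ... read [offset, offset + length) ...
        }
    }
}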

Jstack

jstack is used to query thread states, but in extreme cases it can also be used for performance optimization. When a service was deployed, the process quickly consumed almost all the CPU of a 24-core machine, using 2381% CPU.

With CPU usage that high, arthas cannot even run for perf analysis, so another strategy is needed. First, use top -Hp {pid} to list all threads of the process and the CPU consumed by each. In the figure below, the first column PID is the thread ID and the %CPU column is its CPU consumption. Note that the figure is only for demonstration; the process shown is not the one that reached 2381%, because the original process's output was not saved at the time.

Then convert the ID of the thread consuming the most CPU to hexadecimal, here 0x417, and dump all thread states with jstack -l {pid} > jstack.txt. Searching jstack.txt for 0x417 finds the thread with the highest CPU consumption, which is the ThreadPoolExecutor thread shown in the figure below. The thread stays in RUNNABLE with an empty queue, which basically confirms something is wrong with it: a healthy idle thread would not stay busy, and its state would at least sometimes be TIMED_WAITING.

Because the thread stack contains no business code, only JDK source, the stack was used to search for known JDK issues, which finally identified JDK 8 bug JDK-8129861; it is easily triggered by creating a thread pool of size 0. The fix is to change the size-0 thread pools in the application code.
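
A hedged sketch of the construction the article warns against follows. The article only says "a thread pool of size 0"; the use of ScheduledThreadPoolExecutor here is an assumption for illustration, and the bug semantics are described only as the article and JDK-8129861 describe them, not reproduced or guaranteed.

import java.util.concurrent.ScheduledThreadPoolExecutor;

public class PoolSizeExample {
    public static void main(String[] args) {
        // Problematic on affected JDK 8 versions: core pool size of 0 can leave a
        // worker thread busy-spinning (per the article and JDK-8129861).
        ScheduledThreadPoolExecutor bad = new ScheduledThreadPoolExecutor(0);

        // Fix suggested by the article: give the pool at least one core thread.
        ScheduledThreadPoolExecutor good = new ScheduledThreadPoolExecutor(1);

        bad.shutdown();
        good.shutdown();
    }
}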

Downtime

Process killed by another process

In the production environment, a process was killed by a cleanup script. Two troubleshooting tools are available: auditd, which ships with linux, and systemtap.

Use auditd first, because it is easy to use and needs no installation. Check the service with service auditd status, and if it is not running start it with service auditd restart. Then use the command auditctl -a exit,always -F arch=b64 -S kill to listen for all kill signals. As shown in the following figure, the type=OBJ_PID line shows the process that received the kill signal, opid=40442, and the thread name ocomm="rocksdb:pst_st"; note that what is printed here is the thread name, not the process name.

The type=SYSCALL line shows that a1=9 means kill -9; the process that issued kill -9 is exe="/usr/bin/bash" with pid=98003. The culprit cannot be found from this information alone, because scripts tend to exit right after running, so their life cycle is very short.

Next, analyze with systemtap, which needs to be installed: yum install systemtap systemtap-runtime. First write the systemtap script findkiller.stp, shown below, which captures KILL signals sent to the process sig_pid and uses task_ancestry to print all ancestor processes of the sender.

probe signal.send {
    if (sig_name == "SIGKILL" && sig_pid == target()) {
        printf("%s, %s was sent to %s (pid:%d) by %s (pid:%d) uid:%d\n",
               ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), pid(), uid());
        printf("parent of sender: %s (%d)\n", pexecname(), ppid());
        printf("task_ancestry:%s\n", task_ancestry(pid2task(pid()), 1));
    }
}

Then stap -p4 findkiller.stp generates the ko file stap_XX.ko; on some machines the ko file must be signed before it can run. Then run nohup staprun -x 98120 stap_XX.ko > nohup.out 2>&1 &, where 98120 is the target() in the script.

The capture result is as follows. The figure shows that the process that issued the KILL was started through crond, that is, a scheduled task runs some script that kills the process. But we still do not know which script the scheduled task started.

Next, go back to auditd and use the command auditctl -a exit,always -F arch=b64 -S execve to capture all execve system calls. The result is as follows: the last line is the captured kill of opid=20286, and the /data/tools/clean command appears near the kill signal.

/data/tools/clean calls several scripts, and each script writes its name and process number into crontab.pid. Comparing this with the process number 62118 captured by systemtap shows that the KILL signal was sent from the kill_non_run_app.sh script.

System.exit is called

If a Java program explicitly calls System.exit to end the process, arthas can be used to trace it. First write the script system_exit.as as follows.

options unsafe true
stack java.lang.System exit -n 1

Run the command nohup ./as.sh -f system_exit.as 69001 -b > system_exit.out 2>&1 & to monitor all System.exit calls made by process 69001.
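
A complementary in-process approach, not from the article, is to log the caller's stack whenever System.exit is invoked by installing a SecurityManager (available on JDK 8, deprecated on recent JDKs). A minimal sketch:

public class ExitTracer {
    public static void install() {
        System.setSecurityManager(new SecurityManager() {
            @Override
            public void checkExit(int status) {
                // Log who is calling System.exit; not throwing lets the exit proceed.
                new Throwable("System.exit(" + status + ") called from:").printStackTrace();
            }

            @Override
            public void checkPermission(java.security.Permission perm) {
                // Allow everything else so the rest of the application is unaffected.
            }
        });
    }
}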

Crash occurs in C++ called by Java

The crash described here has the same root cause as the "Crash in Java" case below, but a different symptom. In most cases the crash happens in C++ code and only produces a core file, with no Java crash log; in a few cases the crash happens inside the JVM and a Java crash log is produced.

When Java calls C++ code through JNI and the C++ code crashes, the JVM sometimes exits without producing any information. In that case the exit must be analyzed with the help of the core file generated by the operating system, but the operating system turns this feature off by default: as shown in the following figure, a core file size of 0 means it is disabled.

Therefore, set the core file size with ulimit -c unlimited in the startup script of the process (which only affects that process). After the process starts, open /proc/{pid}/limits and check Max core file size to make sure it is enabled.

When a crash occurs, a core.{pid} file is generated. The core file is usually very large because it covers the whole virtual memory, so it can be larger than physical memory; in the figure below, core.44729 is 53 GB.

Next, open the core file with gdb bin/java core.44729. It turns out the process died while rocksdb was starting a thread, inside libstdc++. A standard system library like this is very unlikely to be the real problem, so the stack is probably just a symptom: something else caused thread creation to fail.

Notice that when the core file is opened there are an enormous number of threads (LWPs, lightweight processes).

Running info threads in gdb shows more than 30,000 threads in a wait-for-lock state, which basically confirms the cause: with more than 30,000 threads there is not enough memory to create new ones, so the process died in start_thread.

Then analyze what those 30,000+ threads are: pick a few dozen at random and print each one's stack. Most of them turn out to be JVM threads, because the threads created by rocksdb come from /tmp/librocksdbjni8646115773822033422.so, while the threads created by the JVM come from /usr/java/jdk1.8.0_191-amd64/jre/lib/amd64/server/libjvm.so, and the latter account for the majority.

So the problem lies in the Java code: the process that generated the core.{pid} file produced no crash log, but it too had far too many Java threads, which caused the C++ code to fail when creating threads. For why there are so many Java threads, see "Crash in Java" below.

In addition, core.{pid} completely preserves the scene of the C++ component crash, including variables and register values. If a C++ component really did crash because of a bug such as a null pointer, find the C++ source, identify the variables suspected of being null, and run p {variableName} in gdb to inspect each variable's value and locate the null pointer.

Crash in Java

To troubleshoot the causes of a crash inside Java, such as OOM, configure the following JVM parameters:

-XX:ErrorFile=... -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=...

When a crash occurs inside the JVM, a crash log is generated at the path configured by -XX:ErrorFile. -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath dump the heap when OOM occurs so the scene can be reconstructed. The following figure shows the resulting crash log: an OutOfMemoryError caused the process to die while creating a thread.

The crash log in the figure below shows about 24,000 Datanode State Machine Thread threads waiting for a lock. This confirms that both the core.{pid} case above (Java calling C++) and the crash-log case here were caused by more than 20,000 Datanode State Machine Thread threads.

Then analyze why there are more than 20,000 Datanode State Machine Thread threads. The code shows the thread is created by the thread pool newCachedThreadPool, which creates a new thread whenever no thread is available, for example when all existing threads are waiting on a lock; that is how more than 20,000 threads came to exist. Then analyze which lock the Datanode State Machine Thread threads are waiting for: when the process's thread count exceeds 5000, dump the state of all threads with jstack -l {pid} > jstack.txt.
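
A minimal sketch of this failure mode follows; the task body, thread counts, and pool sizes are illustrative, not the project's actual code.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolGrowth {
    public static void main(String[] args) {
        Object sharedLock = new Object();

        // A cached pool has no upper bound: if every task blocks on the same lock,
        // each new submission spawns yet another thread.
        ExecutorService unbounded = Executors.newCachedThreadPool();
        for (int i = 0; i < 20_000; i++) {
            unbounded.submit(() -> {
                synchronized (sharedLock) {   // every task ends up blocked here...
                    sleepForever();           // ...so no thread is ever "available"
                }
            });                               // => roughly 20,000 threads get created
        }

        // A bounded alternative: at most 16 threads, extra tasks wait in the queue
        // instead of forcing new thread creation.
        ExecutorService bounded = new ThreadPoolExecutor(
                16, 16, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    private static void sleepForever() {
        try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
    }
}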

The jstack output shows that almost all Datanode State Machine Thread threads are waiting for the lock; only Datanode State Machine Thread-5500 holds it, and that thread is stuck submitting the RPC request submitRequest. At this point, the cause of both the C++ crash and the Java crash triggered by Java calling C++ has been found.

Deadlock caused by log4j

The deadlock information printed by jstack is shown below. Thread grpc-default-executor-14765 holds the log4j lock and is waiting for the RaftServerImpl lock; thread grpc-default-executor-14776 holds the RaftServerImpl lock and is waiting for the log4j lock. Each thread holds the lock the other is waiting for, so the two threads deadlock. The most suspicious part is that merely writing a log with log4j does not release its lock. It turns out log4j has a known deadlock defect; the defect is fixed in log4j2, so log4j can be upgraded.
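
A minimal sketch of the lock-ordering pattern behind this kind of deadlock follows; the lock objects are illustrative stand-ins, not log4j or RaftServerImpl internals.

public class LockOrderDeadlock {
    private static final Object LOG_LOCK = new Object();     // stands in for the log4j lock
    private static final Object SERVER_LOCK = new Object();  // stands in for the RaftServerImpl lock

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (LOG_LOCK) {            // thread A: log lock first...
                sleep(100);
                synchronized (SERVER_LOCK) { }   // ...then server lock
            }
        }).start();

        new Thread(() -> {
            synchronized (SERVER_LOCK) {         // thread B: server lock first...
                sleep(100);
                synchronized (LOG_LOCK) { }      // ...then log lock -> both wait forever
            }
        }).start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}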

Deadlock caused by improper lock encapsulation

The deadlock information printed by jstack is shown below. Thread grpc-default-executor-3449 holds RaftLog's lock and is waiting for DataBlockingQueue's lock; SegmentedRaftLogWorker holds DataBlockingQueue's lock and is waiting for RaftLog's lock.

The suspicious part is that SegmentedRaftLogWorker takes DataBlockingQueue's lock but does not release it, since queue operations only add, remove, and query elements. DataBlockingQueue's poll method is shown in the figure below. The lock it uses is a home-grown wrapper, AutoCloseableLock implements AutoCloseable, and releasing the lock depends on AutoCloseableLock's close method.

Looking at the acquire method: it first calls lock.lock() to take the lock and then constructs a new AutoCloseableLock object. If an exception such as OOM occurs while constructing the AutoCloseableLock after the lock has been acquired, the lock is never released.
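
A minimal sketch of this pattern follows; the class shape is reconstructed from the description above and from what a safer ordering could look like, not the project's exact source.

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public final class AutoCloseableLock implements AutoCloseable {
    private final Lock underlying;

    private AutoCloseableLock(Lock underlying) {
        this.underlying = underlying;
    }

    // Fragile ordering: the lock is taken BEFORE the wrapper object is constructed.
    // If allocation fails here (e.g. OOM), the lock is held but never released.
    public static AutoCloseableLock acquireFragile(Lock lock) {
        lock.lock();
        return new AutoCloseableLock(lock);   // failure after lock() leaks the lock
    }

    // Safer ordering: do everything that can fail first, then take the lock,
    // so nothing can fail between lock() and the return.
    public static AutoCloseableLock acquireSafer(Lock lock) {
        AutoCloseableLock wrapper = new AutoCloseableLock(lock);
        lock.lock();
        return wrapper;
    }

    @Override
    public void close() {
        underlying.unlock();                  // released automatically in try-with-resources
    }

    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        try (AutoCloseableLock ignored = AutoCloseableLock.acquireSafer(lock)) {
            // critical section
        }
    }
}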

That concludes "Example Analysis of Memory Leaks, Performance Optimization, and Downtime/Deadlocks in Java". Thank you for reading! I hope the content shared here is helpful; if you want to learn more, you are welcome to follow the channel for further articles.
