Shulou (shulou.com), SLTechnology News & Howtos — updated 2025-01-17
This article introduces common JVM parameter tuning methods. In practice many people run into the situations described below, so the following sections walk through how to deal with them; I hope you read carefully and get something out of it.
When it comes to tuning, there are usually three processes:
Performance monitoring: the problem has not happened yet, and you don't know what needs tuning. At this stage you rely on system and application monitoring tools to discover problems.
Performance analysis: the problem has occurred, but you don't know what it is. Here you use tools and experience to analyze the bottlenecks of the system and application and locate the cause.
Performance tuning: once the previous step has located the problem, you solve it through code, configuration, and other means.
Java tuning boils down to these three steps.
In addition, the performance analysis and tuning discussed in this article set aside the following factors:
System underlying environment: hardware, operating system, etc.
Use of data structures and algorithms
Use of external systems such as databases and caches
Tuning preparation
Tuning needs preparation. After all, each application has different business goals, and performance bottlenecks are not always in the same place. At the business application level, we need to:
Understand the overall architecture of the system and be clear about where the pressure comes from: for example, which interfaces and modules have the highest utilization and face high-concurrency challenges.
Build a test environment to measure the application's performance, using tools such as ab, LoadRunner, or JMeter.
Quantify key business data volumes: for example, how much data the database handles in a day, and how much data is cached.
Understand the system's requirements for response time, throughput, TPS, QPS, and other metrics; a flash-sale system, for instance, has very high requirements for response time and QPS.
Understand the versions, modes, and parameters of the software the system depends on; sometimes the version and mode of a dependent service also affect performance to a certain extent.
Performance analysis
At the system level, there are generally three factors that can affect application performance: CPU, memory and IO, from which the performance bottleneck of the program can be analyzed.
CPU analysis
When the program's response slows down, first use top, vmstat, ps and similar commands to check whether the system's CPU utilization is abnormal, so you can judge whether a busy CPU is causing the performance problem. Abnormal process activity mainly shows up in the us value (the percentage of CPU time spent in user processes). When us is close to or above 100% (per-process values in top can exceed 100% on multi-core machines), you can conclude that the slow response is caused by a busy CPU. Generally, a busy CPU has several causes:
Threads contain endless empty loops, non-blocking operations, regex matching, or pure computation
Frequent GC
Excessive context switching among many threads
After determining the process with the highest cpu usage, you can use jstack to print out the stack information of the abnormal process:
jstack [pid]
The next point to note is that under Linux all threads ultimately exist as lightweight processes (LWPs), and jstack prints the stack information of all threads in the target process. So you still need to determine which thread is consuming the CPU. You can view per-thread usage with top -p [pid] -H, or display resource consumption for all processes, including LWPs, with ps -eLf. Finally, search for the id of the corresponding LWP in the jstack output to locate its stack. Also pay attention to the thread state (RUNNABLE, WAITING, and so on): for RUNNABLE threads, look for computations that consume CPU; WAITING threads are generally blocked on a lock.
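As a cross-check on the top/jstack workflow above, you can also enumerate per-thread CPU time from inside the JVM with the standard ThreadMXBean API. A minimal sketch (the class name is ours for illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BusyThreads {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Per-thread CPU time measurement may need to be enabled first.
        if (mx.isThreadCpuTimeSupported() && !mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if unsupported/disabled
            if (info != null && cpuNanos > 0) {
                // Thread names printed here can be matched against jstack output.
                System.out.printf("%-35s state=%-13s cpu=%dms%n",
                        info.getThreadName(), info.getThreadState(),
                        cpuNanos / 1_000_000);
            }
        }
    }
}
```

This only sees Java threads, so it complements rather than replaces the LWP-based approach, which also covers native threads.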
You can also use jstat to view the gc information of the corresponding process to determine if gc is causing the cpu to be busy.
jstat -gcutil [pid]
You can also use vmstat and observe the number of context switches (the cs column) to determine whether the CPU is busy because of excessive context switching.
vmstat 1 5
In addition, JIT compilation can sometimes cause high CPU usage, for example when a large number of methods are being compiled. You can use the -XX:+PrintCompilation parameter to output JIT compilation activity and troubleshoot CPU problems caused by JIT compilation.
Memory analysis
For Java applications, memory is mainly composed of out-of-heap memory and in-heap memory.
Out-of-heap memory
Out-of-heap memory is mainly used by JNI code, Deflater/Inflater, and DirectByteBuffer (used by NIO). To analyze this kind of memory, you still check swap and physical memory consumption with vmstat, sar, top, and pidstat (sar, pidstat, and iostat are part of the sysstat suite and need to be installed separately). In addition, native allocations made by JNI or Deflater calls can be tracked with gperftools (Google perftools).
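To make the DirectByteBuffer case concrete, here is a small sketch of allocating direct memory; the buffer's storage lives outside the Java heap and counts against -XX:MaxDirectMemorySize rather than -Xmx (class name is ours for illustration):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 1 MB of native (off-heap) memory; only the small ByteBuffer
        // wrapper object itself lives on the Java heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        buf.putInt(42);
        buf.flip();
        System.out.println(buf.isDirect()); // true
        System.out.println(buf.getInt());   // 42
    }
}
```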
In-heap memory
This part of memory is the main memory area of a Java application. The aspects usually related to in-heap memory performance are:
Created objects: these live in the heap; you need to control the number and size of objects, since large objects in particular easily enter the old generation.
Global collections: global collections usually have a long life cycle, so pay special attention to how they are used.
Cache: the data structure a cache chooses greatly affects memory size and GC.
ClassLoader: dynamically loading classes can easily exhaust permanent-generation memory.
Multithreading: each thread occupies native memory for its stack; too many threads cause memory shortage.
Improper use of the above easily leads to:
Frequent GC, i.e. stop-the-world pauses that slow down application response.
OOM, an OutOfMemoryError that makes the program exit. OOM errors fall into several categories:
Heap space: insufficient heap memory
PermGen space: insufficient permanent-generation memory
Unable to create new native thread: insufficient native memory to allocate thread stacks
The common tool for troubleshooting heap memory problems is jmap, which ships with the JDK. Some common uses:
View JVM memory usage: jmap -heap [pid]
View live objects in the JVM: jmap -histo:live [pid]
Dump all objects in the heap, dead or alive: jmap -dump:format=b,file=xxx.hprof [pid]
Trigger a full GC first, then dump only the live objects: jmap -dump:live,format=b,file=xxx.hprof [pid]
Whether the dump file comes from jmap or was generated on OOM, you can analyze it with Eclipse's MAT (Memory Analyzer Tool) to see details of the objects on the heap. The JDK's own jhat can also read dump files (it launches a web server so developers can browse information about heap objects in a browser). VisualVM can likewise open hprof files and inspect heap memory with its heap walker.
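A programmatic counterpart to jmap -heap is the standard MemoryMXBean, which reports heap and non-heap usage from inside the running application (a sketch; the class name is ours):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapStats {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();       // roughly what jmap -heap reports
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // method area, code cache, ...
        System.out.printf("heap:     used=%d KB, committed=%d KB, max=%d KB%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024, heap.getMax() / 1024);
        System.out.printf("non-heap: used=%d KB, committed=%d KB%n",
                nonHeap.getUsed() / 1024, nonHeap.getCommitted() / 1024);
    }
}
```

Exposing this through a monitoring endpoint lets you watch heap growth without attaching tools to the process.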
IO analysis
Application performance is usually related to two kinds of IO: file IO and network IO.
File IO
You can use the system tools pidstat, iostat, and vmstat to view IO status. (The original article shows a screenshot of vmstat output here.)
Pay particular attention to the bi and bo columns, which represent the number of blocks read from and written to block devices per second, respectively; from these you can judge how busy IO is. Going further, you can trace the file-IO system calls with the strace tool. In general, poor file IO performance has the following causes:
A large number of random reads and writes
Slow equipment
The file is too large
Network IO
To check the status of network IO, you usually use the netstat tool. It can show the state, count, port, and other information of all connections. For example, too many connections in TIME_WAIT or CLOSE_WAIT state will affect the application's response speed.
netstat -anp
In addition, you can use tcpdump to capture and analyze network IO. The file tcpdump writes is raw binary data; open it with wireshark to inspect specific connections and the data they carry.
tcpdump -i eth0 -tnn -w tmp.cap dst port 8080 # capture traffic destined for port 8080 and write it to tmp.cap
You can also check /proc/interrupts to see the interrupts currently in use on the system.
The columns are, in order: the IRQ number, the count of interrupts that occurred on each CPU, the programmable interrupt controller, and the device name (the dev_name field of request_irq).
By looking at the interrupt counts of the network card device, you can judge the state of network IO.
Other analytical tools
The above describes some of the analysis tools that come with the system / JDK for CPU, memory and IO, respectively. In addition, there are some comprehensive analysis tools or frameworks that can make it more convenient for us to check, analyze and locate the performance of Java applications.
VisualVM
This tool should be familiar to Java developers as a monitoring tool for Java applications. It connects to the JVM process through the JMX interface, so you can see thread, memory, class, and other information about the JVM. If you want more GC detail, you can install the Visual GC plug-in. VisualVM also has a BTrace plug-in, which lets you write BTrace code visually and view its output log. Similar to VisualVM, JConsole also views remote JVM information through JMX; it can display specific thread stack information and the occupancy of each memory generation, and it supports invoking MBeans remotely. VisualVM gains these features too once the JConsole plug-in is installed. However, because both tools require a UI, they are usually run locally and connected to the server JVM process remotely; in a pure server environment this approach is generally not used.
Java Mission Control (jmc)
This tool began shipping with JDK 7u40 and was originally a JRockit tool. It is a very powerful sampling profiler that integrates diagnosis, analysis, and monitoring: https://docs.oracle.com/javacomponents/jmc-5-5/jmc-user-guide/toc.htm. It is based on JFR (started with jcmd [pid] JFR.start name=test duration=60s settings=template.jfc filename=output.jfr), and enabling JFR requires a commercial license: jcmd [pid] VM.unlock_commercial_features.
Btrace
What has to be mentioned here is the powerful BTrace, which uses the Java attach API, a Java agent, and the instrumentation API to trace a JVM dynamically. You can intercept class methods and print logs without restarting the application. For specific usage, see the complete beginners' guide to BTrace.
Jwebap
Jwebap is a JavaEE performance-testing framework based on ASM bytecode enhancement. It supports call tracing of HTTP requests, JDBC connections, and methods, with statistics on invocation counts and elapsed time. With it you can find the most time-consuming requests and methods, and check the number of JDBC connections and whether they are closed. However, the project dates from 2006 and has not been updated for nearly ten years; in the author's experience it no longer compiles under JDK 7. If you want to use it, it is recommended to redevelop it based on the original project; you can also add trace support for redis connections. Of course, based on the same bytecode-enhancement principle, you can implement your own JavaEE performance-monitoring framework.
(The original article shows a screenshot here of the author's company's extended version of jwebap, which already supports JDK 8 and redis connection tracking.)
Useful-scripts
Here is an open-source project I am involved in: https://github.com/superhj1987/useful-scripts, which wraps many commonly used performance-analysis commands; for example, printing the stacks of the busiest Java threads, as discussed above, takes just one script.
Performance tuning
Corresponding to the performance analysis, performance tuning is also divided into three parts.
CPU tuning
Do not keep threads running all the time (infinite while loops); use sleep to pause for a while. This situation is common in pull-style data consumption: when a pull gets no data, it is recommended to sleep before the next pull.
Use the wait/notify mechanism instead of polling.
Avoid excessive loops, regular-expression matching, and heavy computation, including String's format, split, and replace methods (consider the corresponding StringUtils methods in Apache commons-lang), using regexes to validate e-mail addresses (which can sometimes backtrack catastrophically and appear to hang), serialization/deserialization, and so on.
Combine jvm with code to avoid generating frequent gc, especially full GC.
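The polling-versus-wait/notify point above can be sketched as a minimal single-slot blocking mailbox; the consumer sleeps on the monitor instead of burning CPU in a busy loop (illustrative; class name is ours):

```java
public class Mailbox<T> {
    private T item; // single slot; put() overwrites any unconsumed value

    // Consumer blocks instead of spinning in a busy poll loop.
    public synchronized T take() throws InterruptedException {
        while (item == null) {
            wait(); // releases the monitor and sleeps until notified
        }
        T result = item;
        item = null;
        return result;
    }

    public synchronized void put(T value) {
        item = value;
        notifyAll(); // wake any waiting consumer
    }

    public static void main(String[] args) throws InterruptedException {
        Mailbox<String> box = new Mailbox<>();
        new Thread(() -> box.put("ping")).start();
        System.out.println(box.take()); // prints "ping"
    }
}
```

In production code, prefer the JDK's BlockingQueue implementations, which provide the same behavior with bounded capacity and interruption handling built in.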
In addition, when using multithreading, you need to pay attention to the following points:
Use thread pools to reduce the number of threads and thread switching
For lock contention among multiple threads, consider reducing lock granularity (using ReentrantLock), splitting locks (similar to ConcurrentHashMap's segment locking), or using lock-free techniques such as CAS, ThreadLocal, and immutable objects. In addition, multithreaded code is best written with the concurrency utilities the JDK provides (the Executors framework, ForkJoin, and so on); Disruptor and the Actor model can also be used in appropriate scenarios.
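The lock-free CAS technique mentioned above can be sketched with AtomicLong, whose incrementAndGet runs a compare-and-swap loop instead of taking a monitor (illustrative; class name is ours):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong count = new AtomicLong();

    public void increment() { count.incrementAndGet(); } // lock-free CAS loop
    public long get() { return count.get(); }

    public static void main(String[] args) throws InterruptedException {
        CasCounter c = new CasCounter();
        // Thread pool instead of raw threads, as recommended above.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { for (int j = 0; j < 100_000; j++) c.increment(); });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(c.get()); // 400000, with no synchronized blocks
    }
}
```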
Memory tuning
The tuning of memory is mainly about tuning jvm.
Set the size of each generation reasonably. Avoid making the young generation too small (objects don't fit, triggering frequent minor GCs and early promotion to the old generation) or too large (fragmentation), and likewise avoid Survivor spaces that are too large or too small.
Choose an appropriate GC strategy for the scenario. Note that CMS is not a cure-all: unless there is a specific need, think twice before choosing it, since CMS's young-generation collector ParNew is not the fastest, and CMS itself produces fragmentation. Also, G1 was not widely used before JDK 8 and was not recommended then.
Configure the JVM startup parameters -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:[log_path] to record GC logs for troubleshooting.
Among them, for the first point, there is a specific suggestion:
Young generation size: applications that prioritize response time should set it as large as possible, up to the point where it approaches the system's minimum response-time requirement (chosen according to the actual situation). In that case the frequency of young-generation GC is lowest, and it also reduces the number of objects reaching the old generation. Applications that prioritize throughput should also set it as large as possible, since there is no hard response-time requirement and garbage collection can proceed in parallel; this is recommended for applications on 8+ CPUs.
Old generation size: applications that prioritize response time usually use a concurrent collector for the old generation, so its size needs to be set carefully, taking into account parameters such as the concurrent collection rate and collection duration. If the heap is set small, it causes memory fragmentation, high collection frequency, and application pauses that fall back to traditional mark-sweep collection; if it is large, each collection takes longer. For an optimized setting, you generally need to refer to the following data:
Concurrent garbage collection information
Number of concurrent permanent-generation collections
Traditional GC information
Proportion of time spent collecting the young and old generations
Generally speaking, applications that prioritize throughput should have a large young generation and a smaller old generation. This way most short-lived objects are recycled as soon as possible, fewer objects reach middle age, and the old generation stores only long-lived objects.
As for the fragmentation problem caused by smaller heaps: because the concurrent old-generation collector uses a mark-sweep algorithm, it does not compact the heap. When collecting, the collector merges adjacent free spaces so they can be allocated to larger objects. But when heap space is small, "fragments" appear after running for a while; if the concurrent collector cannot find enough space, it stops and falls back to the traditional mark-sweep approach. When fragmentation occurs, you may need to configure -XX:+UseCMSCompactAtFullCollection to enable old-generation compaction when using the concurrent collector, together with -XX:CMSFullGCsBeforeCompaction=xx to set how many full GCs occur between compactions.
The rest of the optimization problems for jvm can be found in the JVM parameter advance section below.
In the code, it is also important to note:
Avoid keeping duplicate String objects, and be careful with the use of String.substring() and String.intern(). The latter is backed by the StringTable, a fixed-size hash table whose size can be set with -XX:StringTableSize=N; interning a large number of distinct strings makes the StringTable's buckets very long and slows down young GC. jackson and fastjson use this method internally and can cause GC problems in some scenarios (see the article "Why is YGC getting slower and slower?").
Try not to use finalizer
Release unnecessary references: remember to release ThreadLocal after use to prevent memory leaks, and remember close after using various stream.
Use object pooling to avoid uncontrolled object creation and the frequent GC it causes. But don't use object pooling casually: reserve it for objects that are expensive to initialize or create, such as connection pools and thread pools.
Give caches an eviction policy, and consider holding cache objects with SoftReference or WeakReference.
Be cautious about the use of hot deployment / loading, especially dynamically loading classes, etc.
Do not let Log4j output file names and line numbers, because Log4j implements this by generating a thread stack trace, producing a large number of String objects. Also, when using Log4j it is recommended to check whether the corresponding log level is enabled before building the message, otherwise many intermediate Strings are generated:
if (logger.isInfoEnabled()) { logger.info(msg); }
IO tuning
For file IO, pay attention to:
Consider asynchronous writes instead of synchronous writes; you can borrow from redis's AOF mechanism.
Use caching to reduce random reads.
Batch writes as much as possible to reduce the number of IO operations and seeks.
Consider using a database instead of file storage.
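The batch-writing advice can be illustrated with a BufferedWriter, which coalesces many small writes into far fewer large ones, cutting the syscall count compared with writing each line directly (a sketch; it writes to a temp file):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BatchedWriter {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("batch", ".log");
        // The buffer absorbs the 10,000 small writes; the OS sees only
        // a handful of large ones.
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (int i = 0; i < 10_000; i++) {
                out.write("event " + i);
                out.newLine();
            }
        } // flushed and closed once, instead of one write per line
        System.out.println("bytes written: " + Files.size(path));
        Files.delete(path);
    }
}
```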
For network IO, pay attention to:
As with file IO, use asynchronous IO and multiplexed, event-driven IO instead of synchronous blocking IO.
Batch network IO to reduce the number of IO operations.
Use caching to reduce reads of network data.
Use coroutines, e.g. Quasar.
Other optimization suggestions
Algorithm and logic are the most important part of program performance. If you encounter performance problems, you should first optimize the logic processing of the program.
Priority is given to using return values rather than exceptions to indicate errors
Check to see if your code is inline-friendly: is your Java code JIT-friendly?
In addition, JDK 7 and 8 introduced several JVM performance enhancements:
Tiered compilation, supported since JDK 7, is enabled with -XX:+TieredCompilation. It combines the advantages of the client C1 compiler (fast startup and timely optimization) and the server C2 compiler (more advanced optimization), making efficient use of resources: code is first compiled at a low tier while profiling information is collected, then recompiled with further optimization at a higher tier later on. One thing to note: this mode consumes more memory, because the same method is compiled multiple times and multiple native copies exist, so it is recommended to enlarge the code cache somewhat (-XX:ReservedCodeCacheSize, -XX:InitialCodeCacheSize). Otherwise, with an insufficient code cache, the JIT keeps trying to clean the code cache and discard unused methods, consuming a lot of resources on the JIT threads.
Compressed Oops: compressed pointers are enabled by default in server mode since JDK 7.
Zero-Based Compressed Ordinary Object Pointers: when compressed pointers are in use on a 64-bit JVM, the JVM asks the operating system to reserve memory starting at virtual address 0. If the operating system supports the request, Zero-Based Compressed Oops is enabled, and a 32-bit object offset can be decoded into a 64-bit pointer without adding the Java heap base address.
Escape analysis (Escape Analysis): the server-mode compiler determines the escape type of an object from the code and decides accordingly whether to allocate it on the heap and whether to do scalar replacement (allocating primitive-typed local variables on the stack). Depending on the call pattern, it can also eliminate synchronization automatically, as with StringBuffer. This feature has been enabled by default since Java SE 6u23.
NUMA Collector Enhancements: this matters for the Parallel Scavenge garbage collector. Enabling it lets the collector take advantage of a NUMA machine (Non Uniform Memory Access: each processor core has local memory with low latency and high bandwidth) for faster GC. Support is enabled with -XX:+UseNUMA.
In addition, there are many outdated suggestions online, so don't follow them blindly:
Setting variables to null once they are no longer used, "to speed up memory reclamation", makes no sense in most cases. The one exception: if a Java method has not been JIT-compiled but contains code that will run for a long time, it is advisable to explicitly null unneeded reference-type local variables before that long-running code. For more details, see R大's explanation: https://www.zhihu.com/question/48059457/answer/113538171.
Marking method parameters final doesn't achieve much either, especially since JDK 8 introduced "effectively final", which recognizes such variables automatically.
JVM parameter advance
JVM parameter settings have always been murky: most of the time it is unclear which parameters can be configured, what they mean, and why they should be set a certain way. This section explains some common-sense points and some traps people easily fall into.
All of the following are for Oracle/Sun JDK 6
Startup parameter default value
Java has many startup parameters, and they differ across versions. The Internet is full of information of all kinds; if you don't verify it, much of it is ineffective or merely restates the default value. In general, you can view all settable parameters and their default values with java -XX:+PrintFlagsInitial. You can also add -XX:+PrintCommandLineFlags when the program starts to see which startup parameters differ from the defaults. To see all parameters in effect, including those equal to the defaults, use -XX:+PrintFlagsFinal.
In the output, "=" indicates the initial default value is in use, while ":=" means the value is not the initial default: it may come from the command line, from a configuration file, or be a different value chosen automatically by ergonomics.
In addition, you can use the jinfo command to display the startup parameters.
It should be pointed out that before configuring a JVM parameter, it is best to check its default value with the commands above and then decide whether it needs setting at all. Also, do not configure parameters whose purpose you do not understand; after all, the defaults are chosen for good reasons.
jinfo -flags [pid] # view the non-default flags in effect for the running JVM
jinfo -flag [flagName] [pid] # view the value of a specific flag
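From inside the application, the same startup flags are visible through the standard RuntimeMXBean, a programmatic counterpart to jinfo (a sketch; class name is ours):

```java
import java.lang.management.ManagementFactory;

public class JvmArgs {
    public static void main(String[] args) {
        // The JVM's own startup arguments, e.g. -Xmx512m or -XX:+PrintGCDetails,
        // as seen from inside the running application.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(arg);
        }
    }
}
```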
Set parameters dynamically
Sometimes, after a Java application is launched, you determine that a performance problem is caused by GC, but you did not add the GC-logging parameters at startup. Usually you would just add the parameters and restart the application, but that makes the service unavailable for a while. The better practice is to set parameters dynamically without restarting, which jinfo can do (it is essentially based on the attach mechanism).
jinfo -flag [+/-][flagName] [pid] # enable/disable a boolean flag
jinfo -flag [flagName]=[value] [pid] # set a flag's value
In the case of the gc above, you can open the heap dump and set the dump path using the following command.
jinfo -flag +HeapDumpBeforeFullGC [pid]
jinfo -flag +HeapDumpAfterFullGC [pid]
jinfo -flag HeapDumpPath=/home/dump/dir [pid]
Similarly, it can be turned off dynamically.
jinfo -flag -HeapDumpBeforeFullGC [pid]
jinfo -flag -HeapDumpAfterFullGC [pid]
Other parameter settings are similar.
-verbose:gc and -XX:+PrintGCDetails
Many GC guides recommend setting both parameters. In fact, enabling -XX:+PrintGCDetails also enables the former, so there is no need to set both.
-XX:+DisableExplicitGC
This parameter turns System.gc() into a no-op, and many recommended configurations enable it. However, if you use NIO or anything else that relies on out-of-heap memory, this option can lead to OOM. You can use -XX:+ExplicitGCInvokesConcurrent or -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses (used with CMS, so that System.gc() triggers a concurrent GC) instead.
In addition, there is another interesting point: if you do not set this option, RMI triggers a periodic full GC. This is caused by the distributed GC that serves RMI.
-XX:MaxDirectMemorySize
This parameter sets the upper limit of direct (out-of-heap) memory. It is -1 when not set, in which case the effective limit equals -Xmx minus the reserved size of one survivor space.
Parameters that do the same thing, for legacy reasons:
-Xss and -XX:ThreadStackSize
-Xmn and -XX:NewSize; note also that once -Xmn is set, NewRatio no longer takes effect.
-XX:MaxTenuringThreshold
Tools show the default value is 15, but when CMS is selected it becomes 4. When this value is set to 0, all live objects in eden and the survivor spaces are promoted directly to the old generation at their first minor GC. It is also worth noting that this value has no effect with the parallel collector, which by default adjusts these parameters automatically to maximize throughput. And even with collectors such as CMS, the promotion age is not fixed: when the total size of objects of some age reaches 50% of the survivor space, the promotion age is adjusted dynamically.
-XX:HeapDumpPath
Use this parameter to specify where the heap-dump files triggered by -XX:+HeapDumpBeforeFullGC, -XX:+HeapDumpAfterFullGC, and -XX:+HeapDumpOnOutOfMemoryError are stored.
-XX:+UseAdaptiveSizePolicy
This parameter is enabled by default when the parallel collectors are used; it adjusts parameters such as MaxTenuringThreshold and the survivor-space sizes automatically according to the application's runtime behavior. The first promotion age starts at InitialTenuringThreshold (default 7) and is adjusted automatically afterwards. To track the new tenuring threshold after each minor GC, add -XX:+PrintTenuringDistribution to the startup parameters. If you want to configure those parameters yourself, you can turn this option off, but it is then hard for the parallel collector to reach its best throughput. With other garbage collectors, be careful about turning this switch on.
That concludes "what are the common parameter tuning methods for JVM". Thank you for reading.