First, two core concepts of any high-level language, the heap and the stack, can be introduced through Java's data types. Java's basic types (byte, short, int, long and so on, plus the JVM-internal returnAddress type) are stored on the stack; reference types (class types, interface types, and arrays) refer to objects stored on the heap. In Java each thread has its own thread stack, while the heap is shared by all threads. The stack is the unit of execution, so it stores information tied to the current thread: local variables, the program's running state, method return values, and so on. The heap is responsible only for storing object data.
There are several reasons for separating the heap from the stack: the stack represents logical processing while the heap represents data, which fits the idea of divide and conquer; content in the heap can be shared by multiple stacks, which provides a way to exchange data and saves space; it allows storage to grow dynamically, with only an address needing to be recorded on the corresponding stack. In object-oriented terms, an object's attributes are data and live in the heap, while its behaviour is running logic and executes on the stack. Of the two, the stack is the more fundamental for running a program: a program can run without a heap but not without a stack. The heap serves the stack as a shared data store, and it is this shared memory that makes garbage collection possible.
The size of a Java object: an empty Object takes 8 bytes, and a reference to it takes another 4 bytes (on a 32-bit VM). For example, for the primitive type int, the wrapper type Integer is 8 + 4 = 12 bytes, which is rounded up to 16 bytes because Java object sizes must be multiples of 8 bytes; the wrapper type therefore consumes several times as much memory as the primitive.
Strong, soft, weak, and phantom references: a strong reference is the ordinary reference the virtual machine creates, and it is what the VM uses to decide whether an object may be reclaimed; a soft reference is typically used as a cache and is collected only when memory is tight; a weak reference is also used as a cache, but its referent is collected at the next garbage collection regardless of memory pressure; a phantom reference gives no usable access to its object and mainly serves to signal when the object has been collected.
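As a hedged illustration (not from the original text; the class name and array sizes are arbitrary), the sketch below shows the different collection behaviour of soft and weak references:

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        byte[] strong = new byte[1024];                                   // strong reference: never collected while reachable
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]); // collected only when memory is tight
        WeakReference<byte[]> weak = new WeakReference<>(new byte[1024]); // collected at the next GC

        System.gc();  // request (not force) a collection

        System.out.println("soft still cached: " + (soft.get() != null)); // usually true
        System.out.println("weak still cached: " + (weak.get() != null)); // usually false after a GC
    }
}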
The composition of the JVM can be understood at a high level through the components described below.
Class Loader: loads Class files, whose format is specified by the JVM Specification and which contain metadata such as the parent class, interfaces, version, fields, methods, and so on.
Execution Engine: also called the interpreter, it is responsible for interpreting bytecode instructions and handing them to the OS for execution. JIT (just-in-time compilation) refers to compiling frequently executed bytecode into native code at run time so it does not have to be interpreted each time.
Native Interface: to integrate with other languages, Java sets aside an area for handling code marked native; it is rarely needed these days.
Runtime Data Area: the runtime data area is the focus of the JVM; it is where the programs we write are loaded and run.
In addition, the JVM's registers include: pc, the Java program counter; optop, a pointer to the top of the operand stack; frame, a pointer to the execution environment of the current method; and vars, a pointer to the first variable in the local variable area of the current method.
JVM memory management places all data in the runtime data area. The most complex part is the stack (Stack), also called stack memory, which is the running area of a Java program. It is created when a thread is created, its life cycle follows that of the thread, and its memory is released when the thread ends; there is no garbage collection for the stack. Data on the stack is stored as stack frames (Stack Frame): a frame is a block of memory, a data set describing a method and its runtime data. When method A is called, a frame F1 is created and pushed onto the stack; if A then calls method B, frame F2 is created and pushed on top of it. After execution, F2 is popped first and then F1, following the last-in, first-out principle. The general push/pop behaviour of the Java stack is sketched below.
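A minimal sketch (method names invented for illustration) of this behaviour, where methodA's frame F1 stays on the stack while methodB's frame F2 is pushed and popped:

public class StackFrameDemo {
    static void methodB() {
        int local = 42;                       // lives in methodB's stack frame
        System.out.println("frame F2 on top: " + local);
    }                                         // F2 popped here

    static void methodA() {
        System.out.println("frame F1 pushed");
        methodB();                            // pushes F2 on top of F1
        System.out.println("back in F1");     // F2 has already been popped
    }                                         // F1 popped here

    public static void main(String[] args) {
        methodA();
    }
}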
There are several ways to classify collection algorithms; a brief introduction to each category follows.
Classified by basic collection strategy:
Reference Counting: each new reference to an object increments a counter and each removed reference decrements it; only objects whose count is 0 are reclaimed during garbage collection. The drawback is that circular references cannot be handled.
Mark-Sweep: runs in two phases; first all referenced objects are marked starting from the root references, then the whole heap is traversed and unmarked objects are deleted. The algorithm has to pause the entire application and it produces memory fragmentation.
Copying: the memory is divided into two equal areas and only one is used at a time; during collection the area in use is traversed and the live objects are copied to the other area, which is cheap and leaves the memory compact, but it requires twice the memory space.
Mark-Compact: combines the advantages of mark-sweep and copying; the first phase marks objects from the roots, the second traverses the whole heap, clears unmarked objects, and compacts the surviving objects into one end of the heap in order, solving the fragmentation and space problems at the same time.
Classified by how the heap regions are treated:
Incremental Collecting: garbage is collected in real time, that is, while the application is running.
Generational Collecting: an algorithm based on analysing object lifetimes; objects are divided into a young generation, an old generation, and a permanent generation, and different algorithms are applied to objects with different life cycles.
Deciding what to collect: because reference counting cannot handle circular references, collection algorithms in practice start from the root nodes, traverse the object references, and find the live objects. The search starts from a stack (for example Java's main method) or a runtime register, follows the references it holds to objects in the heap, and iterates step by step until it ends at a null reference or a primitive type. The result is a tree of objects, and the collector reclaims every object that is not in the tree.
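A hedged sketch (not from the original) of why root tracing matters: the two objects below reference each other, so their reference counts would never reach zero, yet once main clears its variables they are unreachable from any root and a tracing collector can reclaim them.

public class CircularReferenceDemo {
    Object partner;                           // field that can form a cycle

    public static void main(String[] args) {
        CircularReferenceDemo a = new CircularReferenceDemo();
        CircularReferenceDemo b = new CircularReferenceDemo();
        a.partner = b;                        // a -> b
        b.partner = a;                        // b -> a  (cycle: counts would never drop to 0)

        a = null;                             // no root reaches the pair any more
        b = null;
        System.gc();                          // a tracing collector may now reclaim both objects
    }
}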
The idea of generations: because different objects have different life cycles, collecting them with methods suited to their characteristics greatly improves efficiency. For example, business objects generally live long, while temporary variables live briefly. Generations avoid repeatedly traversing long-lived objects and so reduce the collection cost.
How the generations are divided: the virtual machine divides the heap into the young generation (Young Generation), the old generation (Old Generation) and the permanent generation (Permanent Generation). Newly created objects are first placed in the young generation, whose goal is to reclaim short-lived objects as quickly as possible; it is divided into three zones, one Eden zone and two Survivor zones. Most objects are created in Eden. When Eden is full, the surviving objects are copied into one of the Survivor zones; when that zone is also full, the survivors are copied to the other Survivor zone; and when that one fills up too, the objects copied from the first Survivor zone are promoted to the old (Tenured) zone, so the old generation mainly stores long-lived objects. The permanent generation stores static data such as Java classes and methods. It has little effect on ordinary garbage collection, but when an application uses a lot of reflection you need to enlarge it by setting -XX:MaxPermSize=.
When garbage collection is triggered: because objects are handled by generation, the region and timing of collection also differ, and there are two kinds of GC.
Scavenge GC: usually triggered when a new object cannot be allocated because Eden has no space. Dead objects in Eden are cleared, surviving objects are moved to a Survivor zone, and the two Survivor zones are then tidied up. This does not affect the old generation; fast, efficient algorithms are recommended here so that Eden is freed as soon as possible.
Full GC: cleans up the entire heap, including Young, Tenured and Perm, so reducing the number of Full GCs improves system performance. A Full GC occurs when the old generation is full, when the permanent generation is full, when System.gc() is explicitly called, or when the allocation policy of the heap domains has changed dynamically since the last GC.
Next, the advantages and disadvantages of the different collectors are compared.
Serial collector: uses a single thread for all garbage collection; simple and efficient, suitable for small data volumes. Enabled with -XX:+UseSerialGC.
Parallel collector: collects the young generation in parallel, which reduces garbage collection time; enabled with -XX:+UseParallelGC. It can also collect the old generation in parallel (the default there is single-threaded), enabled with -XX:+UseParallelOldGC. -XX:ParallelGCThreads= sets the number of parallel GC threads, usually equal to the number of processors. -XX:MaxGCPauseMillis= sets the maximum pause time for garbage collection. -XX:GCTimeRatio= sets the ratio of garbage collection time to non-collection time: the GC share of total time is 1 / (1 + N), and N defaults to 99, i.e. 1% of the time is spent on garbage collection.
Concurrent collector: the first two collectors visibly pause the application during collection; this one reduces that impact by doing most of its work concurrently (the application does not stop) and suits medium and large applications. Enabled with -XX:+UseConcMarkSweepGC. Because concurrent collection is more complex, two basic concepts follow.
Floating garbage: because collection happens while the application is running, new garbage may appear as a collection completes ("floating garbage"); it can only be collected in the next collection cycle, so the concurrent collector needs to reserve about 20% of the space for it.
Concurrent Mode Failure: because the application keeps running during collection, enough heap space must remain for it to use; otherwise, when the heap fills up, a "concurrent mode failure" occurs and the whole application is paused for garbage collection. You can set -XX:CMSInitiatingOccupancyFraction= to specify at what heap occupancy concurrent collection should start.
Garbage First (G1), a newer garbage collection algorithm: it is designed for large applications and supports large heaps with high throughput. Briefly, the algorithm divides the whole heap into equally sized regions; memory allocation and collection are both done per region, and the collection process is split into several phases, borrowing from CMS. After scanning the regions, G1 sorts them by the amount of live data and collects first the regions with few live objects, so that space is reclaimed quickly; since those regions are mostly garbage, the approach is called Garbage First, i.e. the most garbage-laden regions are collected first. The collection process consists of the following steps, and an example of enabling the collector is given after them.
Initial Marking: G1 keeps two marking bitmaps per region, a previous marking bitmap and a next marking bitmap; each bit in a bitmap points to the start address of an object. Before marking begins, the next marking bitmap is cleared concurrently, then all application threads are stopped, the objects in each region directly reachable from the roots are scanned and marked, the region's top value is recorded as the next top-at-mark-start (TAMS), and the application threads are resumed.
Concurrent Marking: objects are scanned from the initial marks to determine whether the objects they reference are live; reference changes made concurrently by application threads during this period are written into remembered set logs. Newly created objects are placed above the recorded top value and are treated as live by default, while the top value is updated.
Final Marking Pause: remembered set logs that application threads have not filled up are not yet in the filled RS buffers, so this step processes the remaining remembered set logs and updates the corresponding remembered sets.
Live Data Counting and Cleanup: whether this step is triggered depends on whether the used memory reaches H, where H = (1 - h) * HeapSize and h is a percentage threshold of the JVM heap size.
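As a hedged example of enabling the collector (HotSpot flags; the heap sizes, pause goal and application name are made up), a launch line might look like:

java -Xmx4g -Xms4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar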
There are many JVM configuration options. First, the heap-related options can be understood through a typical configuration.
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:NewRatio=4 -XX:SurvivorRatio=4 -XX:MaxPermSize=64m -XX:MaxTenuringThreshold=0
-Xmx3550m: sets the maximum heap memory of the JVM to 3550 MB.
-Xms3550m: sets the initial heap memory of the JVM to 3550 MB; keeping it equal to the maximum avoids the JVM re-allocating memory after each garbage collection.
-Xmn2g: sets the young generation size to 2 GB. Total heap size = young generation + old generation + permanent generation. The permanent generation defaults to 64 MB, and any increase in the young generation shrinks the old generation, so this value matters; about 3/8 of the whole heap is recommended.
-Xss128k: sets the stack size of each thread. The default is 1 MB and it should be adjusted to the application; the OS generally limits the number of threads per process to roughly 3000-5000.
-XX:NewRatio=4: sets the ratio of the young generation to the old generation, i.e. the old generation is 4 times the young generation, so the young generation occupies 1/5 of the heap.
-XX:SurvivorRatio=4: sets the ratio of Eden to a single Survivor region in the young generation to 4, i.e. the two Survivor regions together compare to Eden as 2:4, so one Survivor region occupies 1/6 of the young generation.
-XX:MaxPermSize=64m: sets the permanent generation size to 64 MB.
-XX:MaxTenuringThreshold=0: sets the maximum tenuring age. Setting it to 0 makes young-generation objects go straight to the old generation without passing through the Survivor zones, which suits applications with many long-lived objects.
Next come the throughput-priority parallel collector and the response-time-priority concurrent collector. Tip: for this type of application it is recommended to make the young generation as large as possible, especially when high throughput is required.
Parallel collector
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=100 -XX:+UseAdaptiveSizePolicy
-XX:+UseParallelGC: selects the parallel collector for young-generation garbage collection.
-XX:ParallelGCThreads=20: sets the number of threads the parallel collector uses, ideally equal to the number of processors.
-XX:+UseParallelOldGC: configures the old generation to be collected in parallel as well.
-XX:MaxGCPauseMillis=100: sets the maximum pause time for each young-generation collection; if it cannot be met, the JVM automatically adjusts the young-generation size to satisfy it.
-XX:+UseAdaptiveSizePolicy: with this option the parallel collector automatically chooses the young-generation size and the corresponding Survivor ratio; it is recommended to keep it on.
Concurrent collector
java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=5 -XX:+UseCMSCompactAtFullCollection
-XX:+UseConcMarkSweepGC: sets the old generation to be collected concurrently (CMS).
-XX:+UseParNewGC: sets the young generation to be collected in parallel; this can run alongside CMS collection and no longer needs to be configured explicitly in recent versions.
-XX:CMSFullGCsBeforeCompaction=5: sets how many GC runs occur before the memory space is compacted and tidied.
-XX:+UseCMSCompactAtFullCollection: turns on compaction of the old generation; it eliminates fragmentation but can affect performance.
In addition, some options print auxiliary GC information: -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps and -Xloggc:filename.
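As a hedged example (assuming a JDK 8 style HotSpot VM; the log file name and application name are made up), these logging flags might be combined as:

java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar app.jar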
The Java memory model: the memory model differs across platforms, but the JVM's memory model specification is uniform, and Java's multithreaded concurrency problems are all reflected in it. So-called thread safety means controlling orderly access to and modification of a resource by multiple threads. Summing up the Java memory model, two main issues need attention: visibility and ordering.
Tip: this part is difficult to understand and needs more review.
Visibility: threads cannot communicate with each other directly; they communicate through shared variables. The Java memory model stipulates that the JVM has a main memory shared by all threads; objects created with new are also allocated in main memory. Each thread has its own working memory, which holds copies of some objects from main memory. When a thread operates on an object, the order of execution is: copy the variable from main memory into working memory (read and load); execute the code and change the shared variable's value (use and assign); flush the working-memory data back to main memory (store and write). The JVM specification defines the thread's operations on main memory as read, load, use, assign, store and write. When a shared variable has copies in the working memory of several threads and one thread modifies it, the other threads should be able to see the modified value; this is the multithreaded visibility problem.
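A hedged sketch (not in the original) of the visibility problem: without synchronization or volatile, the reader thread may keep using its working-memory copy of running and never see the writer's update.

public class VisibilityDemo {
    static boolean running = true;             // shared variable; deliberately NOT volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {                  // may keep reading a stale copy
                // busy loop
            }
            System.out.println("reader saw the update");
        });
        reader.start();

        Thread.sleep(1000);
        running = false;                       // this write may never become visible to the reader
    }
}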
Ordering: a thread cannot refer to a variable directly in main memory. If the variable is not in the thread's working memory, a copy is made from main memory into working memory (read-load), and the thread then uses that copy. When the same thread references the field again, it may fetch the copy from main memory again (read-load-use) or use the existing copy directly (use), so the order of read, load and use is decided by the JVM implementation. A thread also cannot assign directly to a field in main memory: it assigns to the copy in working memory (assign), and that copy is later synchronised back to main memory (store-write); when this synchronisation happens is likewise up to the JVM. To control the order of these operations you need the synchronized keyword, either by making the method a synchronized method, public synchronized void add(), or by adding a lock variable, static Object lock = new Object(), and then using synchronized (lock). Each lock object has two queues: a ready queue holding threads that are about to acquire the lock, and a blocking queue holding blocked threads. When a thread is notified, it moves into the ready queue and waits for CPU scheduling. For example, when thread a executes the account.add method for the first time, the JVM checks whether any thread is waiting in the ready queue of the lock object account; if there is, the account lock is occupied. Since this is the first run, the ready queue is empty, so thread a acquires the lock and executes the method. If thread b then tries to execute account.withdraw, it enters account's ready queue and waits for the lock, because the lock held by thread a has not yet been released.
To put it simply, a thread executes critical-section code as follows: acquire the synchronization lock; clear the working memory; copy the variables from main memory into working memory; compute with those variables; write the variables from working memory back to main memory; release the lock.
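A hedged sketch of the two synchronization styles mentioned above (the Account class, field and amounts are invented for illustration); both methods compete for the same per-instance lock, which is the scenario described for account.add and account.withdraw:

public class Account {
    private int balance;

    public synchronized void add(int amount) {       // locks the account instance
        balance += amount;                            // critical section: read, modify, write back
    }

    public synchronized void withdraw(int amount) {   // competes for the same instance lock
        balance -= amount;
    }

    // equivalent style with an explicit lock object:
    // private static final Object lock = new Object();
    // ... synchronized (lock) { balance -= amount; }
}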
The producer-consumer model: this is a classic thread synchronization model. Sometimes it is not enough to make multiple threads' operations on a shared resource mutually exclusive; the threads often also need to cooperate, and a simple example is shown below.
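The original code sample is not preserved on this page; below is a minimal sketch of the model using wait/notify on a shared buffer (class name, capacity and item counts are illustrative).

import java.util.LinkedList;
import java.util.Queue;

public class ProducerConsumerDemo {
    private static final int CAPACITY = 5;
    private static final Queue<Integer> buffer = new LinkedList<>();

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 20; i++) {
                synchronized (buffer) {
                    while (buffer.size() == CAPACITY) {              // buffer full: wait for a consumer
                        try { buffer.wait(); } catch (InterruptedException e) { return; }
                    }
                    buffer.add(i);                                    // produce one item
                    buffer.notifyAll();                               // wake up waiting consumers
                }
            }
        });

        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 20; i++) {
                synchronized (buffer) {
                    while (buffer.isEmpty()) {                        // buffer empty: wait for the producer
                        try { buffer.wait(); } catch (InterruptedException e) { return; }
                    }
                    System.out.println("consumed " + buffer.poll());
                    buffer.notifyAll();                               // wake up the waiting producer
                }
            }
        });

        producer.start();
        consumer.start();
    }
}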
The volatile keyword: volatile is Java's lightweight synchronization tool. It only provides visibility of a variable across threads; it does not make compound operations atomic. Its significance is that any modification a thread makes to a volatile variable is immediately visible to other threads, because reads and writes of the variable go to main memory rather than to a thread's private working copy. Its usage scenarios are: writes to the variable do not depend on its current value, and the variable is not part of an invariant together with other variables.
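A hedged sketch of such a scenario, a write that does not depend on the current value: adding volatile to the flag from the earlier visibility sketch makes the writer's update promptly visible to the worker thread.

public class VolatileFlagDemo {
    static volatile boolean running = true;    // volatile: writes become visible to other threads

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) { /* do work */ }
            System.out.println("stopped cleanly");
        });
        worker.start();

        Thread.sleep(1000);
        running = false;                       // simple assignment, not a read-modify-write
    }
}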
JVM tuning tools: common ones include JConsole, JProfiler and VisualVM, of which VisualVM is recommended. All tuning comes from monitoring and analysing the running application, mainly by observing memory release, inspecting collection classes, examining object trees and so on. By looking at the heap information for collection instances, you can analyse whether the split between the old and young generations is reasonable, whether memory is leaking, whether the garbage collection algorithm is appropriate, and so on.
In addition, thread monitoring shows the number and state of threads in the system and whether deadlocks exist; the sampler shows CPU and memory hot spots; and snapshots show how the relevant state differs between points in time.
Memory leak check: a memory leak can generally be understood as misuse of system resources, so that used resources cannot be reclaimed, new allocation requests cannot be satisfied, and the system errors out. The common scenarios are: heap space exhausted (java.lang.OutOfMemoryError: Java heap space), which can be spotted from changes in heap usage; permanent generation full (java.lang.OutOfMemoryError: PermGen space), which occurs when reflection is used heavily; stack overflow (java.lang.StackOverflowError), usually caused by incorrect recursion or loops; thread stack full (Fatal: Stack size too small), which can be addressed by adjusting -Xss, though you should still check whether the thread stack is simply too deep; system memory exhausted (java.lang.OutOfMemoryError: unable to create new native thread), where the OS lacks resources to create threads, so consider reducing the memory used per thread or redesigning that part of the program.
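A hedged illustration (not from the article; class name and block size are arbitrary) of the most common heap-space case: a static collection that is only ever added to keeps every object reachable, so the collector can never reclaim them and java.lang.OutOfMemoryError: Java heap space eventually follows.

import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    private static final List<byte[]> CACHE = new ArrayList<>();  // never cleared

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]);  // each 1 MB block stays reachable via the static list
        }
    }
}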
Common problems
1. The difference between the heap and the stack: the heap holds objects, while temporary (local) variables used inside methods are stored in stack memory. The stack follows the thread: where there is a thread, there is a stack. The heap follows the JVM: where there is a JVM, there is heap memory.
2. What exactly exists in heap memory: objects, including object variables and object methods.
3. The difference between class variables and instance variables: static variables (modified with static) are class variables, and non-static variables are instance variables. Class variables are stored in the method area and instance variables are stored in heap memory. There is a saying that class variables are initialised when the JVM starts, but that is not true.
4. Whether Java passes by value or by reference: neither phrasing is exact; the address is passed by value. Specifically, primitive types pass their value, and reference types pass the address (the reference) by value, as sketched below. For primitive types, the JVM copies the value from the Method Area or Heap onto the Stack, runs the method in its frame, and copies the variable back afterwards.
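A hedged sketch (names invented) of what "the address is passed by value" means: reassigning the parameter inside the method does not affect the caller's reference, but mutating the object it points to does.

public class PassByValueDemo {
    static void modify(StringBuilder sb, int n) {
        sb.append(" world");                  // mutates the object the copied reference points to
        sb = new StringBuilder("other");      // reassigns only the local copy of the reference
        n = 99;                               // reassigns only the local copy of the primitive
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("hello");
        int value = 1;
        modify(text, value);
        System.out.println(text);             // prints "hello world"
        System.out.println(value);            // prints 1
    }
}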
5. Why OutOfMemory occurs: because there is no free space in heap memory or the permanent area is full. Sometimes it happens even when there are not many objects; this is usually caused by too many levels of inheritance, because creating an object in the heap first creates its parent-class parts and then the subclass parts.
6. Why StackOverflowError occurs: because a thread has run out of stack space, which is usually caused by recursive functions.
7. Which parts of the JVM are shared and which are private: the Heap and the Method Area are shared; the rest are private to each thread.
8. Some additional concepts worth noting: the constant pool (constant pool) stores a program's constants in order and by index; by default, boxed integers from -128 to 127 are cached and shared, as are interned string literals (see the sketch below). The Security Manager (Security Manager) provides runtime security control for Java; class loaders may load class files only after they have passed its checks. The method index table (Methods table) records the address information of each method; the address pointers in the Stack and the Heap actually point into the Methods table.
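A hedged illustration of that caching behaviour (class name invented): boxed integers in the cached range and interned string literals share instances, while values outside the range and explicitly constructed strings do not.

public class PoolDemo {
    public static void main(String[] args) {
        Integer a = 100, b = 100;
        Integer c = 1000, d = 1000;
        System.out.println(a == b);           // true: within the boxed-integer cache (-128..127)
        System.out.println(c == d);           // false: outside the cache, two distinct objects

        String s1 = "jvm";
        String s2 = "jvm";
        String s3 = new String("jvm");
        System.out.println(s1 == s2);         // true: both refer to the interned literal
        System.out.println(s1 == s3);         // false: new String creates a separate heap object
    }
}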
9. Why System.gc() should not be called casually: because it triggers a Full GC and pauses all application activity.
10. What CGLib is: a bytecode-manipulation library used when classes are enhanced by technologies such as Spring and Hibernate; it can manipulate bytecode directly to generate Class files dynamically.