
What are the features of ZGC

Updated 2025-01-17. From: SLTechnology News&Howtos (shulou), Development


This article explains the main characteristics of ZGC. The explanation is kept simple, so follow along to learn what those characteristics are.

The goal of ZGC

Garbage collectors are designed with goals, some for higher throughput and some for lower latency.

So let's first look at the goals of ZGC.

Its goal is low latency: the maximum pause time stays within a few milliseconds, no matter how large the heap is or how many objects are live.

It is designed to handle heaps from 8MB to 16TB.

The rest of this article follows the outline on the OpenJDK wiki.

Keywords: concurrent, region-based, compacting, NUMA-aware, using colored pointers, and using load barriers. ZGC's marking is also based on SATB (snapshot-at-the-beginning).

Concurrent

Concurrent here means that the collector runs concurrently with the application threads. A ZGC cycle is divided into 10 phases, and only 3 very short ones are stop-the-world (STW).

Only the initial mark, remark, and initial transfer phases are STW.

The initial mark only scans the objects directly reachable from the GC roots, so it takes little time. The remark is usually very short as well; if it exceeds 1 ms, ZGC goes back into the concurrent marking phase and tries again, so the impact is small.

The initial transfer phase also only scans the GC roots and is also very short, so ZGC can be considered almost fully concurrent.

And the reason the pause time does not increase with the heap size and the number of living objects is that STW is almost exclusively related to the GC Roots collection size and has nothing to do with the heap size.

This is a key area where ZGC outperforms G1. G1 has to stop the world to transfer objects, so with a large heap there are many objects to transfer and the pause is long, whereas ZGC transfers objects concurrently.

However, one consequence of concurrent collection is that application threads keep creating new objects while the collector runs, so some space has to be reserved for objects allocated during the concurrent phases.

If objects are allocated too quickly and memory runs out, CMS falls back to a full GC, while ZGC blocks the application threads.

So pay attention to when ZGC is triggered.

ZGC can be triggered both by an adaptive algorithm and at a fixed interval, so you can tune the trigger timing for your workload to avoid threads being blocked because collection started too late while allocation was too fast.

You can also set ParallelGCThreads and ConcGCThreads, which are the number of threads used in the parallel STW phases and in the concurrent phases respectively, to speed up collection.

Be careful with ConcGCThreads, though: these threads run concurrently with the application threads, so too many of them will slow the application down.

The phases of ZGC run one after another, so in theory a single kind of thread would be enough. Why have two?

For flexibility. With two separate settings you can tune each through configuration to get the best performance.

As mentioned above, ZGC's STW time depends on the size of the GC roots set, so creating a large number of threads, dynamically loading many ClassLoaders, and so on will increase ZGC's pause time.

Keep this in mind.
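To see which collector a JVM is actually running and how much collection time it has accumulated, the standard java.lang.management API can be queried. The following is only a minimal sketch: the class name GcInfo is made up for illustration, and the collector names it prints differ between JDK versions, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK or of ZGC.
final class GcInfo {

    /** Names of the collectors the running JVM exposes through JMX. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Accumulated collection time in ms; beans reporting -1 (undefined) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("collectors:   " + collectorNames());
        System.out.println("GC time (ms): " + totalGcMillis());
        // Shows the flags the JVM was started with, e.g. -XX:+UseZGC or -XX:ConcGCThreads=...
        System.out.println("JVM flags:    " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it with different values of ParallelGCThreads and ConcGCThreads under a realistic load is one way to compare settings; the right values depend on the workload.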

Region-based

In order to control memory allocation with finer granularity, ZGC, like G1, divides the heap into many partitions.

There are three partition sizes: small (2MB), medium (32MB), and large (N×2MB, sized to fit the object it holds).

The following figure shows the comments in the source code:

The collection strategy gives priority to small partitions and avoids collecting medium and large partitions as far as possible.
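How an allocation might be mapped to one of the three partition types can be sketched as follows. The 2MB and 32MB sizes come from the text above; the object-size limits (256KB for small, 4MB for medium) and the rounding of large partitions to a multiple of 2MB are my assumptions about the ZGC design and should be checked against the source for your JDK version. The class and method names are hypothetical.

```java
// Illustrative sketch only; names and thresholds are assumptions, not ZGC source code.
final class ZPageSizing {
    enum PageType { SMALL, MEDIUM, LARGE }

    static final long KB = 1024;
    static final long MB = 1024 * KB;

    static final long SMALL_PAGE = 2 * MB;          // from the article
    static final long MEDIUM_PAGE = 32 * MB;        // from the article
    static final long SMALL_OBJECT_MAX = 256 * KB;  // assumed limit
    static final long MEDIUM_OBJECT_MAX = 4 * MB;   // assumed limit

    /** Picks the partition type for an object of the given size. */
    static PageType pageTypeFor(long objectBytes) {
        if (objectBytes <= SMALL_OBJECT_MAX) {
            return PageType.SMALL;
        }
        if (objectBytes <= MEDIUM_OBJECT_MAX) {
            return PageType.MEDIUM;
        }
        return PageType.LARGE;
    }

    /** Size of the partition that would hold the object. */
    static long pageSizeFor(long objectBytes) {
        switch (pageTypeFor(objectBytes)) {
            case SMALL:
                return SMALL_PAGE;
            case MEDIUM:
                return MEDIUM_PAGE;
            default:
                // Assumed: a large partition is rounded up to a multiple of 2MB.
                return ((objectBytes + SMALL_PAGE - 1) / SMALL_PAGE) * SMALL_PAGE;
        }
    }

    public static void main(String[] args) {
        long size = 5 * MB;
        System.out.println(pageTypeFor(size) + " partition of " + pageSizeFor(size) / MB + "MB");
    }
}
```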

Compacting

Like G1, ZGC is region-based, so taken as a whole it behaves like a mark-copy algorithm, which means memory is compacted as part of collection.

As a result, ZGC does not leave the heap fragmented.

The specific process will be analyzed below.

NUMA-aware

G1 originally had no NUMA support, but gained it in JDK 14.

Some readers may not be familiar with NUMA, so let me explain it first.

In the early days processors were single-core, because under Moore's Law processor performance grew exponentially at regular intervals.

In recent years that growth has slowed, so manufacturers moved to dual-core and multi-core processors.

Originally, a CPU reached memory by going through the front-side bus, then the northbridge, then the memory bus.

This architecture is called SMP (Symmetric Multi-Processor). Every CPU accesses memory at the same speed regardless of the memory address, so it is also called uniform memory access (UMA).

As more and more cores were added, the bus and the northbridge gradually became bottlenecks, so a different design was needed.

Grouping CPUs with their own memory into units gives non-uniform memory access (NUMA).

Put simply, memory is divided up: each CPU accesses its own local memory quickly, but accesses remote memory belonging to other CPUs more slowly.

Several CPUs can of course share one block of memory or several, as shown in the following figure:

However, because memory is split into local and remote, a "hot" module can fill its local memory while remote memory sits idle.

For example, suppose 64GB of memory is split into two halves. Module one has used 31GB and the other module 5GB; if module one can only use its local memory, memory use becomes unbalanced.

If the policy forbids access to remote memory, the system may start swapping (moving part of memory out to disk) while plenty of memory is still free.

Even if remote access is allowed, it is much slower than local access, which has to be taken into account when using NUMA.

ZGC's NUMA support works like this: small partitions are allocated from local memory first, and from remote memory only if local memory is insufficient.

For medium and large partitions, the decision is left to the operating system.

The reasoning is that most objects live in small partitions, so preferring local allocation makes them fast and is unlikely to cause memory imbalance.

Medium and large partitions hold bigger objects, so allocating all of them locally could cause memory imbalance.
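The local-first policy for small partitions can be illustrated with a toy model. This is not how ZGC talks to the operating system; it is just a sketch of the decision rule, with a hypothetical class that tracks free bytes per NUMA node.

```java
// Toy model of "local node first, remote node as fallback"; all names are hypothetical.
final class NumaSmallPageAllocator {
    private final long[] freeBytesPerNode;

    NumaSmallPageAllocator(long[] freeBytesPerNode) {
        this.freeBytesPerNode = freeBytesPerNode.clone();
    }

    /** Returns the node the partition was taken from, or -1 if no node has room. */
    int allocate(int localNode, long pageBytes) {
        if (take(localNode, pageBytes)) {
            return localNode;                      // preferred: local memory
        }
        for (int node = 0; node < freeBytesPerNode.length; node++) {
            if (node != localNode && take(node, pageBytes)) {
                return node;                       // fallback: remote memory
            }
        }
        return -1;
    }

    private boolean take(int node, long bytes) {
        if (freeBytesPerNode[node] < bytes) {
            return false;
        }
        freeBytesPerNode[node] -= bytes;
        return true;
    }

    public static void main(String[] args) {
        NumaSmallPageAllocator allocator = new NumaSmallPageAllocator(new long[] {4, 4});
        System.out.println("first allocation on node " + allocator.allocate(0, 3));
        System.out.println("second allocation on node " + allocator.allocate(0, 3));
    }
}
```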

Using colored pointers

A colored pointer borrows a few bits of the 64-bit pointer to record the current state of the object. The bits are Marked0, Marked1, Remapped, and Finalizable.

Let's take a look at the comments in the source code, which is very clear and intuitive:

Bits 0-41 (42 bits) hold the ordinary address. Because only 42 bits represent the address, ZGC supports at most 4TB of heap (16TB in theory).

For the same reason, ZGC supports neither 32-bit platforms nor compressed pointers.

Bits 42-45 are the flag bits. Whichever flag bit is set, the pointer refers to the same object.

This is achieved through multi-mapping: several virtual addresses map to the same physical address. Whether the flag bits of an object's address are 0001, 0010, or 0100, it corresponds to the same physical address.

Exactly how these flag bits are used is explained after the walkthrough of the collection process below.

But here is a question: why only 4TB? Aren't a lot of bits going unused?

First, x86_64 has only 48 address lines, so at most 48 bits can be used. The instruction set is 64-bit, but the hardware supports 48-bit addresses.

Hardly any system needs that much memory, so full 64-bit addressing is unnecessary and the hardware stops at 48 bits.

So the object address takes 42 bits and the colored-pointer flags take 4. Doesn't that leave 2 bits spare?

Yes. In theory ZGC could support 16TB, but 4TB was presumably considered enough for now, so the extra bits are held in reserve.
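The bit layout described above can be written down as masks. This is a sketch for illustration, not ZGC source code: the class and method names are hypothetical, and the order of the four flag bits (Marked0 at bit 42 up to Finalizable at bit 45) is assumed from the order in which the text lists them.

```java
// Illustrative model of a colored pointer stored in a long; names are hypothetical.
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;   // bits 0-41

    // Flag bits 42-45 (assumed order).
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    /** Strips the flag bits and returns the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Returns the same address with exactly the given flag bit set. */
    static long withColor(long pointer, long color) {
        return address(pointer) | color;
    }

    static boolean hasColor(long pointer, long color) {
        return (pointer & color) != 0;
    }

    public static void main(String[] args) {
        long tb = 1L << 40;
        // 42 address bits cover 2^42 bytes = 4TB; two more bits would give 2^44 bytes = 16TB.
        System.out.println("addressable with 42 bits: " + ((ADDRESS_MASK + 1) / tb) + "TB");
        System.out.println("addressable with 44 bits: " + ((1L << 44) / tb) + "TB");
    }
}
```

Whatever color is applied, address() returns the same value, which mirrors the point that all colored views refer to the same object.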

Using load barriers

CMS and G1 both use write barriers, while ZGC uses a read barrier (a load barrier).

A write barrier is an AOP-style hook that runs when an object reference is assigned; a read barrier is the same kind of hook that runs when a reference is read.

For example, Object a = obj.foo; triggers the read barrier.

The read barrier is what lets ZGC transfer objects concurrently. G1 uses write barriers, so it can only transfer objects during an STW pause.

Put simply, after a GC thread has transferred an object, an application thread that reads a reference to it goes through the read barrier, which checks the flag bits on the pointer to see whether the object has been transferred.

If it has, the barrier corrects the reference. In the example above, not only does a receive the latest address, obj.foo is updated as well, so the next access proceeds normally with no extra cost.

The following figure shows the read barrier at work: when an object is transferred its new location is recorded in the forwardingTable, and a later read triggers the reference fix-up.

This is called "self-healing": not only is the value assigned to a up to date, the field it was loaded from (obj.foo) is corrected too.

Colored pointers and the read barrier are the keys to ZGC's concurrent transfer.
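The self-healing idea can be modelled in a few lines. This is a simplified sketch under my own assumptions, not the real barrier (which is emitted by the JIT compiler): a reference field is modelled as an AtomicLong holding a colored pointer, Remapped is treated as the "good" color, and the class name and flag positions are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of a load barrier with self-healing; not ZGC source code.
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;    // assumed bit position
    static final long REMAPPED = 1L << 44;   // assumed bit position

    /** Old address -> new address, recorded when an object is transferred. */
    final Map<Long, Long> forwardingTable = new ConcurrentHashMap<>();

    /** Reads a reference field through the barrier. */
    long load(AtomicLong field) {
        long ref = field.get();
        if ((ref & REMAPPED) != 0) {
            return ref;                           // fast path: already up to date
        }
        long oldAddress = ref & ADDRESS_MASK;
        // If the object was not transferred, it keeps its address.
        long newAddress = forwardingTable.getOrDefault(oldAddress, oldAddress);
        long healed = newAddress | REMAPPED;
        // Self-heal: fix the field itself, so the next load takes the fast path.
        field.compareAndSet(ref, healed);
        return healed;
    }

    public static void main(String[] args) {
        LoadBarrierSketch gc = new LoadBarrierSketch();
        gc.forwardingTable.put(100L, 200L);                    // object moved from 100 to 200
        AtomicLong foo = new AtomicLong(100L | MARKED0);       // models obj.foo
        long a = gc.load(foo);                                 // models Object a = obj.foo;
        System.out.println("a points to address " + (a & ADDRESS_MASK));
        System.out.println("obj.foo healed: " + (foo.get() == a));
    }
}
```

If the compareAndSet loses a race with another thread that healed the same field, the returned value is still correct, because both threads compute the same healed pointer.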

Analysis of ZGC recovery process

A ZGC cycle can be divided into three stages: marking, transfer, and relocation. (In the ZGC source these are called mark, relocate, and remap.)

Marking: mark all live objects, starting from the roots.

Transfer: select some live objects and move them to new memory.

Relocation: because object addresses have changed, pointers to the old location must be updated to the new address.

All three stages are concurrent.

That is the conceptual division. In the actual implementation, relocation is folded into the marking stage.

During marking, if a reference is found to still hold an old address, it is corrected to the new address and then marked.

Put simply, the first GC cycle marks and then transfers objects. It does not relocate references; it only records where each object was moved to.

When the second GC cycle starts marking, it finds that an object has been transferred but a reference to it is still old, so it relocates the reference, that is, updates it to the new address.

So relocation is folded into the next cycle's marking stage.

Let me briefly go through the ten steps.

Some steps do not affect the overall collection process, so I will not analyze them in detail.

The aim of this article is not to dig into the details of the ZGC implementation, but to understand its main highlights and overall flow.

If you want the details, you can look them up yourself or read the books recommended at the end of the article.

Initial mark

You should already be familiar with this stage, since CMS and G1 have it too. It is STW and only marks the objects directly reachable from the roots, pushing them onto the mark stack.

It also does some other work, such as resetting TLABs and deciding whether soft references should be cleared, which is not analyzed here.

Concurrent marking

Starting from the objects found by the initial mark, this phase traverses the object graph concurrently and counts the live objects in each region.

There is one detail here: there is a single mark stack, but several threads mark concurrently.

To reduce contention, each thread is assigned a different mark stripe to work on.

You can think of the mark stack as split into several stripes, with each thread traversing the objects in its own stripe, much like the segments of ConcurrentHashMap in JDK 1.7.

Some threads will mark faster than others, so a thread that runs out of work steals tasks from the others, which balances the load.

Does this remind you of anything? Yes, it is the work-stealing mechanism of ForkJoinPool!
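Since the comparison is with ForkJoinPool, here is a small sketch that marks an object graph using ForkJoinPool's own work stealing. It is an analogy, not ZGC's striped mark stack: the Node type, the ParallelMark class, and the use of a concurrent set as the "mark bitmap" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Analogy only: parallel graph marking on top of ForkJoinPool work stealing.
final class ParallelMark {
    /** A heap object with outgoing references (hypothetical model). */
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    static final class MarkTask extends RecursiveAction {
        private final Node node;
        private final Set<Node> marked;

        MarkTask(Node node, Set<Node> marked) {
            this.node = node;
            this.marked = marked;
        }

        @Override
        protected void compute() {
            if (!marked.add(node)) {
                return;                            // already marked by some worker
            }
            List<MarkTask> children = new ArrayList<>();
            for (Node ref : node.refs) {
                children.add(new MarkTask(ref, marked));
            }
            invokeAll(children);                   // idle workers may steal these subtasks
        }
    }

    /** Marks everything reachable from the roots and returns the live set. */
    static Set<Node> mark(Collection<Node> roots, int workers) {
        Set<Node> marked = ConcurrentHashMap.newKeySet();
        List<MarkTask> rootTasks = new ArrayList<>();
        for (Node root : roots) {
            rootTasks.add(new MarkTask(root, marked));
        }
        ForkJoinPool pool = new ForkJoinPool(workers);
        try {
            pool.invoke(new RecursiveAction() {
                @Override
                protected void compute() {
                    invokeAll(rootTasks);
                }
            });
        } finally {
            pool.shutdown();
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        Node unreachable = new Node();
        a.refs.add(b);
        b.refs.add(a);                             // cycles are handled by the marked set
        Set<Node> live = mark(Arrays.asList(a), 4);
        System.out.println("live objects: " + live.size()
                + ", unreachable marked: " + live.contains(unreachable));
    }
}
```

The recursion depth here grows with the length of reference chains, so a real collector would use explicit stacks instead; the sketch only shows how work stealing spreads the traversal across threads.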

Remark

This phase is STW. Application threads keep running during the concurrent phase and modify object references, which can cause live objects to be missed.

A remark phase is therefore needed to mark the objects that were missed.

If this phase runs for too long, ZGC goes back into the concurrent marking phase, because its goal is low latency and any sign of a long pause has to be cut short.

This phase also marks the non-strong roots in parallel. Non-strong roots are the system dictionary, JVMTI, JFR, and the string table.

Some non-strong roots can be processed concurrently and some cannot; this is not analyzed here.

Concurrent marking of non-strong references and concurrent reference processing

This continues from the non-strong roots traversed in the previous step, and then processes soft, weak, and phantom references.

This stage is concurrent.

Reset the transfer set

Remember the relocation done during marking? The forwardingTable mentioned in the read barrier section is a mapping: the key is an object's address before transfer and the value is its address after transfer.

This mapping has already been used during the marking phase, where references were relocated, so its contents are no longer needed.

But the mapping itself is needed again for the new round of garbage collection.

So at this stage the address mappings of the transferred partitions are reset.

Reclaim invalid partitions

Reclaim invalid virtual memory pages whose physical memory has already been freed.

When memory is tight, physical memory is freed early, but the virtual address space cannot be released at the same time, because a partition can only be released after the next round of marking completes.

That leaves invalid virtual memory pages behind, and they are reclaimed at this stage.

Select the partition to be recycled

As in G1, many partitions may be eligible for collection, so the ones containing the most garbage are selected as the collection set for this cycle.
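Selecting the partitions with the most garbage can be sketched as a filter plus a sort. The Region class, the select method, and the minimum garbage fraction used as a cutoff are all hypothetical; the real selection logic in ZGC is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of choosing a collection set; names and cutoff are assumptions.
final class RelocationSetSelector {
    static final class Region {
        final String name;
        final long sizeBytes;
        final long liveBytes;   // as counted during concurrent marking

        Region(String name, long sizeBytes, long liveBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return sizeBytes - liveBytes;
        }
    }

    /** Keeps regions whose garbage fraction reaches the cutoff, most garbage first. */
    static List<Region> select(List<Region> regions, double minGarbageFraction) {
        List<Region> picked = new ArrayList<>();
        for (Region region : regions) {
            double fraction = (double) region.garbageBytes() / region.sizeBytes;
            if (fraction >= minGarbageFraction) {
                picked.add(region);
            }
        }
        picked.sort(Comparator.comparingLong(Region::garbageBytes).reversed());
        return picked;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("A", 100, 90),
                new Region("B", 100, 10),
                new Region("C", 100, 40));
        for (Region region : select(regions, 0.5)) {
            System.out.println(region.name + " garbage=" + region.garbageBytes());
        }
    }
}
```

Regions that are mostly live are skipped because copying their contents would cost more than the space it frees.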

Initialize the transfer table of the set to be transferred

This step is to initialize the forwardingTable of the partition to be recycled.

Initial transfer

This phase starts from the root set. If an object is in the set of partitions being transferred, space is allocated for it in a new partition.

If it is not in that set, the object is marked as Remapped.

Note that this phase is STW and only transfers objects directly reachable from the roots.

Concurrent transfer

This phase is much like concurrent marking: it starts from the objects transferred in the previous step and transfers the rest concurrently.

This step is crucial.

G1 has to stop the world for the whole object transfer, while ZGC transfers concurrently, so its latency is much lower.

That completes the ten steps and one GC cycle.

Readers may still be a little confused about the mark bits of the colored pointer; the next section should clear that up.

Marking bits of coloring pointers

Let's analyze three of the mark bits: M0, M1, and Remapped.

First, a term. The address view is the mark bit currently set on an address pointer.

For example, if the mark bit is M0, the view is the M0 view.

Before garbage collection starts, the view is Remapped.

On entering the marking phase:

When a marking thread visits an object whose address view is Remapped, it marks the pointer as M0, that is, sets the address view to M0 to indicate a live object.

If the address view of a scanned object is already M0, the object was either newly allocated after marking started or has already been marked, so nothing needs to be done.

When an application thread creates a new object, it sets the address view to M0. When it accesses an object whose address view is Remapped, it sets the view to M0 and recursively marks the objects that object references.

If the view it accesses is already M0, no action is required.

At the end of the marking phase, ZGC stores these object addresses in a live-object table; the address view of every live object is M0.

In the concurrent transfer phase, the address view is set to Remapped.

That is, when a GC thread visits an object whose address view is M0 and which is in the live-object table, it transfers the object and sets the address view to Remapped.

If the object is in the live-object table but its address view is already Remapped, it has already been transferred and nothing is done.

An object created by an application thread during this phase gets the Remapped address view.

If an application thread accesses an object that is in the live-object table and whose address view is Remapped, the object has already been transferred and nothing is done.

If the address view is M0, the object has not been transferred yet, so it is transferred and its address view is set to Remapped.

If the accessed object is not in the live-object table, nothing is done.

What's the use of M1?

M1 is used by the next GC cycle, which marks with M1 instead of M0.

The cycle after that switches back again.

Put simply, in that next cycle M1 identifies the objects found live in the current collection, while M0 identifies objects that were marked in the previous collection but not transferred, and that have not been marked live in the current one.

As the analysis above shows, an object that is not transferred stays in the M0 address view.

If the next GC cycle also marked with M0, the two kinds of object could not be told apart.

That is why M1 exists.
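The alternation between M0 and M1 can be captured in a tiny state machine. This is my own simplified model of the address view, with hypothetical names; it only tracks which view is current and whether a pointer with a given color still needs work.

```java
// Simplified model of how the address view changes across GC cycles; not ZGC source code.
final class MarkColorCycle {
    enum Color { MARKED0, MARKED1, REMAPPED }

    // Start from MARKED1 so that the first cycle flips to MARKED0.
    private Color lastMarkColor = Color.MARKED1;
    // The view before any collection starts is Remapped.
    private Color currentView = Color.REMAPPED;

    /** Entering the marking phase: flip between M0 and M1. */
    void startMarking() {
        lastMarkColor = (lastMarkColor == Color.MARKED0) ? Color.MARKED1 : Color.MARKED0;
        currentView = lastMarkColor;
    }

    /** Entering the transfer phase: the view becomes Remapped. */
    void startTransfer() {
        currentView = Color.REMAPPED;
    }

    Color currentView() {
        return currentView;
    }

    /** A pointer whose color differs from the current view still has to be processed. */
    boolean needsWork(Color pointerColor) {
        return pointerColor != currentView;
    }

    public static void main(String[] args) {
        MarkColorCycle gc = new MarkColorCycle();
        gc.startMarking();      // first cycle marks with M0
        gc.startTransfer();
        gc.startMarking();      // second cycle marks with M1
        // An object left in the M0 view by the first cycle is distinguishable now.
        System.out.println("view=" + gc.currentView()
                + ", stale M0 needs work: " + gc.needsWork(Color.MARKED0));
    }
}
```

With a single mark color, the pointer left over from the first cycle would look already marked in the second cycle, which is exactly the confusion the second color avoids.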

The mark bits of the colored pointer should now be clear; the diagram illustrates them.

If anything is still unclear, go through the changes of the mark bits a few more times. It is not complicated.


© 2024 shulou.com SLNews company. All rights reserved.
