How to understand memory alignment caused by WaitGroup in Go

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains the memory alignment handling inside sync.WaitGroup in Go: why its state method checks the address of its state field, and how the Add and Wait methods build on that.

WaitGroup introduction

WaitGroup provides three methods:

    func (wg *WaitGroup) Add(delta int)
    func (wg *WaitGroup) Done()
    func (wg *WaitGroup) Wait()

Add, which adds delta to the count value of the WaitGroup.

Done, which subtracts 1 from the count value of the WaitGroup; it actually calls Add(-1).

Wait, which blocks the calling goroutine until the count of the WaitGroup becomes 0.

Basic usage examples are easy to find online, so let's go straight to the point.

Parsing

    type noCopy struct{}

    type WaitGroup struct {
        // A trick to prevent copying: it lets the vet tool report
        // code that copies this struct.
        noCopy noCopy
        // A compound value that holds the waiter count, the counter,
        // and the semaphore.
        state1 [3]uint32
    }

    // state returns the address of the state and the address of the semaphore.
    func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
        if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
            // The address is 64-bit aligned: the first two elements of the
            // array form the state and the last element is the semaphore.
            return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
        } else {
            // The address is only 32-bit aligned: the last two elements form
            // the state, so that 64-bit atomic operations can be used on it,
            // and the first 32-bit element is the semaphore.
            return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
        }
    }

Right from the start, WaitGroup shows off. It is a good chance to see how experts write code, and to think about how an atomic operation behaves on different architectures. Before looking at why the state method is written this way, let's take a look at memory alignment.

Memory alignment

We can see the definition of memory alignment:

A memory address a is said to be n-byte aligned when a is a multiple of n bytes (where n is a power of 2).

In short, when the CPU accesses memory, it accesses several bytes at a time. For example, a 32-bit architecture accesses 4 bytes at a time, and the processor reads data efficiently only from addresses that are multiples of 4. Therefore, when data is stored, its starting address is made a multiple of 4, which is the so-called memory alignment.

Since I could not find the alignment rules for the Go language, I compared them with the memory alignment rules of C, which match Go's behavior in the cases below, so refer to the following rules first.
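As a minimal sketch of the definition above, the helper below (isAligned is a name invented here for illustration, not a standard library function) checks whether an address is a multiple of n, and main applies it to the address of a local variable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned reports whether addr is a multiple of n.
// n is assumed to be a power of two, as in the definition above.
func isAligned(addr, n uintptr) bool {
	return addr%n == 0
}

func main() {
	var x int32
	addr := uintptr(unsafe.Pointer(&x))
	// The actual address varies from run to run, so no output is shown here.
	fmt.Printf("address %#x: 4-byte aligned=%v, 8-byte aligned=%v\n",
		addr, isAligned(addr, 4), isAligned(addr, 8))
}
```

An int32 variable is always at least 4-byte aligned in Go; whether it also happens to be 8-byte aligned depends on where it is placed.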

Memory alignment follows the following three principles:

The starting address of a structure variable is divisible by the size of its widest member.

The offset of each member from the starting address of the structure is divisible by the member's own size; if it is not, padding bytes are added after the previous member.

The overall size of the structure is divisible by the size of its widest member; if it is not, padding bytes are added at the end.

Use the following example to actually manipulate memory alignment:

In the 32-bit architecture, int8 occupies 1 byte, int32 occupies 4 bytes, and int16 occupies 2 bytes.

    package main

    import (
        "fmt"
        "unsafe"
    )

    type A struct {
        a int8
        b int32
        c int16
    }

    type B struct {
        a int8
        c int16
        b int32
    }

    func main() {
        fmt.Printf("arrange fields to reduce size:\n"+"A align: %d, size: %d\n", unsafe.Alignof(A{}), unsafe.Sizeof(A{}))
        fmt.Printf("arrange fields to reduce size:\n"+"B align: %d, size: %d\n", unsafe.Alignof(B{}), unsafe.Sizeof(B{}))
    }

    // output:
    // arrange fields to reduce size:
    // A align: 4, size: 12
    // arrange fields to reduce size:
    // B align: 4, size: 8

Here is an example of running in a 32-bit architecture:

The default alignment size in a 32-bit architecture is 4 bytes.

Suppose that the starting address of structure A is 0x0000, which is divisible by the size of the widest member, 4 bytes (int32). Field a is an int8, so it occupies the single byte at 0x0000. Field b is an int32 and occupies 4 bytes, so to satisfy the second rule, 3 bytes of padding are added after a and b starts at 0x0004, occupying 0x0004~0x0007. Field c is an int16, so it occupies the two bytes starting at 0x0008, that is, 0x0008~0x0009. At this point the whole structure spans 0x0000~0x0009, which is 10 bytes, and 10 % 4 != 0, so the third rule is not satisfied. Two more bytes are added at the end, so the space occupied after alignment is 0x0000~0x000B, a total of 12 bytes.

By the same reasoning, structure B is more compact: a occupies 0x0000, one byte of padding follows, c occupies 0x0002~0x0003, and b occupies 0x0004~0x0007. That is 8 bytes in total, and 8 % 4 == 0, so no trailing padding is needed.
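The layouts of A and B can be checked directly with unsafe.Offsetof. The sketch below redeclares the two structs from the example; offsetsA and offsetsB are helper names made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

type A struct {
	a int8
	b int32
	c int16
}

type B struct {
	a int8
	c int16
	b int32
}

// offsetsA returns the byte offsets of A's fields in the order a, b, c.
func offsetsA() [3]uintptr {
	var v A
	return [3]uintptr{unsafe.Offsetof(v.a), unsafe.Offsetof(v.b), unsafe.Offsetof(v.c)}
}

// offsetsB returns the byte offsets of B's fields in the order a, c, b.
func offsetsB() [3]uintptr {
	var v B
	return [3]uintptr{unsafe.Offsetof(v.a), unsafe.Offsetof(v.c), unsafe.Offsetof(v.b)}
}

func main() {
	fmt.Println("A offsets (a, b, c):", offsetsA(), "size:", unsafe.Sizeof(A{}))
	fmt.Println("B offsets (a, c, b):", offsetsB(), "size:", unsafe.Sizeof(B{}))
}
```

In A, the int32 field forces 3 bytes of padding after a and 2 bytes at the end; in B, the two small fields share the first 4 bytes, so only 1 byte of padding is needed.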

Memory alignment of state method in WaitGroup

Before speaking, it should be noted that noCopy is an empty structure with a size of 0, and there is no need for memory alignment, so you can ignore this field when you look at it.
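To see that the empty noCopy field really takes no space, here is a small sketch. waitGroupLayout is a local stand-in that only mirrors the field layout quoted in this article; it is not the real sync.WaitGroup, and the helper function names are invented for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// noCopy and waitGroupLayout mirror the fields quoted in this article.
// They are stand-ins for illustration, not the real sync types.
type noCopy struct{}

type waitGroupLayout struct {
	noCopy noCopy
	state1 [3]uint32
}

// noCopySize returns the size of the empty struct.
func noCopySize() uintptr { return unsafe.Sizeof(noCopy{}) }

// state1Offset returns where state1 starts inside the struct.
func state1Offset() uintptr {
	var w waitGroupLayout
	return unsafe.Offsetof(w.state1)
}

// layoutSize returns the total size of the stand-in struct.
func layoutSize() uintptr { return unsafe.Sizeof(waitGroupLayout{}) }

func main() {
	fmt.Println("noCopy size:", noCopySize())
	fmt.Println("state1 offset:", state1Offset())
	fmt.Println("struct size:", layoutSize())
}
```

Because the leading empty field has size 0, state1 starts at offset 0 and the struct is exactly as large as the three uint32 values.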

In WaitGroup, an array of uint32 is used for the state1 field, and the state method returns different parts of it depending on how the field is aligned. Let's first look at how the waiter count, the counter, and the semaphore are built from the state1 field.

First, unsafe.Pointer is used to obtain the address of state1, which is then converted to uintptr, and the code checks whether the address is divisible by 8. Taking the address modulo 8 is how it determines whether the address is 64-bit aligned.

Because of memory alignment, the starting position of the state1 field of a WaitGroup is 64-bit aligned on a 64-bit architecture, so there the first two elements of state1 are combined into a uint64 that represents statep, and the last element of state1 represents semap.

So on a 64-bit architecture, could the first element represent semap and the last two elements be combined into the 64-bit value instead?

The answer is, of course, no. The alignment guarantee of uint32 is 4 bytes, while on a 64-bit architecture the CPU handles a fixed 8 bytes per transaction. If state1 starts on an 8-byte boundary, its last two elements start at offset 4, which is not a multiple of 8. Using them as a 64-bit word would make the value straddle two 8-byte words, so the CPU would need to read memory twice, which cannot guarantee atomicity.

On a 32-bit architecture, a word is 4 bytes, and accessing 64-bit data takes two operations because the data is spread over two words. If another operation can modify the data between those two operations, atomicity cannot be guaranteed.

Similarly, if a 32-bit architecture wants to operate on 8 bytes atomically, the caller needs to ensure that the data address is 64-bit aligned, otherwise the atomic access will fail. The sync/atomic documentation describes this:

On ARM, x86-32, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

So only the first 64-bit word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned. However, a WaitGroup is often nested inside other structs, and there is no guarantee that it will always be the first field, so padding is needed to bring its state to a 64-bit boundary.
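A common way to follow the rule quoted above in your own code is to put the 64-bit field first. The sketch below uses a hypothetical stats struct (the type, its fields, and the helper functions are assumptions made for this example) and updates the field with sync/atomic.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats is a hypothetical struct. hits is placed first so that, per the
// sync/atomic guarantee quoted above, it is 64-bit aligned whenever a
// stats value is allocated, even on 32-bit platforms.
type stats struct {
	hits uint64
	flag uint32
}

// hitsOffset returns the offset of hits inside stats.
func hitsOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.hits)
}

// countHits allocates a stats value and atomically increments hits n times.
func countHits(n int) uint64 {
	s := new(stats)
	for i := 0; i < n; i++ {
		atomic.AddUint64(&s.hits, 1)
	}
	return atomic.LoadUint64(&s.hits)
}

func main() {
	fmt.Println("offset of hits:", hitsOffset())
	fmt.Println("hits after 5 increments:", countHits(5))
}
```

WaitGroup cannot rely on this trick, because it does not control where callers embed it; that is why its state method inspects the address at run time instead.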

On a 32-bit architecture, the address of a WaitGroup is only guaranteed to be 32-bit aligned, so the starting position of its state1 field is not necessarily 64-bit aligned; it may be that uintptr(unsafe.Pointer(&wg.state1))%8 == 4. If this happens, the first element of state1 is used as the semaphore, which in effect acts as padding, and the last two elements of state1, which now start on a 64-bit boundary, are merged into a uint64 to represent statep.
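The two cases can be reproduced on any machine by looking at the same memory at two start addresses that are 4 bytes apart. This is only a sketch: split mimics the branch in the state method quoted earlier but returns indices instead of pointers, and split, views, and stateIsAligned are names invented here. It uses unsafe.Add, which requires Go 1.17 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// split mimics the state method for a bare [3]uint32: it returns the index
// where the 64-bit state begins and the index of the semaphore element.
func split(s *[3]uint32) (stateIdx, semIdx int) {
	if uintptr(unsafe.Pointer(s))%8 == 0 {
		return 0, 2 // 64-bit aligned: state = s[0:2], semaphore = s[2]
	}
	return 1, 0 // only 32-bit aligned: semaphore = s[0], state = s[1:3]
}

// views returns two overlapping [3]uint32 views of one 16-byte buffer.
// Their start addresses differ by 4 bytes, so exactly one is 64-bit aligned.
func views() (first, second *[3]uint32) {
	buf := new([2]uint64)
	base := unsafe.Pointer(buf)
	return (*[3]uint32)(base), (*[3]uint32)(unsafe.Add(base, 4))
}

// stateIsAligned reports whether the state chosen by split is 64-bit aligned.
func stateIsAligned(s *[3]uint32) bool {
	stateIdx, _ := split(s)
	return uintptr(unsafe.Pointer(&s[stateIdx]))%8 == 0
}

func main() {
	first, second := views()
	for _, s := range []*[3]uint32{first, second} {
		stateIdx, semIdx := split(s)
		fmt.Printf("start%%8=%d state at index %d, semaphore at index %d, state aligned=%v\n",
			uintptr(unsafe.Pointer(s))%8, stateIdx, semIdx, stateIsAligned(s))
	}
}
```

Whichever branch is taken, the two elements chosen for the state always begin on a 64-bit boundary, which is the whole point of the check.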

Summary

Here is a brief summary, since I consulted a lot of material to reach this result. On a 64-bit architecture, the CPU operates on 8 bytes at a time, and the compiler places the first field of the structure at a 64-bit aligned address, so the first two elements of state1 are merged into a uint64 to represent statep, and the last element of state1 represents semap.

On a 32-bit architecture, the compiler can only guarantee 32-bit alignment for a WaitGroup, not 64-bit alignment, so the code checks whether uintptr(unsafe.Pointer(&wg.state1))%8 equals 0 to see whether the address of state1 is 64-bit aligned. If it is, then, just as on a 64-bit architecture, the first two elements of state1 are combined into a uint64 to represent statep and the last element represents semap. Otherwise, the first element of state1 is used as semap, in effect acting as padding, and the last two elements of state1 are merged into a uint64 to represent statep.

If I got anything wrong, corrections are welcome. I still have a lot to learn.

Add method

    func (wg *WaitGroup) Add(delta int) {
        // Get the state pointer and the semaphore pointer.
        statep, semap := wg.state()
        ...
        // The high 32 bits hold the counter, so shift delta left by 32
        // bits and add it to the counter.
        state := atomic.AddUint64(statep, uint64(delta)<<32)
        // Get the value of the counter.
        v := int32(state >> 32)
        // Get the value of waiter.
        w := uint32(state)
        ...
        // The task counter cannot be negative.
        if v < 0 {
            panic("sync: negative WaitGroup counter")
        }
        // w != 0 means Wait has already been called; Add is not allowed
        // at this point.
        if w != 0 && delta > 0 && v == int32(delta) {
            panic("sync: WaitGroup misuse: Add called concurrently with Wait")
        }
        // The counter is greater than 0 or no waiter is waiting: return directly.
        if v > 0 || w == 0 {
            return
        }
        if *statep != state {
            panic("sync: WaitGroup misuse: Add called concurrently with Wait")
        }
        // At this point the counter must be 0 and waiter must be greater than 0.
        // Reset the state to 0, then release the semaphore once per waiter.
        *statep = 0
        for ; w != 0; w-- {
            // Release the semaphore; each release wakes up one waiter.
            runtime_Semrelease(semap, false, 0)
        }
    }

The Add method first calls the state method to get statep and semap. statep points to a uint64 value: the high 32 bits record the sum of the delta values passed to Add, that is, the counter, and the low 32 bits record the number of goroutines blocked in Wait, that is, the number of waiters.
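The packing of the counter and the waiter count into one 64-bit word can be tried in isolation. In this sketch, unpack and addDelta are helper names made up for the example; they reproduce only the bit arithmetic used by the Add method shown above, not its checks.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// unpack splits a packed state word into the counter (high 32 bits)
// and the waiter count (low 32 bits).
func unpack(state uint64) (counter int32, waiters uint32) {
	return int32(state >> 32), uint32(state)
}

// addDelta adds delta to the counter half of *statep and returns the new state.
// A negative delta wraps around modulo 2^64, which subtracts from the high
// 32 bits and leaves the low 32 bits unchanged.
func addDelta(statep *uint64, delta int) uint64 {
	return atomic.AddUint64(statep, uint64(delta)<<32)
}

func main() {
	var state uint64
	addDelta(&state, 3)         // counter = 3
	atomic.AddUint64(&state, 1) // one waiter registers in the low half
	s := addDelta(&state, -1)   // counter = 2, waiters still 1
	v, w := unpack(s)
	fmt.Println("counter:", v, "waiters:", w) // prints "counter: 2 waiters: 1"
}
```

Reading the counter as int32 is what lets a negative result be detected, while the waiter count is read as uint32 because it can never go below zero.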

The Add method calls atomic.AddUint64 with the incoming delta shifted 32 bits to the left, that is, it adds delta to the counter.

Because the counter may become negative, it is read as an int32; the waiter count cannot be negative, so it is read as a uint32.

Then there is a series of checks. v must not be less than zero, meaning the task counter cannot be negative, otherwise Add panics. If w is not 0, delta is greater than 0, and v equals delta, then Wait was already called before this Add raised the counter from zero, which also panics, because WaitGroup does not allow Add to be called concurrently with Wait.

If v is greater than zero or w equals zero, there is no waiter to release at this time, so Add returns directly.

By the time the *statep != state check is reached, the only possible state is waiter greater than zero and counter equal to zero. Add must not be called while waiter is greater than zero, and Wait does not register a new waiter when the counter is zero, so the state should not change. The saved state is therefore compared with the value currently in memory to see whether a concurrent call to Add or Wait changed it; if so, the call is illegal and causes a panic.

Finally, the state is reset to zero and all waiters are released.

Wait method

    func (wg *WaitGroup) Wait() {
        statep, semap := wg.state()
        ...
        for {
            state := atomic.LoadUint64(statep)
            // Get the counter.
            v := int32(state >> 32)
            // Get the waiter count.
            w := uint32(state)
            // The counter is zero: no need to wait, return directly.
            if v == 0 {
                ...
                return
            }
            // Use CAS to add 1 to the waiter count.
            if atomic.CompareAndSwapUint64(statep, state, state+1) {
                ...
                // Suspend and wait to be woken up.
                runtime_Semacquire(semap)
                // If the state is not zero after waking up, the WaitGroup
                // has been reused, so panic.
                if *statep != 0 {
                    panic("sync: WaitGroup is reused before previous Wait has returned")
                }
                ...
                // Return directly.
                return
            }
        }
    }

The Wait method also starts by calling the state method to get the state pointer and the semaphore pointer.

Inside the for loop, it loads the value of statep and then extracts the counter and the waiter count from it.

If the counter is already zero, there is no need to wait, so it returns directly.

If the counter is not zero, it adds 1 to the waiter count using CAS. Since the CAS may fail, the for loop comes back here and retries until it succeeds.

It then calls runtime_Semacquire to suspend and wait to be woken up.

If *statep != 0 after waking up, the WaitGroup has been reused, which causes a panic. Note that this does not mean a WaitGroup can never be reused; it means it cannot be reused before the previous Wait has returned.
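The load-then-CAS retry loop used by Wait can be sketched on its own. The example below keeps only the registration step and leaves out the semaphore; addWaiter and registerConcurrently are hypothetical helpers written for this illustration, not functions from the sync package.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addWaiter registers one waiter in the low 32 bits of *statep using a
// CAS retry loop. It returns false without registering if the counter
// in the high 32 bits is already zero.
func addWaiter(statep *uint64) bool {
	for {
		state := atomic.LoadUint64(statep)
		if int32(state>>32) == 0 {
			return false
		}
		if atomic.CompareAndSwapUint64(statep, state, state+1) {
			return true
		}
		// The CAS failed because another goroutine changed the state
		// in between; reload and try again.
	}
}

// registerConcurrently starts n goroutines that each call addWaiter once
// on a state whose counter is set to counter, and returns the final
// waiter count.
func registerConcurrently(counter int32, n int) uint32 {
	state := uint64(counter) << 32
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			addWaiter(&state)
		}()
	}
	wg.Wait()
	return uint32(atomic.LoadUint64(&state))
}

func main() {
	fmt.Println("waiters with counter 2:", registerConcurrently(2, 50))
	fmt.Println("waiters with counter 0:", registerConcurrently(0, 50))
}
```

Every successful CAS adds exactly one waiter, so no registration is lost even when many goroutines race; with a zero counter nobody registers, which matches the early return in Wait.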

Summary of waitgroup usage

After looking at the add method and wait method of waitgroup, we find that there are a lot of checks in it. Improper use will lead to panic, so we need to summarize how to use it correctly:

The counter cannot become negative, otherwise a panic occurs. Note that there are two ways to make the counter negative: one is to pass a negative number when calling Add, and the other is to call Done too many times, exceeding the count value of the WaitGroup.

When using WaitGroup, make sure all Add calls have happened before Wait is called, otherwise a panic may result.

Do not reuse a WaitGroup before Wait has finished. A WaitGroup can be reused, but only after the previous Wait call has returned.
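The rules above can be put together in a short sketch. sumSquares shows one conventional pattern (Add before starting each goroutine, Done via defer, Wait after all Add calls), and doneTooOften triggers the negative-counter panic on purpose. Both function names are invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares computes 1*1 + 2*2 + ... + n*n with one goroutine per term.
// Add is called before each goroutine starts, and Wait is called only
// after all Add calls have been made.
func sumSquares(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			total += i * i
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

// doneTooOften calls Done on a WaitGroup whose counter is zero and
// reports whether that call panicked.
func doneTooOften() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var wg sync.WaitGroup
	wg.Done() // the counter would become -1
	return false
}

func main() {
	fmt.Println("sum of squares up to 4:", sumSquares(4))
	fmt.Println("extra Done panicked:", doneTooOften())
}
```

Calling wg.Add(1) inside the goroutine instead of before it would break the second rule, because Wait could then run before some of the Add calls.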
