How to understand Go compiler code optimization bug positioning and repair 07/02 Update SLTechnology News&Howtos

How to understand Go compiler code optimization bug positioning and repair

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces "how to understand Go compiler code optimization bug positioning and repair". In daily operation, I believe many people have doubts about how to understand Go compiler code optimization bug positioning and repair problems. Xiaobian consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubts of "how to understand Go compiler code optimization bug positioning and repair". Next, please follow the editor to study!

Origin

One day, a friend greeted me in the group, "it's interesting to see that someone has mentioned a compiler bug to Go. I feel quite serious. Would you like to come and have a look?" So I opened issue 40367 [1]. At that time, the latest comment was this [2]:

It was mentioned that the problem could not be reproduced by changing a constant in the body of the loop from 1 to 2, which immediately aroused my interest, so I was going to study it.

The bug code follows the phenomenon as shown in the following figure. Normally, the code should stop after outputting "5 / 6", but in fact it is executed indefinitely, and can only be forcibly terminated or crash after the program touches the unauthorized memory address.

First of all, we need to locate the specific direct cause of the problem. To put it simply, this bug is for-range loop out of bounds, and the loop is supposed to end after the number of loops reaches the length of the array, but the loop in the repeater goes on indefinitely. At first glance, the problem looks like bound check has been optimized, so let's give it a real hammer. There is a convenient website to observe the compilation results of a given program online. I use this site [3] to generate the original recurrence program and the assembly of the non-recurrence program after changing line 6 from + 1 to + 2 for comparison. Irrelevant details aside, it is easy to see that the assembly of the former is indeed one less judgment than the latter, so that the loop cannot be terminated, in line 105 of the second paragraph of code:

Now that the direct cause has been located, we need to find a way to catch up with the compiler to see why there is something wrong with the assembly results. For many students, the process of getting into the compiler to check the problem may be unfamiliar and sounds prohibitive, so how do we troubleshoot this problem?

Background knowledge

Before tracking this specific problem, we need to know some relevant knowledge background.

The general running flow of the Go compiler

To track down the problem with the Go compiler, you first need to understand the general flow of the Go compiler. In fact, the implementation of the compiler of Go is regular, and compared with the established compilers such as GCC/Clang, many optimizations have not been realized. The work of a Go program before generating assembly is roughly divided into these steps:

Syntax parsing. Because the Go language syntax is quite simple, the Go compiler uses a handwritten LALR (1) parser, which has nothing to do with today's bug and skips the details.

Type check. Go is a strongly typed statically typed language. During the compilation period, it will do type checking on assignments, function calls and other processes to determine whether the program is legal or not. In addition, this step will transform some Go-built generic functions into specific type function calls, such as the make function, which will be transformed into a specific makeslice/makemap based on the result of the type check during the type checking phase. This part also has nothing to do with today's bug.

Intermediate code (IR) generation. In order to facilitate cross-platform code generation and compile optimization, modern compilers usually turn the syntax tree into an intermediate code representation, which is usually between syntax tree and platform assembly. Go chooses intermediate code in the form of static single assignment (SSA). This part is more important, which will be described in detail in the next section.

Compiler optimization. After the SSA IR is generated, the compiler will pass the code analysis and rewrite many times based on the IR, and each pass will complete an optimization strategy. It is also worth mentioning that many strength reduction strategies in Go use a DSL description, and then the code generates the actual pass code, but this piece has nothing to do with today's content, students who are interested can come down and have a look. Later in the article, we will locate the specific pass that led to the bug in this article, and see the problematic logic in that pass.

After these steps, the compiler is ready to generate the final platform assembly code.

Static single assignment form

Static single assignment means that in this type of IR, each variable is assigned only once. We will not dwell on the benefits of this form, but use a simple piece of Go code as an example to help you understand the meaning of SSA IR.

Here is a simple example, with the SSA IR corresponding to the Go code on the right. As you can see, the whole code is divided into blocks, with each code block (block) code beginning with bXX, and you can see which block the block will jump to at the end of the indentation. Within block, you can see that each value, including constants, has a separate name. For example, the variable an is assigned twice in lines 4 and 5 of the Go code, corresponding to v7 and v11 in SSA IR.

However, if the code contains statements such as if, and you are not sure which value to use at compile time, how do you express it in SSA IR? There is exactly such code in the example, and you can see the if on the sixth line of the Go code. In fact, there is a special phi operator in SSA IR that is designed for this situation. The meaning of the phi operator is that the return value may be any one of the multiple values of the parameter, but the exact value depends on which block the block is running from. In the figure above, you can see that b2 has a phi operator. V22 may be equal to v11 or v21. Which value depends on whether the last block of b2 is b1 or b3, which actually corresponds to the if condition. Of course, if is obviously true in this example, but the SSA IR we see here is an unoptimized IR, which will be optimized during the actual compilation process.

The Go compiler provides a very convenient function to view the SSA IR before and after each optimization pass. You only need to add an GOSSAFUNC=xxx environment variable at compile time, and xxx is the name of the function you want to analyze, because the optimization within the Go compiler is at the function level. For example, in the example above, just run GOSSAFUNC=main go build ssaexample.go and the compiler will output the SSA IR result to the ssa.html of the current directory and open it with a browser.

Investigation process

The optimization strategy for tracking down the problem

With so much pre-knowledge, we can finally trace the cause of this specific bug. The first step is to go through the SSA IR from the Go compiler dump to see which pass has gone wrong. In the manner described in the previous section, we can look at all the SSA IR of the repeating program in issue. Because the Go compiler has a lot of optimized pass, so there are a large number of SSA IR recorded in ssa.html, how can we find the problematic pass? Personally, since I knew something before, I could roughly guess that this problem is prove pass's bug. But even if you do not have the relevant background, since we already know that the direct cause of this bug is the lack of a comparative judgment, we can also use dichotomy to see which pass is missing a comparison instruction to locate. It is important to note that you may navigate to generic deadcode pass because there is a Less64 instruction missing in this pass, as shown in the figure (I use Go 1.15rc1 here, the specific output may be different depending on the compiler version), and on the right is generic deadcode pass:

You can see that the Less64 in b4 on the right disappears compared to the left, and if you look at the parameter of this Less64, v11 is the constant 6, that is, the length of the array in the code, you can be sure that this instruction is the missing boundary judgment. So can we be sure that the bug is from generic deadcode pass? Not really. Because this pass only removes the part of the previous pass that has become dead code, in fact, this line of Less64 has become dead code, as can be seen from the light gray of the instruction on the left, that is to say, generic deadcode pass is actually carrying the pot. However, from here on, it is much easier to find out which pass became the dead code. Just click on this command in the browser to highlight the change of this instruction. If you go ahead with a few pass, it is easy to see that there is something wrong with prove pass:

On the right is prove pass, and you can see that this line turns gray in prove pass.

Introduction to prove pass

The strategy that identified the problem is prove pass, so next we need to see what prove pass is for. In fact, the function of prove pass is to infer the range of SSA values in the global, so that many unnecessary branch judgments can be eliminated. Does it sound like it has something to do with today's bug? In fact, this is a very important pass in the Go compiler, and many optimizations depend on the result of this pass. For example, because Go is a memory-safe language, all slice fetch elements need to do a check to determine whether the subscript used to fetch the element is outside the scope of the slice. This operation is called bound check. But in fact, a lot of code can determine whether this subscript is out of bounds at compile time, so we can eliminate the need to check bound check at run time. This step of optimization is called bound check elimination. For example, the following code example is taken from the Go standard library [4]:

Func (bigEndian) PutUint64 (b [] byte, v uint64) {_ = b [7] / / early bounds check to guarantee safety of writes below b [0] = byte (v > > 56) b [1] = byte (v > 48) b [2] = byte (v > > 40) b [3] = byte (v > 32) b [4] = byte (v > 24) b [5] = byte (v > > 16) b [6] = byte (v > 8) b [7] = byte (v)}

As you can see, this function first carries on the operation of b [7], so that the compiler can learn in prove pass that when the program runs to the third line and later, the length of slice b must be greater than or equal to 7, so the bound check of subsequent operations can be dropped by eliminate. However, prove pass will not only optimize bound check elimination for a specific pattern, but many other pattern will also be optimized in prove pass. So what went wrong with today's bug in prove pass?

Troubleshooting prove pass issu

When it comes to the positioning of code problems, it may be divided into three schools. The first is to log, by adding information to the log to locate the problem; the second is to troubleshoot the problem through gdb and other debugger breakpoints, single-step operation; the third is dynamic tracking, through means such as perf/systemtap/ebpf to dynamically observe the behavior of the program running. Specific to the Go compiler here, in fact, the development of Go compiler Go team cattle also need daily troubleshooting, but also no more than these means, but in the compilation optimization problem they prefer the first way to hit the log, so they have pre-buried a lot of debug logs in each pass, but these logs usually do not open, need special compilation switch. Since prove pass is quite complex, we might as well further narrow the scope of troubleshooting by checking the log. Prove pass's debug log switch is-d=ssa/prove/debug=1, where the larger the number of debug followed by the more detailed the log, we only need to execute go tool compile-d=ssa/prove/debug=1 bug.go at compile time to see the corresponding log. Specific to this bug, you can see the comparison with the level of debug=1. As shown in the following figure, the log of the repeating program is on the left, and the log of the program that does not reappear after modifying the constant on the right:

You can clearly see that the bug program obviously proves more than one relationship. Further, through this log keyword in the grep compiler code, you can find that only findIndVar and addLocalInductiveFacts will type this log. Combined with the context and related comments, it is not difficult to see that the problem actually lies in the addLocalInductiveFacts function. What exactly is the function of addLocalInductiveFacts? As you can easily see from the comments, the function here is to match to a special code pattern, that is, repeat until-like logic, to determine whether a condition is true at the end of the loop. Exactly where the bug in this function comes out, we need to further use a higher level of debug=3 to see the details of its operation:

I only intercepted the relevant log here. As you can see, before the problem of induction, it is proved that v10 > = v16 is not valid. Combined with addLocalInductiveFacts, we can find that in fact, the compiler regards v10 and v16 as the upper and lower bounds of loop variables, that is, min and max variables in the code. However, combined with SSA IR, it is not difficult to see that v16 is not the upper bound of cyclic variables at all, so what is the problem?

Reading the relevant code [5] of extracting max from addLocalInductiveFacts (such as the snippet above), we can see that the intention here is to start from the block where the phi operation of the loop header is located after the conditional judgment is over, all the way back to find the block (if block) of the conditional judgment, and then, as in line 1104 in the code, to determine whether the phi operation is the conditional establishment branching logic or else logic, and whether the condition should be reversed according to the branch. Because if it is else branching logic, then it means that the result of conditional judgment is false, and we need to reverse the condition in order to get the really valid logical condition. Seeing the code here, I believe you already know the root cause of this bug. 1104-1113 lines of code make it clear that if it is a conditional branch, then br is positive, and if it is an else branch, then br is negative. However, there is no indirect correlation between the phi operation and the if block. If the phi operation is not directly related to the if block, then even if we go back to if block, there is no way to know whether the br variable is positive or negative, and the value is unknown. However, in the follow-up logic, we do not judge unknown, but directly follow the positive process by default; just in this bug recurrence program, the block where the phi operation is located is indirectly related to the else branch of if block, so there is a problem naturally when we take the positive process.

The picture above is the ssa cfg diagram of the problem reproduction code. We can clearly see that b6 is not directly related to the corresponding b5, but indirectly related, hitting the wrong path of the code.

End

The problem has been located, so how to fix it? A very simple way is to add a unknown judgment logic directly after the logic of br evaluation, and exit the judgment directly when br = = unknown. In this way, prove pass will obviously become conservative, but it is guaranteed to be correct. After adding this check, the bug playback program runs normally, but as a more general fix, we add a judgment on the entry block at the entrance of the function to ensure that the entry block is indeed a loop opening block, and nothing else happens to match the current pattern. I submitted this fix to the upstream. Because the bug was very serious, and the fix had little impact on the performance test, it was quickly incorporated into master, or commit 7f8608047644ca34bad1728d5e2dbef041a1b3f2 [6], and was about to cherry pick into the first two major versions 1.13 and 1.14, which were still committed to maintenance. As mentioned earlier, this patch will make the optimizer more conservative, so I will make other changes to restore the optimizer to the previous level. I have also submitted the corresponding patch, but due to the 1.15 development cycle has been frozen, it is expected to incorporate master at 1.16 cycle.

At this point, the study on "how to understand Go compiler code to optimize bug positioning and repair" is over. I hope to be able to solve everyone's doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.