Analysis of Undefined Behavior in C

2025-01-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article works through an example of undefined behavior in C. Many people find the topic confusing in day-to-day work, so the example below is analyzed step by step, down to the generated assembly.

A colleague wrote a few lines of code on the whiteboard and asked me what the program would output.

    #include <stdio.h>

    int main()
    {
        int i = 0;
        int a[] = {10, 20, 30};

        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
        printf("%d\n", r);
        return 0;
    }

It looked pretty simple and clear. I explained operator precedence: postfix operators are evaluated before multiplication, multiplication before addition, and both multiplication and addition associate from left to right. So I started writing out the calculation.

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    = a[0] + 2 * a[1] + 3 * a[2];
    //    = 10 + 40 + 90;
    //    = 140

After I proudly wrote down the answer, my colleague responded with a simple "no". I thought for a few minutes and was baffled. I did not quite remember the associativity of the postfix operators. Besides, I knew that associativity would not even change the order in which the values are computed here, because associativity rules only apply between operators of the same precedence level. Still, it occurred to me to try evaluating the expression on the assumption that the postfix operators are evaluated from right to left. It looked simple enough.

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    = a[2] + 2 * a[1] + 3 * a[0];
    //    = 30 + 40 + 30;
    //    = 100

Once again, my colleague replied that the answer was still wrong. At that point I had to throw in the towel and ask him what the answer was. This short sample had originally been cut down from a larger piece of code he had written. When he compiled and ran the larger code, he was surprised to find that it did not behave as he expected. To isolate the problem, he stripped out the unnecessary steps to arrive at the sample above, compiled it with gcc 4.7.3, and got a surprising output: "60".

Now I was intrigued. I remembered that in C the order in which function arguments are evaluated is unspecified, so we guessed that the postfix operators were simply being evaluated in some arbitrary order rather than from left to right. We were still convinced that the postfix operators had higher precedence than addition and multiplication, and we quickly proved to ourselves that there is no order of evaluating the three i++ operations in which the weighted sum of the three array elements comes to 60.

Now I was hooked. My idea was to look at the disassembly of this code and find out what was actually happening. I compiled the sample with debugging symbols and, using objdump, quickly got the annotated x86_64 disassembly.

    Disassembly of section .text:

    0000000000000000 <main>:
    #include <stdio.h>

    int main()
    {
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 20             sub    $0x20,%rsp
        int i = 0;
       8:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
        int a[] = {10, 20, 30};
       f:   c7 45 f0 0a 00 00 00    movl   $0xa,-0x10(%rbp)
      16:   c7 45 f4 14 00 00 00    movl   $0x14,-0xc(%rbp)
      1d:   c7 45 f8 1e 00 00 00    movl   $0x1e,-0x8(%rbp)
        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
      24:   8b 45 e8                mov    -0x18(%rbp),%eax
      27:   48 98                   cltq
      29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      2d:   8b 45 e8                mov    -0x18(%rbp),%eax
      30:   48 98                   cltq
      32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
      36:   01 c0                   add    %eax,%eax
      38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx
      3b:   8b 45 e8                mov    -0x18(%rbp),%eax
      3e:   48 98                   cltq
      40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      44:   89 d0                   mov    %edx,%eax
      46:   01 c0                   add    %eax,%eax
      48:   01 d0                   add    %edx,%eax
      4a:   01 c8                   add    %ecx,%eax
      4c:   89 45 ec                mov    %eax,-0x14(%rbp)
      4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
        printf("%d\n", r);
      5b:   8b 45 ec                mov    -0x14(%rbp),%eax
      5e:   89 c6                   mov    %eax,%esi
      60:   bf 00 00 00 00          mov    $0x0,%edi
      65:   b8 00 00 00 00          mov    $0x0,%eax
      6a:   e8 00 00 00 00          callq  6f <main+0x6f>
        return 0;
      6f:   b8 00 00 00 00          mov    $0x0,%eax
    }
      74:   c9                      leaveq
      75:   c3                      retq

The first few and last few instructions only set up the stack frame, initialize the variables, call printf, and return from main. So we really only need to care about the instructions from 0x24 to 0x57. That is where the interesting behavior takes place. Let's check a few instructions at a time.

      24:   8b 45 e8                mov    -0x18(%rbp),%eax
      27:   48 98                   cltq
      29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx

The first three instructions are in line with our expectations. First, the value of i (0) is loaded into the eax register and sign-extended to 64 bits, then a[0] is loaded into the edx register. The multiplication by 1 (1 *) has evidently been optimized away by the compiler, but everything looks fine. The next few instructions start out much the same.

      2d:   8b 45 e8                mov    -0x18(%rbp),%eax
      30:   48 98                   cltq
      32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
      36:   01 c0                   add    %eax,%eax
      38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx

The first three instructions load the value of i (still 0) into the eax register, sign-extend it to 64 bits, and then load a[0] into the eax register. Something interesting is happening here: we expected i++ to have run before these three instructions, but it has not. The last two instructions add the eax register to itself, which effectively computes 2 * a[0], and then add the result to the previous value and store the sum in the ecx register. At this point the code has computed a[0] + 2 * a[0]. Things are starting to look a little strange, but maybe some compiler magic is still to come.

      3b:   8b 45 e8                mov    -0x18(%rbp),%eax
      3e:   48 98                   cltq
      40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      44:   89 d0                   mov    %edx,%eax

These instructions are starting to look quite familiar. They load the value of i (still 0), sign-extend it to 64 bits, load a[0] into the edx register, and then copy the value from edx to eax. Well, let's look at some more:

      46:   01 c0                   add    %eax,%eax
      48:   01 d0                   add    %edx,%eax
      4a:   01 c8                   add    %ecx,%eax
      4c:   89 45 ec                mov    %eax,-0x14(%rbp)

Here, a[0] is added up three times, the previous result is added, and the total is stored in the variable r. Now the incredible part: r contains a[0] + 2 * a[0] + 3 * a[0]. Sure enough, that is the output of the program: "60". But what happened to those postfix operators? They are all at the end:

      4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)

It seems that our compiled code is completely wrong! Why were the postfix increments pushed to the very end, after all the other work had been done? As my faith in reality dwindled, I decided to go straight to the source. No, not the compiler source code, which is just an implementation. I picked up the C11 language specification.

The problem lies in the details of the postfix operators. In our case, we apply three postfix increments to the array index within a single expression. When a postfix operator is evaluated, it returns the original value of the variable. Storing the new value back into the variable is a side effect. The standard only guarantees that this side effect is applied at some point between the previous and the next sequence point; see section 5.1.2.3 of the standard, which defines sequence points in detail. Our expression modifies i three times with no sequence point in between, so it has undefined behavior. The result depends entirely on when the compiler chooses to apply the side effect of storing the new value relative to evaluating the rest of the expression.
