2025-03-25 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 report
This article analyzes global variables in C through a series of examples. The approach is simple, quick, and practical; interested readers may want to take a look.
Global variables are a very important part of C's syntax and semantics. First, the reason they exist can be understood from three different angles:
To the programmer, a global variable is a variable that records some content.
To the compiler and linker, it is a symbol that has to be resolved.
To the computer, it may be a piece of memory with an address.
Second, its syntax and semantics:
In terms of scope, a global variable declared with the static keyword is visible only within its own file; without static, its visibility extends to the entire module or project.
In terms of lifetime, it persists for the whole run of the program or module. (Note that it is precisely this combination of cross-file access and persistent lifetime that often makes global variables the entry point when code is attacked; this is important to understand.)
In terms of storage allocation, a global variable that is defined and initialized is allocated in the data segment (.data) at compile time. A global variable that is defined but not initialized is a tentative definition and ends up in the .bss segment (GCC before version 10, whose default is -fcommon, first emits it as a COMMON symbol and lets the linker place it); .bss is zero-filled when the program is loaded, not at compile time. A global variable that is only declared (with extern) is merely a symbol in the compiler's symbol table: no space is allocated for it in that file, and references to it are bound to the address of the real definition at link time, or at load time for shared libraries.
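To check where each kind of global ends up, one can compile a small file with gcc -c and list its symbols with nm. The sketch below is only an illustration with hypothetical names, not code from the examples that follow. In nm output the initialized variable should appear as D (.data), the tentative definition as C (common) under -fcommon or B (.bss) under -fno-common, and the unused extern declaration should not appear at all.

```c
/* Hypothetical names, for illustration only. */
int initialized_global = 42;  /* defined and initialized: placed in .data */
int tentative_global;         /* tentative definition: COMMON under -fcommon, else .bss */
extern int declared_only;     /* declaration only: no storage is reserved in this file */

/* Reading a tentative definition is safe: .bss is zero-filled at load time. */
int read_tentative(void)
{
    return tentative_global;
}

int read_initialized(void)
{
    return initialized_global;
}
```

Since declared_only is never referenced, no definition of it is needed at link time.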
Below we show what interesting things happen to global variables without the static qualifier at compile/link time and at run time, and along the way take a glimpse at how the C compiler and linker resolve symbols. The examples behave the same under ANSI C and GNU C; the author's environment was GCC 4.4.3 on Ubuntu.
The first example:

/* t.h */
#ifndef _H_
#define _H_
int a;
void foo(void);   /* prototype so that main.c can call foo() */
#endif

/* foo.c */
#include <stdio.h>
#include "t.h"

struct {
    char a;
    int b;
} b = {2, 4};

int main();   /* declared only so that foo() can print its address */

void foo(void)
{
    printf("foo:\t(&a)=%p\n\t(&b)=%p\n\tsizeof(b)=%zu\n\tb.a=%d\n\tb.b=%d\n\tmain:%p\n",
           (void *)&a, (void *)&b, sizeof b, b.a, b.b, (void *)main);
}

/* main.c */
#include <stdio.h>
#include "t.h"

int b;
int c;

int main()
{
    foo();
    printf("main:\t(&a)=%p\n\t(&b)=%p\n\t(&c)=%p\n\tsize(b)=%zu\n\tb=%d\n\tc=%d\n",
           (void *)&a, (void *)&b, (void *)&c, sizeof b, b, c);
    return 0;
}
Makefile is as follows:
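# GCC 10+ defaults to -fno-common; -fcommon restores the behavior described here
CFLAGS = -fcommon

test: main.o foo.o
	gcc -o test main.o foo.o
main.o: main.c
foo.o: foo.c
clean:
	rm *.o test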
Output:
foo:	(&a)=0x0804a024
	(&b)=0x0804a014
	sizeof(b)=8
	b.a=2
	b.b=4
	main:0x080483e4
main:	(&a)=0x0804a024
	(&b)=0x0804a014
	(&c)=0x0804a028
	size(b)=4
	b=2
	c=0
In this project we define four global variables: the header file defines an integer a, main.c defines two uninitialized integers b and c, and foo.c defines an initialized struct b. foo.c also declares main so that it can print the function's address.
Because each C source file is compiled separately and t.h is included in both, int a is defined twice. The variable b is also defined in both source files, with a different type in each. (main, by contrast, is defined only in main.c; foo.c merely declares it, and its value is simply the address of the function in the code segment.) Yet the toolchain reported no error, only a warning from the linker:
/usr/bin/ld: Warning: size of symbol `b' changed from 4 in main.o to 8 in foo.o
Running the program shows that sizeof(b) is 4 bytes in main.c but 8 bytes in foo.c: sizeof is evaluated at compile time, separately for each source file, and the two files give b different types.
Surprisingly, however, a and b have the same addresses in main.c and foo.c. a is defined twice, and b is defined twice with different types, yet the memory image contains only one copy of each.
We also see that the value of b in main.c is in fact the value of b.a, the first member of the struct in foo.c: on little-endian x86, main.c's int b reads the first four bytes of the struct, namely the char b.a (2) followed by three padding bytes (zero here). This confirms the earlier inference: even with multiple definitions, memory holds only one initialized copy. c, by contrast, is an independent variable standing off to the side.
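The overlap can be imitated inside a single file. The sketch below (hypothetical helper name, same struct layout as foo.c) copies the first sizeof(int) bytes of a {2, 4} struct into an int with memcpy, which is roughly what main.c's int b sees at the shared address. It assumes the usual x86 layout, where b.a is followed by three padding bytes; the padding is zeroed explicitly because C does not guarantee its contents. On a little-endian machine the result is 2; on a big-endian one it would be 2 << 24.

```c
#include <string.h>

/* Same layout as the struct b defined in foo.c. */
struct pair {
    char a;
    int b;
};

/* Hypothetical helper: returns the int formed by the first sizeof(int) bytes
   of a {2, 4} struct, i.e. what an int placed at the same address would read. */
int first_int_of_struct(void)
{
    struct pair s;
    int overlay;

    memset(&s, 0, sizeof s);               /* make the padding bytes deterministic */
    s.a = 2;
    s.b = 4;
    memcpy(&overlay, &s, sizeof overlay);  /* well-defined byte copy, unlike a pointer cast */
    return overlay;
}
```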
Why does this happen? The answer lies in how the C compiler and linker resolve multiply defined global symbols.
At compile time, the compiler encodes information about each global symbol in the symbol table of the relocatable object file. This is where the concepts of "strong" and "weak" symbols come in: strong symbols are functions and initialized global variables, such as the struct b in foo.c; weak symbols are uninitialized global variables, such as the integers b and c in main.c and the a that t.h brings into both source files. (At the ELF level, GCC 4.4 emits these uninitialized globals as COMMON symbols.) When a symbol is defined more than once, the GNU linker (ld) resolves it with the following rules:
1. Multiple strong symbols with the same name are not allowed.
2. Given one strong symbol and any number of weak symbols, the strong symbol is chosen.
3. Given only weak symbols, the largest one is chosen; if they are the same size, the first one in link order is chosen.
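To make the three rules concrete, here is a toy model of them in C. It is only a sketch of the rules as stated above, not the actual logic inside GNU ld; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* One candidate definition of a global symbol, listed in link order (toy model). */
struct sym_def {
    int strong;    /* 1 = initialized (strong), 0 = uninitialized (weak/common) */
    size_t size;   /* object size in bytes */
};

/* Returns the index of the chosen definition, or -1 for a
   multiple-definition error (rule 1); also -1 when n is 0. */
int resolve_symbol(const struct sym_def *defs, size_t n)
{
    int chosen = -1;
    size_t strong_count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (defs[i].strong) {
            strong_count++;
            chosen = (int)i;
        }
    }
    if (strong_count > 1)
        return -1;        /* rule 1: two strong symbols */
    if (strong_count == 1)
        return chosen;    /* rule 2: the strong symbol wins */

    /* rule 3: only weak symbols; the largest wins, ties go to the earliest */
    for (i = 0; i < n; i++) {
        if (chosen < 0 || defs[i].size > defs[chosen].size)
            chosen = (int)i;
    }
    return chosen;
}
```

For the first example, b is described by {0, 4} (main.o) followed by {1, 8} (foo.o), and the model picks index 1, the struct from foo.o, just as the linker did.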
In the example above, the global variables a and b both have duplicate definitions. If we also initialized b in main.c, there would be two strong symbols, violating rule 1, and the linker would report a multiple-definition error.
As written, rule 2 applies: the linker only issues a warning, and at run time b is the strong symbol from foo.c. Both definitions of a are weak and of equal size, so one is chosen according to the order in which the object files are linked.
In fact, these rules are a major pitfall in C. The linker's tolerance of multiply defined global variables can cause a variable to be modified for no apparent reason, making the program's behavior unpredictable. If you are not yet convinced of how serious this is, here is another example.
The second example:

/* foo.c */
#include <stdio.h>

struct {
    int a;
    int b;
} b = {2, 4};

int main();

void foo(void)
{
    printf("foo:\t(&b)=%p\n\tsizeof(b)=%zu\n\tb.a=%d\n\tb.b=%d\n\tmain:%p\n",
           (void *)&b, sizeof b, b.a, b.b, (void *)main);
}

/* main.c */
#include <stdio.h>
#include <unistd.h>     /* fork, sleep */
#include <sys/wait.h>   /* wait */

int b;
int c;

void foo(void);

int main()
{
    if (0 == fork()) {
        sleep(1);
        b = 1;
        printf("child:\tsleep(1)\n\t(&b):%p\n\t(&c)=%p\n\tsizeof(b)=%zu\n\tset b=%d\n\tc=%d\n",
               (void *)&b, (void *)&c, sizeof b, b, c);
        foo();
    } else {
        foo();
        printf("parent:\t(&b)=%p\n\t(&c)=%p\n\tsizeof(b)=%zu\n\tb=%d\n\tc=%d\n\twait child...\n",
               (void *)&b, (void *)&c, sizeof b, b, c);
        wait(NULL);   /* reap the child; its exit status is not needed */
        printf("parent:\tchild over\n\t(&b)=%p\n\t(&c)=%p\n\tsizeof(b)=%zu\n\tb=%d\n\tc=%d\n",
               (void *)&b, (void *)&c, sizeof b, b, c);
    }
    return 0;
}
The output is as follows:
foo:	(&b)=0x0804a020
	sizeof(b)=8
	b.a=2
	b.b=4
	main:0x080484c8
parent:	(&b)=0x0804a020
	(&c)=0x0804a034
	sizeof(b)=4
	b=2
	c=0
	wait child...
child:	sleep(1)
	(&b):0x0804a020
	(&c)=0x0804a034
	sizeof(b)=4
	set b=1
	c=0
foo:	(&b)=0x0804a020
	sizeof(b)=8
	b.a=1
	b.b=4
	main:0x080484c8
parent:	child over
	(&b)=0x0804a020
	(&c)=0x0804a034
	sizeof(b)=4
	b=2
	c=0
(Note: the output above was printed directly to the terminal. When the author redirected the output of ./test to a log file, the lines appeared in a different order, so the default terminal output is shown.)
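The reordering has a simple cause: when stdout is redirected to a file, the C library makes it fully buffered, so each process writes out its output in large chunks (often only at exit) instead of line by line. One possible workaround, sketched here with a hypothetical helper that would be called at the very start of main(), before fork() and before any output, is to request line buffering explicitly with setvbuf:

```c
#include <stdio.h>

/* Hypothetical helper: call it first thing in main(), before any output.
   Makes stdout line-buffered even when it is redirected to a file, so each
   process flushes after every '\n', as it would on a terminal.
   Returns 0 on success, like setvbuf. */
int make_stdout_line_buffered(void)
{
    return setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
}
```

Calling fflush(stdout) right before fork() is the other common precaution; otherwise any output still sitting in the buffer is copied into the child and printed twice.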
This is a multi-process environment. First, the addresses of the global variables b and c are still the same (virtual addresses, of course) whether we look from the parent or the child, from main.c or foo.c, and sizeof still gives a different size for b in each module.
Note that the child process assigns to b. From the child's point of view, including its call to foo(), both the integer b and the struct member b.a are 1, while in the parent both are still 2, even though the addresses displayed are identical.
The author's explanation is this: when fork creates a new process, the child receives an "image" of the parent's context, naturally including its global variables. The virtual addresses are the same but belong to different process spaces, and at that point both map to a single copy in physical memory, so b has the same value (2) in both.
Then the child writes to b, which triggers the operating system's copy-on-write mechanism. Only then are two real copies created in physical memory and mapped into the two process spaces, while the virtual address itself stays the same; this is transparent to the application.
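The observable effect can be checked with a short sketch (hypothetical names; POSIX fork and waitpid). The child writes to a global and reports what it sees through its exit status; the parent then reads the same global. Copy-on-write itself happens inside the kernel and cannot be observed from C; what the sketch shows is its result, two independent values behind the same virtual address.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int global_b = 2;   /* hypothetical stand-in for b in main.c */

/* Forks; the child writes global_b and reports its own view via its exit
   status, stored in *child_view. Returns the parent's view after the child
   has exited, or -1 on error. */
int fork_write_demo(int *child_view)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        global_b = 1;      /* on a copy-on-write kernel such as Linux, this write gives the child its own page */
        _exit(global_b);   /* _exit skips flushing stdio buffers inherited from the parent */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_view = WEXITSTATUS(status);
    return global_b;       /* still 2: the parent's copy was never written */
}
```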
It is also worth noting that this example builds without the size warning for b that the first example produced. I don't know why; could it be a GCC bug?
The third example
The code is the same as in the previous example, except that foo.c is archived into a static library, libfoo.a, and the program is linked statically; only the modified Makefile is shown here.
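# GCC 10+ defaults to -fno-common; -fcommon restores the behavior described here
CFLAGS = -fcommon

test: main.o foo.o
	ar rcs libfoo.a foo.o
	gcc -static -o test main.o libfoo.a
main.o: main.c
foo.o: foo.c
clean:
	rm -f *.o libfoo.a test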
The output is as follows:
foo:	(&b)=0x080ca008
	sizeof(b)=8
	b.a=2
	b.b=4
	main:0x08048250
parent:	(&b)=0x080ca008
	(&c)=0x080cc084
	sizeof(b)=4
	b=2
	c=0
	wait child...
child:	sleep(1)
	(&b):0x080ca008
	(&c)=0x080cc084
	sizeof(b)=4
	set b=1
	c=0
foo:	(&b)=0x080ca008
	sizeof(b)=8
	b.a=1
	b.b=4
	main:0x08048250
parent:	child over
	(&b)=0x080ca008
	(&c)=0x080cc084
	sizeof(b)=4
	b=2
	c=0
The results are the same as before, except that with static linking the global variables are loaded at different addresses, and b and c appear to be further apart. This time, however, the linker did issue the size warning for b.
At this point, some readers may scoff at these examples, thinking they merely list a few quirks of C and reveal nothing sinister.
Others may conclude that every global variable should either be qualified with static or initialized at its definition, which eliminates weak symbols so that errors are caught at build time, and that C is flawless as long as it is used carefully.
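For reference, the discipline those readers have in mind looks roughly like the sketch below: declare a shared global with extern in a header, define it exactly once with an initializer, and make everything else static. The names are hypothetical, and the header and source are merged into one file so the sketch stays self-contained. GCC can also enforce part of this: with -fno-common (the default since GCC 10), two uninitialized definitions such as int b; in different files are reported as a multiple-definition error at link time.

```c
/* Would normally live in a header (hypothetical counter.h): */
extern int shared_counter;        /* declaration only: no storage, safe to include anywhere */

/* Would normally live in exactly one .c file: */
int shared_counter = 0;           /* the single initialized (strong) definition */

static int file_local_counter;    /* internal linkage: invisible to other files */

/* Hypothetical function that touches both kinds of global. */
int bump_counters(void)
{
    shared_counter++;
    file_local_counter += 2;
    return shared_counter + file_local_counter;
}
```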
To those readers I can only say: listen carefully in the dead of night, and you may hear Dennis Ritchie's evil laughter. No, not so much mockery as a curse.
The fourth example:

/* foo.c */
#include <stdio.h>

struct {
    int a;
    int b;
} b = {3, 3};

int main();

void foo(void)
{
    b.a = 4;
    b.b = 4;
    printf("foo:\t(&b)=%p\n\tsizeof(b)=%zu\n\tb.a=%d\n\tb.b=%d\n\tmain:%p\n",
           (void *)&b, sizeof b, b.a, b.b, (void *)main);
}

/* t1.c */
#include <stdio.h>
#include <unistd.h>   /* sleep */

int b = 1;
int c = 1;

int t2(void);
void foo(void);

int main()
{
    int count = 5;

    while (count-- > 0) {
        t2();
        foo();
        printf("t1:\t(&b)=%p\n\t(&c)=%p\n\tsizeof(b)=%zu\n\tb=%d\n\tc=%d\n",
               (void *)&b, (void *)&c, sizeof b, b, c);
        sleep(1);
    }
    return 0;
}

/* t2.c */
#include <stdio.h>

int b;
int c;

int t2(void)
{
    printf("t2:\t(&b)=%p\n\t(&c)=%p\n\tsizeof(b)=%zu\n\tb=%d\n\tc=%d\n",
           (void *)&b, (void *)&c, sizeof b, b, c);
    return 0;
}
Makefile script:
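export LD_LIBRARY_PATH:=.
# GCC 10+ defaults to -fno-common; -fcommon restores the behavior described here
CFLAGS = -fcommon

all: test
	./test

test: t1.o t2.o
	gcc -shared -fPIC -o libfoo.so foo.c
	gcc -o test t1.o t2.o -L. -lfoo

t1.o: t1.c
t2.o: t2.c

.PHONY: clean
clean:
	rm -f *.o *.so test*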
Execution result:
./test
t2:	(&b)=0x0804a01c
	(&c)=0x0804a020
	sizeof(b)=4
	b=1
	c=1
foo:	(&b)=0x0804a01c
	sizeof(b)=8
	b.a=4
	b.b=4
	main:0x08048564
t1:	(&b)=0x0804a01c
	(&c)=0x0804a020
	sizeof(b)=4
	b=4
	c=4
t2:	(&b)=0x0804a01c
	(&c)=0x0804a020
	sizeof(b)=4
	b=4
	c=4
foo:	(&b)=0x0804a01c
	sizeof(b)=8
	b.a=4
	b.b=4
	main:0x08048564
t1:	(&b)=0x0804a01c
	(&c)=0x0804a020
	sizeof(b)=4
	b=4
	c=4
...
In fact, the previous examples were just appetizers; here is the real pitfall. This time the compiler and linker reported neither an error nor a warning, yet we can see that b, a strong symbol defined in the same file as main(), was overwritten, and its neighbor c was caught in the crossfire.
Sharp-eyed readers will notice that foo.c is now loaded as a shared library. When t1 first calls t2, foo() has not yet run, and b and c still hold 1. As soon as foo() is called, b is overwritten, and because c sits right next to b in memory, c is overwritten along with it.
The author cannot fully explain this behavior. One suggestion is that strong global symbols are laid out contiguously in the data segment (while weak symbols are kept in .bss or in the symbol table), and that it might be worth raising with the GNU compiler developers. A more likely explanation is ELF symbol interposition: libfoo.so is built with -fPIC, so foo() reaches b through the global offset table, and the dynamic linker binds that reference to the first definition of b in its search order, which is the int b in the executable (the identical &b values in the log agree with this). foo() then stores its 8-byte struct into a 4-byte int, and the second half of the store lands on c.
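The effect of that 8-byte store can be simulated in one file. In the sketch below, a struct with two int members stands in for t1.c's adjacent b and c (the struct only guarantees the adjacency that the log shows but that C does not promise for separate globals), and memcpy writes foo.c's {4, 4} struct starting at b's address. The names are hypothetical; the real program gets there through dynamic symbol binding rather than an explicit copy.

```c
#include <string.h>

/* Stand-in for t1.c's `int b = 1; int c = 1;` laid out back to back. */
struct adjacent_globals {
    int b;
    int c;
};

/* foo.c's view of b: an 8-byte struct. */
struct foo_b {
    int a;
    int b;
};

/* Hypothetical: mimics foo() storing its struct at the address of the
   executable's 4-byte int b. */
void store_foo_view(struct adjacent_globals *g)
{
    struct foo_b v = {4, 4};
    memcpy(g, &v, sizeof v);   /* 8 bytes: the first 4 land on b, the next 4 on c */
}
```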
In addition, the author tried adding the const qualifier to the definitions of b and c in t1.c. The build still succeeded without complaint, but the program crashed with a segmentation fault the first time main() called foo(), at the point where foo.c writes to b through its address.
This suggests that the const objects were placed in a read-only section (.rodata), which the operating system maps without write permission, much like a write-protection mechanism. The author is not sure whether earlier versions of GCC would let such a const object be overwritten without crashing.
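The write-protection inference can be tested directly on Linux with a sketch like the one below (hypothetical names). It assumes GCC places an initialized const global in .rodata, which is mapped read-only; the write itself is undefined behavior in C, so it is done in a child process and the parent only checks whether the child died from SIGSEGV.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

const int read_only_value = 3;   /* hypothetical; normally placed in .rodata */

/* Returns 1 if writing read_only_value killed the child with SIGSEGV,
   0 if the write went through, -1 on error. */
int const_write_faults(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        *(volatile int *)&read_only_value = 4;  /* cast away const; volatile keeps the store */
        _exit(0);                               /* reached only if the page was writable */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```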
As for qualifying global variables with volatile, the author's own tests showed no effect.
In your mind, is C still the "pure", "clean", "consistent" girl she once seemed? She may be cheating on you while you are not looking, all through global variables, especially in a dynamically linked environment, where even defining everything as strong symbols leaves the compiler none the wiser.
Some IT "terrorists" even package malicious code as global variables and plant it in vulnerable code paths that run with root privileges, much like the well-known stack overflow attack. One day, when you stare helplessly at a program with undefined behavior whose cause you cannot pin down, do not forget the deepest "greetings" from Uncle Ritchie in the world beyond.
Some may try to shift the blame onto the compiler and linker, arguing that all this has nothing to do with the language itself. But remember that it is the behavior of the compiler and linker that underpins the syntax and semantics of the entire language.
Conversely, consider why C's younger sibling C++ introduced namespaces, or try other high-level languages and see whether redefined global variables will even compile.