You call this a pointer?

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

This article comes from the WeChat official account Low Concurrency Programming (ID: dibingfa). Author: flash.

The original title: "You call this stupid thing a pointer?"

This series is divided into three parts, which use a deliberately naive approach to get to the bottom of what pointers really are:

You call this stupid thing a pointer? -- the basics

You call this stupid thing a pointer? -- advanced

You call this stupid thing a pointer? -- the hardcore edition

Say no more, let's go!

Memory is usually drawn, rather carefully, as a tower of grids with low addresses at the bottom and high addresses at the top.

But I'm going to draw it differently today, like this.

Each grid represents one byte (8 bits) of memory, and the number on the grid is its memory address. I also write the addresses in decimal, so that hexadecimal doesn't get in the way of understanding.

At present, the memory is completely empty and there is nothing in the grid.

Imagine if you forgot all the syntax rules and programming specifications, how would you describe the operation on these memory squares?

First: the type system

It is all very simple: put the number 29 in grid 3 and put the number 38 in grid 6. That is a plain and direct description.

But talking like this is too tedious. "Put the number 29 in grid 3" is far too wordy, and it is not convenient for an unemotional computer to understand.

So let's give an instruction, using mov $x, (y) to put the number x in the grid y, as follows:

    mov $29, (3)
    mov $38, (6)

This means exactly what I just said:

Put the number 29 in memory grid 3
Put the number 38 in memory grid 6

Too easy, isn't it? Don't worry, the fun will begin soon!

What if you want to put the number 999 in memory grid 8?

Because one grid represents 1 byte, which is only 8 bits, it can represent just 256 distinct values: either -128 to 127 (signed) or 0 to 255 (unsigned). Obviously, the number 999 cannot fit in one grid; it needs two.

That's easy to say. Put the number 999 in grid 8 and occupy two squares in a row.

But then the mov instruction has to change: it must express not only "store" but also how many grids are occupied.

We use movb to mean 1 byte and movw to mean 2 bytes. So the three numbers above can be expressed in instructions like this:

    movb $29, (3)
    movb $38, (6)
    movw $999, (8)

This means:

Put the number 29 into memory grid 3, occupying 1 byte
Put the number 38 into memory grid 6, occupying 1 byte
Put the number 999 into memory grid 8, occupying 2 bytes

OK, since you have 1-byte and 2-byte instructions, you might as well finish the design: use movl for 4 bytes and movq for 8 bytes.

    movb occupies 1 byte
    movw occupies 2 bytes
    movl occupies 4 bytes
    movq occupies 8 bytes

Without realizing it, you have quietly designed a type system! Of course, it is only a semi-finished product.

Second: variables

You keep putting data into different grids.

For example, I put my age in grid 11 (1 byte) and my monthly salary in grid 14 (4 bytes).

Now our memory is so cluttered that you can't remember what the data in grid 3 or grid 11 means. At best you can guess, from the look of the number, that what is in grid 14 is indeed my monthly salary. What should we do about this?

Add a layer of abstraction! We put a label on these grids with our data, so that we don't have to remember the meaningless grid numbers.

In this way, in fact, we no longer care about which grid these tags are in, just find me a grid and put my data in it.

    movb $29, a
    movb $38, b
    movw $999, c
    movb $18, age
    movl $2147483647, salary

Of course, I also need to use these labels to get back the data I just stored.

This sounds simple, but there is a problem. When we store the data, movb, movw, movl and so on tell us how many grids it takes up. When we take it out, the label says nothing about how many grids the data occupies, which is a problem.

Therefore, when defining this tag, you can't just take a name, but you also need to have information about how many cells are occupied by the data corresponding to this tag.

Following the example of the storage operation just now, we also specify a series of words to modify these tags to indicate how many squares are occupied.

char represents 1 byte, short represents 2 bytes, int represents 4 bytes, and long represents 8 bytes.

So the five pieces of data just now can be expressed as the following instructions:

    char a = 29;
    char b = 38;
    short c = 999;
    char age = 18;
    int salary = 2147483647;

All right, I won't hide it any longer. I'm sure you all know that this is how it is written in C, and that the earlier pile of mov instructions is how it is written in assembly language.

These char a, short c, int salary and so on are variables! Remember, variables have not only names, but also types!

Third: variable definition and assignment

In fact, what we just wrote puts the definition of a variable and its assignment on one line.

For example, there are the following sentences:

    int a = 1;

It is actually divided into two steps:

    // definition of the variable
    int a;
    // assignment of the variable (here you can also call it initialization)
    a = 1;

The definition of the variable exists so that the programmer can use it later; this part is not for the CPU.

The assignment is what really puts the data in memory; this part corresponds to specific instructions executed by the CPU.

That is, if you only define a variable int a, but there is no assignment to it, the definition will not be reflected at all when the CPU executes the instruction.

Fourth: pointers

Now, let's clear the memory and go back to the blank slate we started with.

Let's do something. I store my password (1234) in a short a, assuming that the variable a is placed in grid 6.

At the same time, I store the address of this variable a, which is the number 6, in another variable, int p, assuming that this variable p is placed in grid 1.

In this way, I find my password in two hops: go to the memory address of p and read the value stored there, which is the memory address of a; then go to the memory address of a and read the value stored there, which is the password 1234 I am looking for.

We can use the following code to represent the storage logic just now.

    short a = 1234;
    // suppose a is placed in grid 6
    int p = 6;

Here p and a are both variables, but p is a little special: the value stored in it is a memory address. We vividly call p a pointer variable, or pointer for short.

However, there are a few problems in this way, and I will talk about them one by one.

1. First of all, in the coding stage, we do not know and do not need to know where the variable a will be stored, otherwise we will lose the meaning of the label and return to the era when we need to care about the specific memory address (that is, the grid number).

So, we need a way to express "the address of the variable a" during the coding phase. Let's write it as &a.

Then our code can be improved to:

    short a = 1234;
    // suppose the address of a is 6
    // then p below equals 6
    int p = &a;

As shown in the figure:

2. The size of the pointer variable itself

Now turn your attention to the variable p. In essence p stores a plain value, say 6, but that value represents a memory address.

If you let the programmer arbitrarily specify the data type of this variable p (that is, how many bytes it takes), it is obviously prone to problems.

For example, if the memory address is 999, there will be a problem if I use a variable p of type char to store it.

It is impossible to determine the memory address of a variable during the coding phase, so it is impossible to determine what type of variable is used to store it.

Therefore, the safest way is to store pointer variables with a variable type that can fully accommodate all memory address ranges.

Let's just assume we're on a 32-bit system, so a 4-byte variable is enough. (In practice, of course, it depends on the platform your compiler targets.)

Now, the amount of memory occupied by our pointer variables is a fixed 4 bytes, that is, 4 squares.

The programmer does not need and cannot modify this size, so we can remove the data type before p.

    short a = 1234;
    *p = &a;

3. The type of the pointer variable

We have just settled how much memory the pointer variable itself occupies, but one problem remains: the size of the variable at the memory address stored in the pointer variable.

In other words, although the pointer variable p above stores the memory address 6 of variable a, the pointer variable p does not have any information about the size of the variable at memory address 6.

If we think that the variable at memory address 6 is of type char, that is, it takes up only one byte, then obviously, it will fetch a value that does not match the expectation.

Of course, if you treat the variable at 6 as an int, taking 4 bytes, the value may happen to come out right, but it is still not what we intend (even less so if something else is stored in grids 8 and 9).

Therefore, it is correct to read the value at this memory address exactly according to the type of the variable itself, that is, the short type.

So how should we express this message? That is to say, the variable p is a pointer, and the type of variable at the memory address stored in the pointer is short.

It's easy. Just say the answer.

    short a = 1234;
    short *p = &a;

The * in front of p indicates that the variable p is a pointer type, and the short in front indicates that the variable at the memory address the pointer points to is of type short.

More accurately, of course, the pointer p will read the memory it points to according to a variable of type short, and it doesn't matter what it is there.

Note that this short does not mean that the size of the pointer variable itself is 2 bytes. As we mentioned earlier, the pointer variable itself is a fixed 4-byte size.

But saying all that every time is too long-winded. From now on, we'll just say that the variable p is a pointer of type short *.

In terms of the figure above, the blue fill of the variable a on the right indicates that a is of type short, while the dotted box around it indicates that the pointer p "interprets" the value at memory address 6 as a variable of type short.

The match between the two is the "correct" programming code.

Of course, the "correct" here is for programmers, and CPU doesn't care.

4. The value pointed to by the pointer

Above, we can already get the address of a variable. For example, the address of a is:

    &a

We can also define a pointer variable, such as a pointer variable p of type short *:

    short *p;

And we can initialize the pointer variable through a direct assignment:

    p = &a;

Of course, the above can also be written together, with the definition of the pointer variable p on the same line as its initialization:

    short *p = &a;

However, we don't yet have a way to represent the block of memory that the pointer variable p points to.

Let's invent one, for example, if we want to change the value of the block of memory pointed to by p to 999, we can write it like this.

    *p = 999;

Here * means "point to": *p does not mean the memory address of p itself. It means treating the contents of p as a memory address and going to that address.

Represented by a graph, it is:

So a complete program is:

    short a = 1234;
    // definition of the pointer
    short *p;
    // initialization of the pointer, that is, the value of the pointer variable itself
    p = &a;
    // the value at the memory address the pointer variable points to
    *p = 999;

After execution, the value of a becomes 999; in other words, the value held in grid 6 and grid 7 becomes 999.

5. Pointer addition and subtraction

If we add 1 to a normal variable, for example:

    int a = 1;
    int b = a + 1;

Obviously, the value of b should be 2, no doubt.

But what if you add 1 to a pointer variable?

    int a = 1;
    int *p = &a;
    int *p2 = p + 1;

Let's assume that the variable a is placed at grid 1.

We don't care what the value of a is or where the variable p itself is placed; we just look at the value of p, which is obviously 1 at the beginning.

(For demonstration purposes, the following figure directly shows the memory address that p points to, rather than the memory address where p itself is located.)

Set aside for a moment what p + 1 actually is. If you were designing the language, what would you want p + 1 to be?

In my opinion, there are only two more reasonable designs.

First, p + 1 equals 2, which is simply added as a numerical value.

Second, p + 1 equals 5, that is, the size of the data type that spans the memory unit that p points to, that is, a 4-byte int.

Which do you think is more reasonable?

Obviously the second! Otherwise, what would be the difference between a pointer variable and an ordinary variable? Now that you've designed pointer variables, they should offer programmers some convenience. That is the whole point of designing them.

Of course, suppose you don't accept that, and you insist that this int * pointer really go up by just 1 numerically, so that p becomes 2. What should you do?

It's very simple, and it takes three steps:

The first step is to cast p from type int * to type char *.

The second step is p + 1.

The third step is to cast the char * result back to type int *.

Done! Expressed in code:

    p = (int *)((char *)p + 1);

You will see this pattern used often in C projects.

Of course, your gaudy operation, in CPU's eyes, is simply + 1 for a value at a memory address.

Fifth: the essence of the pointer

Let's look at the picture above again:

In fact, ignore the short *p and short a above; those are for programmers and compilers.

In the CPU's eyes, there are no such fancy labels and interpretations. There is just a number 6 in grids 1-4 and a number 1234 in grids 6-7, that's all.

Going further, there is just the number 6 stored in grid 1 (grids 2, 3, and 4 are empty), the number 12 in grid 6, and 34 in grid 7.

(Of course, the values are really stored in binary, and the layout depends on big-endian or little-endian byte order. I'm keeping it simple and intuitive here to show that the CPU doesn't care about any of that; it just puts numbers in grids.)

Therefore, we often hear from books that we must remember that only addresses can be stored in pointer variables, and do not assign an integer or any other non-address type data to a pointer variable.

This advice is awkward. Many books try to explain the nature of the pointer and the precautions for using it at the same time, mix the two together, and leave readers understanding neither.

What a struggle!

To be honest, who can remember or understand those precautions just by reading books, without a lot of C practice? And after a lot of C practice, pointers are already in your blood, so who still needs a talk about their nature? So I think this area is quite contradictory.

In fact, pointer variables are essentially the same as ordinary variables:

The ordinary variable: writing short a tells the compiler that when I write a = 1, it should find me 2 bytes of memory and fill them with 1.

The pointer variable: writing short *p tells the compiler two things:

When I write p = xxx, find me 4 bytes of memory (we assume the size of the pointer itself is fixed at 4 bytes) and fill in xxx, exactly as with a normal variable.

When I write *p = yyy, go to memory address xxx and store yyy there according to the short type, that is, using 2 bytes.

So, who says you can't assign an integer to a pointer? I will assign the integer xxx to the pointer p. At the moment of assignment it is just an integer, isn't it?

But when I use it as *p, I treat xxx as a memory address and go to that place in memory. So what?

Expressed in code is:

I forcibly assign the integer value 6 to the pointer variable p, and then *p accesses memory address 6 and modifies the value there:

    int *p = 6;
    *p = 999;

I can also force an address value into a normal variable:

    int a = 1;
    int b = &a;

Then the address of a is stored in the ordinary variable b, and *b could also access a and modify its value:

    *b = 999;

Of course, if you really write this, the compiler will report an error. That doesn't matter: we can first cast the ordinary variable b to a pointer and then dereference it:

    *(int *)b = 999;

You can also play something fancier, first taking the address with & and then taking the value with *, although it is useless:

    *((int *) *(&p)) = 999;

If the address of a is 6, then all of these fancy operations end up, in the CPU's eyes, as one simple instruction:

    movl $999, (6)

It just wants to put 999 in grid 6!

So, don't think about the pointer as complex and sacred, it just makes it easier for programmers to program and tells the compiler how to compile the final instruction.

You write *p, which accesses the value of p as a memory address; at the assembly language level that is just adding parentheses:

    (p)

You write &a, which fetches the memory address of the variable a; at the assembly language level that is the lea instruction:

    lea a, xxx

You write ***p, which is equivalent to adding three pairs of parentheses:

    (((p)))

Of course, the above are simplified pseudo-instructions for ease of understanding. How they are actually implemented in real assembly language is something I will cover in later chapters. If you approach pointers directly from assembly language, you will find that a pointer is nothing more than a tool.

Sixth: closing thoughts

With that, we have finished "You call this stupid thing a pointer? -- the basics."

Starting from the bare memory grids, we gradually derived the role of the type system and of variables, then arrived at pointer variables, which are essentially no different from ordinary variables, and finally derived the operations on pointer variables, to show you the essence of the pointer.

Don't try to memorize the individual points of this article. Focus on the whole derivation: what problem the pointer is trying to solve, why its design is reasonable, which information exists for programmers and compilers, and which operations actually end up as CPU instructions. These are the key.

Of course, I will still give you a brief summary of the relevant points. Put simply, there are just a few things.

Define a pointer:

    int *p;

Assign or initialize a pointer:

    p = &a;

Modify the content the pointer points to:

    *p = 999;

Pointer addition and subtraction (in fact, this is only valuable with the arrays discussed later):

    p = p + 1;

That's all!

Finally, I recommend two websites to you.

One compiles C code into assembly code in real time. You can use it to play with pointers and run experiments to see what they look like at the CPU instruction level.

https://godbolt.org

The other is the GNU C manual, which describes the syntax and features very clearly, so you don't need to hunt through blog posts with a search engine.

https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html

For example, talk about integer types in the type system:

For example, the definition and initialization of pointers:

I believe that someone reading this article will want to ask whether short *p should be written as

    short *p

or

    short* p

You can find the answer in the document above.

OK, this is the end of this article. In the next, advanced article, I will talk about pointers to pointers, arrays, function pointers, strings, structs, arrays of structs, and pointers.

Although it is called the advanced chapter, I think the essence of the pointer is the truly advanced part, and the "advanced" pointer material is really the basics.

Because once you understand everything above, the so-called advanced pointer tricks that follow can all be deduced from the nature of the pointer and the rationale of the language design. After that, it just takes time and practice to master them.

Therefore, it is very important to understand today's content!
