
What is the function of pointer in C language

2025-01-21 Update From: SLTechnology News&Howtos

Many people do not fully understand the topic of this article, "What is the role of pointers in the C language", so the following summary walks through it in detail and step by step. I hope you find it a useful reference.

1. The essence of memory

The essence of programming is to better manipulate data, and our data is stored in memory.

Therefore, if you understand the memory model and how C manages memory, you gain insight into how a program actually works, and your programming ability can climb to a higher level.

Please don't think this is empty talk. For a whole year I did not dare to write programs of more than a thousand lines in C, and I strongly resisted writing C at all.

Only at the end of the course was I required to write a subway management system in C, along with a red-black tree, in more than a thousand lines of code.

Once a program reached thousands of lines, all kinds of inexplicable memory errors appeared and a core dump could happen at any moment, and I had no way to investigate or analyze the cause.

It was not until I understood memory and pointers more deeply that I gradually became able to write projects of thousands of lines in C/C++ with few memory problems.

The sentence "a pointer stores the memory address of a variable" appears in just about every book on C.

Therefore, to thoroughly understand pointers, we must first understand where variables in C are actually stored, that is, memory.

1.1. Memory addressing

The memory of a computer is a region for storing data, made up of a series of consecutive storage cells.

Each cell represents one bit. To an EE student a bit is a high or low voltage level; to a CS student it is one of two states, 0 or 1.

Because a single bit can only represent two states, eight bits are grouped together and called a byte.

The byte is the smallest unit of memory addressing: each byte is given a number, and this number is called its memory address.

This is like assigning a house number to every household in a residential community: 301, 302, 403, 404, 501.

In life, we need to make sure that each house number is unique, so that we can locate a family precisely by its number.

Similarly, in the computer, the number given to each byte must be unique, so that each number identifies exactly one byte.

1.2. Memory address space

As said above, each byte in memory gets a unique number, so the range of these numbers determines the range of memory the computer can address.

The set of all these numbers is called the address space of memory, and its size depends on whether the computer is 32-bit or 64-bit.

The early Intel 8086 and 8088 CPUs had 16-bit registers, so a single register could hold only 2^16 = 65,536 distinct addresses, that is, 64 KB.

That was obviously not enough, so these chips combined a segment register with an offset to drive a 20-bit address bus (1 MB), and the later 80286 widened the address bus to 24 bits.

The 21st address line, known as A20, is disabled at startup on PCs for compatibility with the 8086. If you are writing a mini OS, you therefore need to enable the A20 line yourself, for example through a BIOS interrupt.

However, today's computers are at least 32-bit, which means that the addressable memory range is 2^32 bytes = 4 GB.

So, if your computer is 32-bit, it cannot make full use of more than 4 GB of installed memory.

Okay, this is memory and memory addressing.

1.3. The nature of variables

Now that we have memory, we need to consider how variables such as int and double are stored as 0s and 1s in these cells.

In the C language, we would define variables as follows:

int a = 999;
char c = 'c';

When you write down a variable definition, you are actually requesting a piece of space from memory to store your variables.

On most current platforms an int occupies 4 bytes, and integers are stored in two's complement form (if you have not met two's complement, look it up).

The two's complement representation of 999 is 0000 0000 0000 0000 0000 0011 1110 0111.

That is four bytes, so four cells are needed to store it, from the lowest address upward: 00, 00, 03, E7 (in hexadecimal).

Have you noticed that we put the high-order byte at the low address?

Can it be the other way around?

Of course. This brings us to big-endian and little-endian byte order.

Putting the high-order byte at the low memory address, as above, is called big-endian.

Conversely, putting the low-order byte at the low memory address is called little-endian. In that case the four cells would hold E7, 03, 00, 00.

The above only shows how an int is stored in memory. Other types such as float and char work the same way: the value is first converted to its binary encoding (two's complement for integers, and IEEE 754 for floating-point numbers on most platforms).

For multi-byte types, the bytes are then written into memory cells in order, following either the big-endian or the little-endian convention.

Remember these two layouts. This is what every variable in a programming language looks like in memory, whether it is an int, a char, a pointer, an array, a structure, or an object.

2. What is a pointer?

2.1. Where are the variables stored?

As I said above, defining a variable is actually applying for a piece of memory from the computer to store it.

What if we want to know where the variables are?

You can get the actual address of the variable through the operator &, which is the starting address of the block of memory occupied by the variable.

(PS: strictly speaking, this is a virtual address, not an address in physical memory.)

We can print out this address:

printf("%p\n", (void *)&a);

It would probably be a string of numbers like this: 0x7ffcad3b8f3c

2.2. Pointer essence

As mentioned above, we can get the memory address of the variable through the & symbol, so how does it indicate that this is an address rather than an ordinary value?

That is, how to represent the concept of address in C language?

Yes, that is what the pointer is for, and you can write this:

int *pa = &a;

What is stored in pa is the address of the variable a, so pa is called a pointer to a.

Here I would like to talk about a few topics that seem a little boring:

Why do we need pointers? Can't we just use the variable name?

Of course you can, but variable names have limitations.

What is the nature of a variable name?

It is a symbolic name for the variable's address. Variables exist to make programming more convenient and human-friendly, but the computer knows nothing about a variable called a; it only knows addresses and instructions.

So if you look at the assembly code compiled from C, you will find that variable names have disappeared, replaced by addresses.

You can assume that the compiler automatically maintains a mapping, converts each variable name in our program to the corresponding address, and then reads and writes that address.

That is, there is a mapping table that automatically converts variable names to addresses:

a | 0x7ffcad3b8f3c

c | 0x7ffcad3b8f2c

h | 0x7ffcad3b8f4c

....

Good point!

But I still haven't shown why pointers are necessary, so here is the problem. Look at the following code:

int func(...) { ... }
int main() { int a; func(...); }

Suppose I have a requirement:

The func function must be able to modify the variable a that belongs to main. Inside main, you can read and write a's memory directly through the variable name.

But a is not visible inside func.

You might say that you could pass in the address of a using the address-of operator:

int func(int address) { ... }
int main() { int a; func(&a); }

In this way, func can obtain the address of a and read and write through it.

In theory, there is no problem at all, but the problem is:

How can the compiler tell whether an int holds an ordinary int value or the address of another variable (that is, a pointer)?

If it were entirely up to programmers to remember this, it would add complexity, and the compiler would be unable to detect some errors.

When you define a pointer variable with int *, the meaning is explicit: it holds the address of another int variable.

The compiler can then catch some mistakes at compile time through type checking.

This is why pointers need to exist.

In fact, every language has this need, but for the sake of safety many languages put shackles on the pointer and wrap it up as a reference.

Maybe we all accepted pointers as a matter of course when we first studied them, but I hope this wordy explanation gives you some new insight.

In the meantime, I'd like to ask a few questions:

Since a pointer is essentially the starting memory address of a variable, that is, an integer,

why do pointers come in different types?

For an int pointer or a float pointer, does the type affect the information stored in the pointer itself?

When does this type come into play?

2.3. Dereference

The questions above lead to pointer dereferencing.

What is stored in pa is the memory address of the variable a, so how do we get the value of a through that address?

This operation is called dereferencing. In C, the * operator gives you the content at the address a pointer refers to.

For example, *pa yields the value of a.

We said that a pointer stores the starting address of a variable's memory, so how does the compiler know how many bytes to take from that address?

This is where the pointer's type comes in: the compiler decides how many bytes to fetch based on the type of element the pointer refers to.

If it is an int pointer, the compiler generates instructions that fetch four bytes; for a char pointer, only one byte; and so on.

In terms of memory layout, the pa pointer is first of all a variable itself: it occupies its own piece of memory, in which the starting address of a is stored.

When dereferencing, four bytes are taken starting from that address and then interpreted according to the int encoding.

2.4. Applying what you have learned

Although this point is simple, it is the key to understanding pointers in depth.

Here are two examples that illustrate it in detail.

For example:

float f = 1.0;
short c = *(short *)&f;

Can you explain what happens to the variable f at the memory level in this process?

And what is the value of c? Is it 1?

In fact, as far as memory is concerned, nothing about f changes.

Whatever the bit pattern of f in memory is, the process simply takes out the first two bytes of f, interprets them as a short, and assigns the result to c.

The detailed process is as follows:

1. &f gets the starting address of f.

2. (short *)&f does nothing at the memory level. The expression simply says: "Oh, I consider the address of f to hold a variable of type short."

3. Finally, when *(short *)&f is dereferenced, the compiler takes out the first two bytes, interprets them according to the short encoding, and assigns the resulting value to the variable c.

The bit pattern of f does not change at any point in this process; only the way those bits are interpreted changes.

Of course, the final value here is definitely not 1. As for what it actually is, you can work it out yourself.

What about the other way around?

short c = 1;
float f = *(float *)&c;

The process is the same as above, but while the previous example will certainly not fault, this one might.

Why?

(float *)&c makes us take four bytes starting from the address of c and interpret them according to the float encoding.

But c is a short and occupies only two bytes, so the read is bound to touch the two bytes that follow it, and the memory access goes out of bounds.

Of course, if you only read, most likely nothing visibly goes wrong.

However, sometimes you need to write a new value to this area, for example:

*(float *)&c = 1.0;

Then a core dump, that is, a failed memory access, may occur.

In addition, even if there is no core dump, the write destroys whatever was originally stored in that piece of memory. It is very likely the storage of some other variable, and overwriting someone else's data will certainly lead to hidden bugs.

If you understand the above, you will be more comfortable with pointers.

2.5. A small problem

At this point, let's look at a question asked by a friend in a C language discussion group.

He wrote a floating-point value into a file and read it back, and then found that the printed value did not match.

The key point is here:

char buffer[4];
...
printf("%f %x\n", *buffer, *buffer);

He probably thought that buffer is a pointer (an array, to be exact), that dereferencing the pointer should yield the value inside, and that the value inside is the four bytes read from the file, that is, the floating-point variable from before.

Note that all of this is what he thinks. The compiler, in fact, thinks:

"Oh, buffer is a pointer of type char, so I'll just take the first byte out."

It then passes the value of that first byte to printf (promoted to int, as every char argument to a variadic function is). printf sees %f and expects a double in that position, but no conversion takes place: the argument does not match the format specifier, so the behavior is undefined and the printed value is meaningless.

This is the whole process.

The root of the mistake is that he believed dereferencing any pointer yields the "value we have in mind". The compiler knows nothing about that; it simply interprets the memory according to the pointer's type.

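So the call is changed to:

printf("%f %x\n", *(float *)buffer, *(unsigned int *)buffer);

The first argument is equivalent to explicitly telling the compiler:

"At the place buffer points to I stored a float, so interpret it as a float for me." The second argument reads the same bytes as an unsigned int (4 bytes on most platforms), which is what %x expects.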

3. Structures and pointers

A structure contains multiple members. How are these members stored in memory?

For example:

struct fraction {
    int num;   // integer part
    int denom; // decimal part
};
struct fraction fp;
fp.num = 10;
fp.denom = 2;

This is a fixed-point decimal structure that occupies 8 bytes of memory (memory alignment is not considered here). The two member fields are stored as follows:

10 is placed in the field at offset 0 from the base address of the structure, and 2 in the field at offset 4.

Next, let's do something that normal people would never do:

((struct fraction *)&fp.denom)->num = 5;
((struct fraction *)&fp.denom)->denom = 12;
printf("%d\n", fp.denom); // what is printed?

What will the code above print? Think about it for yourself first.

Next, I'll analyze what happens in this process:

First, &fp.denom is the starting address of the denom field in the structure fp. The cast then takes the 8 bytes starting at that address and treats them as a fraction structure.

In this new structure, fp's denom field plays the role of num, and the four bytes just past the end of fp become the denom field.

Therefore:

((struct fraction *)&fp.denom)->num = 5;

actually changes fp.denom, and

((struct fraction *)&fp.denom)->denom = 12;

writes 12 into the four bytes just past the end of fp.

Of course, the result of writing a value to those four bytes is unpredictable and may crash the program, because key information about the function call stack frame may be stored there, or the program may have no write permission there.

Many core dumps in C beginners' programs are caused by mistakes of this kind.

So the final output is 5.

Why talk about this seemingly baffling code?

The point is to show that a structure is essentially a bunch of variables packaged together. A field in the structure is accessed through the starting address of the structure, also called the base address, plus the offset of the field.

In fact, objects in C++ and Java are stored in much the same way, but to support object-oriented features they add some header information alongside the data members, such as the virtual function table pointer in C++.

We can imitate this entirely in C.

This is why people always say that C is the foundation: once you really understand C pointers and memory, you will quickly understand the object model and memory layout of other languages.

4. Multi-level pointers

Speaking of multi-level pointers: when I was a freshman I could follow at most two levels. Anything beyond that made me dizzy, and I often wrote wrong code.

If you had shown me something like int ****p, I would have broken down, and I guess many of my classmates feel the same way now.

In fact, multi-level pointers are not that complicated. They are just pointers to pointers. It's that simple.

Today I will introduce you to the nature of multi-level pointers.

First of all, I would like to say that there is really no such thing as a multi-level pointer. Pointers are pointers, and "multi-level pointer" is just a logical concept that makes things easier to talk about.

To begin, think of the parcel lockers of everyday life.

Everyone has used them. Fengchao parcel lockers and supermarket lockers all work like this: each compartment has a number, and once we have the number we can find the corresponding compartment and take out its contents.

The compartment here is the memory cell, the number is the address, and what is placed in the compartment corresponds to the content stored in memory.

Suppose I put a book in compartment 03 and tell you the number 03. You can then fetch the book from compartment 03.

Now suppose I put the book in compartment 05 and put only a small note in compartment 03 that says "the book is in compartment 05."

What would you do?

Of course you open compartment 03, take out the note, and follow what it says to open compartment 05 and get the book.

Compartment 03 here is called a pointer, because it holds a small note (an address) pointing to another compartment rather than the book itself.

Do you get it?

Now suppose I put the book in compartment 07, put a note in compartment 05 saying "the book is in compartment 07", and put a note in compartment 03 saying "the book is in compartment 05."

Compartment 03 here is called a second-level pointer, compartment 05 is called a pointer, and 07 is the ordinary variable we use all the time.

Continuing in the same way, we can derive N-level pointers.

So do you understand? The same piece of memory is called a pointer if it stores the address of another variable, and a variable if it stores the actual content.

int a;
int *pa = &a;
int **ppa = &pa;
int ***pppa = &ppa;

In the code above, pa is called a first-level pointer, the ordinary pointer we usually talk about; ppa is a second-level pointer, and pppa a third-level pointer.

No matter how many levels of pointers there are, two things are central:

1. The pointer itself is also a variable. It needs to be stored in memory, so the pointer also has its own address.

2. The memory of the pointer stores the address of the variable it points to.

That's why multi-level pointers are a logical concept. In fact, a piece of memory holds either actual content or the address of another variable. It's as simple as that.

How should the declaration int **a be read?

int **a can be split into two parts, int * and *a. The * in *a indicates that a is a pointer variable, and the preceding int * indicates that a can only store the address of a variable of type int *.

For second-level or deeper pointers, we can always split the declaration into two parts in this way.

First of all, no matter how many levels it has, it is a pointer variable, and one * marks it as such. The remaining part tells us what type of variable's address this pointer can hold.

For example, int ****a means that the pointer variable a can only hold the address of a variable of type int ***.

5. Pointers and arrays

5.1. One-dimensional arrays

The array is a basic data structure of C. A thorough understanding of arrays and their usage is the basis for developing efficient applications.

Array notation and pointer notation are closely related and are interchangeable in the appropriate context.

As follows:

int array[10] = {10, 9, 8, 7};
printf("%d\n", *array);       // output 10
printf("%d\n", array[0]);     // output 10
printf("%d\n", array[1]);     // output 9
printf("%d\n", *(array + 1)); // output 9
int *pa = array;
printf("%d\n", *pa);          // output 10
printf("%d\n", pa[0]);        // output 10
printf("%d\n", pa[1]);        // output 9
printf("%d\n", *(pa + 1));    // output 9

In memory, an array is a contiguous piece of memory.

The address of element 0 is called the starting address of the array. When we access an element through array[1] or *(array + 1), the array name is effectively treated as that starting address.

You can think of the access as address[offset], with address as the starting address and offset as the offset. Note, however, that offset is not added to address directly; it is first multiplied by the number of bytes occupied by the element type, that is, address + sizeof(int) * offset.

Readers who have studied assembly will find this familiar: it is one of the addressing modes in assembly, base-plus-offset addressing.

After reading the code above, many readers may conclude that pointers and arrays are exactly the same and fully interchangeable. That is completely wrong.

Although an array name can sometimes be used as a pointer, an array name is not a pointer.

The most typical place where the difference shows is sizeof:

printf("%zu", sizeof(array));
printf("%zu", sizeof(pa));

The first prints 40, because array contains 10 elements of type int (4 bytes each), while the second prints the size of a pointer: 4 on a 32-bit machine, 8 on a typical 64-bit machine.

What causes it?

From the compiler's point of view, variable names and array names are both symbols. They are typed, and they are eventually bound to data.

A variable name refers to a single piece of data, and an array name refers to a group of data (a data collection). Both are typed so that the length of the data they refer to can be inferred.

Yes, arrays have types too. We can regard int, float, char and so on as basic types, and arrays as slightly more complex types derived from them. The type of an array is made up of its element type and its length. sizeof computes the length from the type of its operand, and the computation happens at compile time, not while the program is running (apart from C99 variable-length arrays).

During compilation, the compiler builds a special table that stores each variable name together with its data type, address, scope and other information.

sizeof is an operator, not a function, and when it is used the length of the symbol can be looked up in this table.

So applying sizeof to the array name yields the actual length of the array.

pa is just a pointer to int, and the compiler has no idea whether it points to one integer or to a bunch of integers.

Although it points to an array here, the array is just a contiguous piece of memory, with no start or end markers and no extra information recording how long it is.

So applying sizeof to pa can only yield the length of the pointer variable itself.

In other words, the compiler does not associate pa with the array. pa is just a pointer variable, and no matter where it points, sizeof always computes the number of bytes that pa itself occupies.

5.2. Two-dimensional arrays

Do not assume that a two-dimensional array is stored in memory as rows and columns. In fact, two-dimensional and three-dimensional arrays alike are just syntactic sugar provided by the compiler.

In terms of storage there is no essential difference from a one-dimensional array. For example:

int array[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
array[1][1] = 5;

You might think that array is laid out in memory like a two-dimensional matrix:

1 2 3

4 5 6

7 8 9

But in fact, it goes like this:

1 2 3 4 5 6 7 8 9

It is no different from a one-dimensional array: it is a one-dimensional, linear arrangement.

When we access it with array[1][1], how does the compiler calculate the address of the element we are actually accessing?

To generalize, assume that the array is defined like this:

int array[n][m];

Access: array[a][b]

Then the address of the accessed element is calculated as (int *)array + (m * a + b), that is, the starting address plus (m * a + b) * sizeof(int) bytes.

This is the essence of a two-dimensional array in memory. It is really the same as a one-dimensional array, except that syntactic sugar dresses it up in a two-dimensional shape.

6. The magical void pointer

You must have seen these uses of void:

void func();
int func1(void);

In these cases, void means that there is no return value or that the function takes no parameters.

But a void pointer is a generic pointer, which can hold the address of any data type.

The following declares a void pointer:

void *ptr;

The greatest use of void pointers is to implement generic programming in C, because any object pointer can be assigned to a void pointer, and the void pointer can be converted back to the original pointer type. Throughout this process the address the pointer actually holds does not change.

For example:

int num;
int *pi = &num;
printf("address of pi: %p\n", (void *)pi);
void *pv = pi;
pi = (int *)pv;
printf("address of pi: %p\n", (void *)pi);

The value printed will be the same both times.

Converting back and forth like this may be rare in everyday code, but when you write large software or general-purpose libraries in C you cannot do without void pointers. They are the cornerstone of generics in C, as in the declaration of the sort function in the standard library:

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

Everything that depends on the specific element type is replaced by void *.

void * can also be used to implement polymorphism in C, which is a fun thing to do.

However, there are some things to note:

A void pointer cannot be dereferenced.

For example:

int num;
void *pv = (void *)&num;
*pv = 4; // error

Why?

Because dereferencing essentially means that the compiler takes N bytes from the memory the pointer points to, where N is determined by the pointer's type, and then interprets those N bytes according to that type.

For an int * pointer, for example, N is 4, and the bytes are interpreted according to the int encoding.

With void *, however, the compiler does not know whether the pointer points to an int, a double, or a structure, so it cannot dereference a void pointer.

