Shulou
2025-02-24 Update From: SLTechnology News&Howtos > Servers
This article explains the principles of the Linux operating system in detail, with worked examples. The editor finds it very practical and shares it here as a reference; I hope you get something out of reading it.
Linux Operating System Principles (text version)
I. The four generations of computers
1. The first generation:
Vacuum-tube computers; input and output used punched cards, which was very inconvenient: a single job might need more than a dozen people. Roughly 1945-1955. They also consumed enormous amounts of power; if you had had such a computer at home, your light bulbs would have dimmed the moment you switched it on. Haha.
2. The second generation:
Transistor computers; batch-processing (serial) operating systems appeared. They used far less power than the first generation. The typical representative is the mainframe. Roughly 1955-1965. In this era the Fortran language was born, a very old programming language.
3. The third generation:
Integrated circuits appeared, along with multiprocessing (parallel-style) designs. The typical representative is the time-sharing system, in which CPU execution is divided into time slices. Roughly 1965-1980.
4. The fourth generation:
The PC appeared, from about 1980 onward. The best-known figures of this era are Bill Gates and Steve Jobs.
II. How a computer works
Although the computer has gone through four generations of evolution, its basic working model remains fairly simple. Generally speaking, a computer has five basic components.
1. MMU (memory management unit, responsible for memory paging [memory pages])
The computation itself is carried out by the CPU (central processing unit), and inside the CPU there is a dedicated chip called the MMU. It computes the mapping between a process's linear addresses and physical addresses. It also performs access protection: if a process accesses a memory address that does not belong to it, the access is denied!
2. Memory (RAM)
3. Display devices (VGA interface, monitor, etc.) [I/O devices]
4. Input devices (keyboard, etc.) [I/O devices]
5. Hard disk devices (hard disk controller or adapter) [I/O devices]
Expand a little knowledge:
These hardware devices are connected to a bus and exchange data over it. The leader among them is the CPU, which has the highest authority. So how does the CPU work?
a. Fetch unit (fetches instructions from memory)
b. Decode unit (completes decoding, i.e., converts the data fetched from memory into instructions the CPU can actually run)
c. Execution unit (executes the instructions, invoking different hardware to do the work according to what each instruction requires)
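The fetch-decode-execute cycle above can be sketched as a toy interpreter (a minimal Python illustration; the three-instruction "machine" here is invented purely for the example):

```python
# Toy fetch-decode-execute loop for an invented 3-instruction machine.
# Instructions: ("LOAD", value), ("ADD", value), ("HALT",)
def run(program):
    pc = 0          # instruction counter: index of the next instruction
    acc = 0         # accumulator register
    while True:
        instr = program[pc]      # fetch
        op = instr[0]            # decode
        pc += 1
        if op == "LOAD":         # execute: load a value into the accumulator
            acc = instr[1]
        elif op == "ADD":        # execute: add a value to the accumulator
            acc += instr[1]
        elif op == "HALT":       # execute: stop and return the result
            return acc
```

For instance, `run([("LOAD", 2), ("ADD", 3), ("HALT",)])` fetches, decodes, and executes three instructions in turn and returns 5.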
From the above we know the MMU is part of the CPU. Are there other components in the CPU? Of course: for example the instruction register, the instruction counter, and the stack pointer.
Instruction register: holds the instruction the CPU has just fetched from memory.
Instruction counter: records the memory location of the next instruction to fetch, so the CPU knows where to continue after each fetch.
Stack pointer: points to the top of the process's stack in memory.
These components run at the same clock frequency as the CPU core, so their performance is very good; they communicate over the CPU's internal bus. The instruction register, instruction counter, and stack pointer are collectively called CPU registers.
Registers are in fact used to save the execution context, especially under time multiplexing. For example, when the CPU is shared by multiple programs, it frequently suspends or terminates a process, and the operating system must save that process's running state (so that the CPU can resume from exactly that state when it later returns to the process), then continue running other processes. This is called a context switch.
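The save-and-restore step of a context switch can be sketched in a few lines (a minimal Python illustration; the `Process` class and the register names in the `cpu` dict are invented for the example):

```python
# Minimal context-switch sketch: the "cpu" dict stands in for the one
# physical register set that all processes must share.
class Process:
    def __init__(self, name):
        self.name = name
        self.saved = None   # saved register snapshot (None until first preemption)

cpu = {"pc": 0, "sp": 0, "acc": 0}

def switch(cpu, old, new):
    old.saved = dict(cpu)            # save the outgoing process's state
    if new.saved is not None:
        cpu.update(new.saved)        # restore the incoming process's state
    return cpu
```

When `old` is later scheduled again, its snapshot is copied back into the register set and it continues exactly where it left off.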
III. The computer's storage system
1. Symmetric multiprocessing (SMP)
Besides the MMU and the registers (which run at the CPU's clock speed), a CPU also contains cores, which do the actual data processing. One CPU can have multiple cores that run your code in parallel. Many companies in industry use multiple CPUs; this arrangement is called symmetric multiprocessing.
2. The principle of program locality
Spatial locality:
A program consists of instructions and data. Spatial locality means that after one piece of data has been accessed, other data located close to it is likely to be accessed next.
Temporal locality:
Generally speaking, once an instruction has been executed, it is likely to be executed again soon. The same principle applies to data: if a piece of data has been accessed, it is likely to be accessed again.
It is precisely because of program locality, whether spatial or temporal, that we generally cache data.
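Temporal locality is exactly what a cache exploits: repeated accesses to the same address hit the cache instead of slow memory. A minimal sketch in Python (the tiny LRU cache here is invented for illustration, not how a hardware cache is actually built):

```python
from collections import OrderedDict

# Tiny LRU cache: repeated accesses to the same address (temporal locality)
# are served from the cache instead of "memory".
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def access(self, addr, memory):
        if addr in self.data:
            self.hits += 1
            self.data.move_to_end(addr)        # mark as most recently used
        else:
            self.misses += 1
            self.data[addr] = memory[addr]     # fill from slow memory
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used
        return self.data[addr]
```

Accessing the same address three times yields one miss followed by two hits, which is the whole point of caching.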
Expand a little knowledge:
Since the storage space of the registers inside the CPU is limited, memory is used to store data. But the CPU and memory are not remotely on the same speed level, so the CPU spends most of its time waiting for data (while fetching a single value from memory, the CPU can finish its own "lap" in one cycle while memory may need some twenty). To make this more efficient, the concept of the cache was born.
Knowing the locality principle of programs, we can see the trade-off clearly: memory gives us more space at the cost of time, while the cache lets the CPU fetch data directly and saves time, so the cache trades space for time.
3. The tiered storage system
Friends who have been working for a while may have seen tape drives; nowadays they are basically obsolete, and many enterprises use hard disks instead. Looking at the structure of the home computer we know best, data passes through several layers before reaching its final storage, and the access speeds of those layers differ enormously. The gap is especially obvious between a mechanical hard disk and memory.
Expand a little knowledge:
You may have taken apart your desktop or laptop at home and seen the mechanical hard drive, solid-state drive, memory, and so on. But you have probably never seen the cache as a physical device; it actually sits inside the CPU, so there may be some blind spots in our understanding of it.
Let's start with the level-1 and level-2 caches. The CPU spends almost no extra time fetching data from them, because both the level-1 and the level-2 cache are internal resources of the CPU core. (Other hardware being equal, a CPU with 128k of level-1 cache might cost about 300 RMB on the market, one with 256k about 600 RMB, and one with 512k well into four digits; you can check JD.com for current prices. This is enough to show how expensive cache is!) At this point you may ask: what about the level-3 cache? The level-3 cache is space shared by multiple CPUs, and of course multiple CPUs also share memory.
4. Non-uniform memory access (NUMA)
We know that when multiple CPUs share the level-3 cache or memory, a problem arises: resource contention. We know that a variable or string is stored in memory at some memory address. How do the CPUs get at those memory addresses? We can refer to the figure below:
Yes, the hardware vendors split the level-3 cache and let different CPUs occupy different address ranges. We can understand it as each CPU having its own piece of the level-3 cache, with no contention over that piece; note, however, that it is still one and the same level-3 cache. It is as if Beijing has Chaoyang District, Fengtai District, Daxing District, Haidian District and so on, yet they all belong to Beijing. This is NUMA: its characteristic is non-uniform memory access, with each node having its own local memory space.
Expand a little knowledge:
So here is the question raised by load rebalancing: if the process that cpu1 was running is suspended, its data is recorded in cpu1's own cache area, so what happens when cpu2 picks the process up and runs it again?
There is no way around it: the data has to be copied or moved from cpu1's level-3 cache area over to cpu2 before processing can continue, and that takes a certain amount of time. So load rebalancing causes a drop in CPU performance. To avoid this we can use process binding, so that when the process runs again it still uses the CPU that handled it before. That is, we set the process's CPU affinity.
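On Linux, CPU affinity can be inspected and changed with the `sched_getaffinity`/`sched_setaffinity` system calls, which Python exposes through the `os` module (a minimal sketch; the fallback branch for platforms without these calls is an assumption added for portability):

```python
import os

def pin_to_first_cpu():
    # sched_*affinity are Linux-only; elsewhere just report the CPU count.
    if hasattr(os, "sched_setaffinity"):
        allowed = os.sched_getaffinity(0)     # CPUs this process may run on
        first = min(allowed)
        os.sched_setaffinity(0, {first})      # bind this process to one CPU
        return os.sched_getaffinity(0)
    return set(range(os.cpu_count() or 1))
```

From the shell, the `taskset` command provides the same binding without touching the program's code.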
5. Write-through and write-back in the cache
The CPU modifies data in its registers. When a register does not have the data it needs, the CPU looks in the level-1 cache; if the data is not there, it looks in the level-2 cache, and so on down the hierarchy until the data is found, loaded from disk if necessary, and brought up into the register. When the level-3 cache fetches data from memory and finds its space insufficient, it automatically evicts entries to clean up level-3 cache space.
We know the final home of the data is the hard disk, and that this access process is completed by the operating system. When the CPU writes modified data out, it can do so in two ways: write-through (write straight through to memory) and write-back (write only back to the level-1 cache for now). Write-back clearly performs better, but it is awkward if the power fails: the data is lost, because it was written only to the level-1 cache, and the level-1 cache cannot be accessed by other CPUs. From the standpoint of reliability, write-through is the safer choice. Which mode to use is up to you and your requirements.
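The two write policies can be contrasted in a small simulation (a Python sketch; the dict-backed `Cache` class is invented for illustration):

```python
# Contrast write-through vs write-back using dicts as "cache" and "memory".
class Cache:
    def __init__(self, memory, write_back=False):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}        # cached (possibly dirty) values

    def write(self, addr, value):
        self.lines[addr] = value
        if not self.write_back:
            self.memory[addr] = value   # write-through: memory updated at once

    def flush(self):
        # write-back: dirty lines only reach memory when flushed
        self.memory.update(self.lines)
```

With `write_back=True`, memory stays stale until `flush()` runs; a power failure before the flush would lose the write, which is exactly the reliability trade-off described above.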
IV. I/O devices
1. An I/O device consists of the device controller and the device itself.
Device controller: a chip or group of chips integrated on the motherboard. It is responsible for receiving commands from the operating system and carrying them out, for example reading data for the operating system.
The device itself: it has its own interface, but the device's own interface is not directly usable; it is only a physical interface, such as an IDE interface.
Expand a little knowledge:
Each controller has a small number of registers used for communication (from a few to a few dozen), integrated directly into the device controller. For example, even a minimal disk controller has registers for specifying the disk address, sector count, read/write direction, and other operation requests. Whenever you want to activate the controller, the device driver receives an operation instruction from the operating system, converts it into the device's basic operations, and places the request in these registers to be carried out. Each register is exposed as an I/O port. The set of all a device's registers is called its I/O address space, also known as its I/O port space.
2. The driver program
The real hardware operations are done by the driver. The driver is usually supplied by the device's manufacturer, and usually lives inside the kernel. A driver can run outside the kernel, but few people do it that way, because it is too inefficient!
3. Implementing input and output
A device's I/O ports cannot be assigned in advance, because every motherboard model is different, so they must be assigned dynamically. When the computer powers on, every I/O device registers with the bus's I/O port space in order to obtain I/O ports. The dynamically assigned registers together make up the device's I/O address space, which holds 2^16 = 65536 possible ports (numbered 0 through 65535).
As the figure above shows, if our CPU wants to deal with a particular device, it passes instructions to the driver, and the driver translates the CPU's instructions into signals the device can understand and places them in the registers. So the registers (I/O ports) are the addresses through which the CPU communicates with the device over the bus.
Expand a little knowledge:
There are three ways to realize input and output for an I/O device:
a. Polling:
This usually means a user program makes a system call, which the kernel translates into a procedure call to the corresponding kernel driver; the driver then starts the I/O and checks the device in a continuous loop to see whether it has finished its work. This amounts to busy-waiting: the CPU keeps traversing every I/O device on a fixed cycle to see whether it has data, which is obviously not ideal.
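Busy-wait polling can be sketched in a few lines (a Python illustration; the "device" simulated here with a background thread is invented for the example):

```python
import threading
import time

# Simulated device: a worker thread sets `done` when its "I/O" completes.
class FakeDevice:
    def __init__(self):
        self.done = False
        self.data = None

    def start_io(self):
        def work():
            time.sleep(0.05)       # pretend the transfer takes a while
            self.data = b"payload"
            self.done = True
        threading.Thread(target=work).start()

def poll(device):
    device.start_io()
    while not device.done:         # busy-wait: CPU burns cycles checking status
        pass
    return device.data
```

The `while not device.done` loop is the busy-wait: the CPU does no useful work while it spins, which is exactly why interrupts were introduced.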
b. Interrupts:
An interrupt breaks into whatever the CPU is currently doing and notifies the kernel to fetch the interrupt request. There is usually a dedicated component on the motherboard called the programmable interrupt controller. The interrupt controller communicates directly with the CPU over certain pins and can raise a signal to the CPU, letting the CPU know that a request has arrived. Each I/O device, when it starts, registers an interrupt number with the interrupt controller (this number is usually unique); these numbers form the controller's interrupt vector, and each pin of the interrupt controller can identify several interrupts this way.
So when a device actually interrupts, it does not put its data directly on the bus. Instead it immediately sends an interrupt request to the interrupt controller, which identifies the requesting device via the interrupt vector and then notifies the CPU, letting the CPU know which device's request has arrived. The CPU can then use the I/O port number the device registered to reach the device's data. Note that the CPU does not fetch the data itself: it only receives the interrupt signal and notifies the kernel, which runs on the CPU and handles the interrupt request. For example, when a network card receives a request from some outside IP, the card (which has its own cache) raises an interrupt; the CPU has the card's cached frames read into memory, where the kernel first checks whether the destination IP is its own. If so, it unpacks the message, eventually obtaining a port number, and then performs the corresponding processing for that port.
Kernel interrupt handling is divided into two halves: the top half of the interrupt (handled immediately) and the bottom half (not necessarily immediately). Take receiving data from the NIC as an example. When a request reaches the NIC, the CPU orders the data in the NIC's cache to be moved straight into memory; that is, the data is handled as soon as it is received (the "handling" here is just reading the NIC data into memory, without further processing, to make later processing convenient). This part we call the top half of the interrupt; the part that actually processes the request later is called the bottom half.
c. DMA:
Direct memory access. We all know that data transfer happens over the bus, and the CPU controls the bus: at any given moment, which I/O device gets to use the bus is decided by the CPU's bus controller. The bus carries three functions: the address bus (addressing devices), the control bus (controlling which device's address may use the bus), and the data bus (the actual data transfer).
The work is usually done by an intelligent control chip that comes with the I/O device (we call it the direct memory access controller). After the top half of the interrupt has been handled, the CPU tells the DMA device that the bus is now its to use, and tells it which memory space it may use to read the I/O device's data into memory. When the DMA controller has finished reading the data from the I/O device, it sends a message telling the CPU that the read operation is complete. The CPU then informs the kernel that the data has been loaded, and the bottom half of the interrupt is handed to the kernel for processing. Most devices today use a DMA controller: network cards, hard disks, and so on.
V. Operating system concepts
From the above we know that a computer has five basic components. The main purpose of the operating system is to abstract these components into more intuitive interfaces that upper-level programmers and users can use directly. So what exactly does the operating system abstract?
1. CPU (time slices)
In the operating system, the CPU is abstracted into time slices, the program is abstracted into processes, and programs run by being allocated time slices. The CPU has an addressing unit for identifying the memory address at which a variable is stored.
The host's internal bus width depends on the CPU's bit width (also called word length). For example, a 32-bit address bus can represent 2^32 memory addresses, which in decimal works out to 4G of memory space. Now you can see why a 32-bit operating system can recognize only 4G of memory: even if your physical memory is 16G, only 4G is usable. So if you find that your operating system can recognize more than 4G of memory addresses, it is definitely not 32-bit!
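The 4G figure follows directly from the arithmetic, as a quick check in Python shows:

```python
# A 32-bit address bus distinguishes 2**32 byte addresses.
addresses = 2 ** 32
GiB = 2 ** 30

print(addresses)          # 4294967296 distinct byte addresses
print(addresses // GiB)   # 4 "G" of addressable memory
```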
2. Memory
In the operating system, memory is realized through virtual address spaces.
3. I/O devices
In the operating system, the most important I/O device is the disk. As everyone knows, the disk provides storage space; in the kernel it is abstracted as files.
4. Processes
To put it bluntly, isn't the main purpose of a computer to run programs? A running program is what we call a process (let's not worry about threads for now). If multiple processes run at the same time, the limited abstract resources (CPU, memory, and so on) must be divided among them. We collectively call these abstract resources the resource set.
The resource set includes:
1 > CPU time
2 > memory addresses, abstracted into a virtual address space (for example, a 32-bit operating system supports 4G of space, of which the kernel occupies 1G, so a process believes by default that it has 3G available; in fact it may never get 3G, since your computer might have less than 4G of memory in total)
3 > I/O: everything is a file. A process may open many files, and it addresses each open file through its fd (file descriptor). We divide files into three classes: normal files, device files, and pipe files.
Each process has its own task structure, task_struct: a data structure the kernel maintains for each process (a data structure is, bluntly, memory space used to hold data; it records the resource set the process owns, its parent process, its saved context [for process switching], its memory mappings, and so on). task_struct presents the process with simulated linear addresses to use, while recording the mapping from those linear addresses to physical memory addresses.
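The "everything is a file" idea is easy to see from Python's `os` module, which exposes the POSIX file-descriptor interface directly (a minimal sketch; the temporary file is created just for the demo):

```python
import os
import tempfile

# Open a file and inspect the raw integer file descriptor the kernel returns.
# fds 0, 1, and 2 are already taken by stdin, stdout, and stderr.
def fd_demo():
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "demo.txt")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.write(fd, b"hello")
    os.close(fd)
    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    os.rmdir(tmp_dir)
    return fd, data
```

The same small-integer handle works for normal files, device files, and pipes alike, which is what makes the abstraction so uniform.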
5. Memory mapping - page frames
All physical memory not used by the kernel we call user space. The kernel cuts the physical memory of user space into fixed-size page frames. A page frame is, in other words, a fixed-size storage unit, larger than the default single storage unit (which is one byte, i.e. 8 bits); typically each page frame is 4K. Each page frame is allocated as an independent unit, and each has its own number. [For example: suppose 4G of space is available and each page frame is 4K; then there are 1M page frames in total.] These page frames are allocated to different processes.
Suppose you have 4G of memory, of which the operating system occupies 1G; the remaining 3G of physical memory is allocated to user space. Each process, once started, believes it has 3G of space available, but in fact it can never use anywhere near 3G. A process's writes to memory are stored discretely, placed wherever free memory happens to be. Don't ask me about the specific placement algorithm; I haven't studied it.
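The 1M-frames figure in the bracketed example checks out with a line of arithmetic:

```python
# 4G of space divided into 4K page frames.
space = 4 * 2 ** 30      # 4G in bytes
frame = 4 * 2 ** 10      # 4K in bytes
frames = space // frame

print(frames)            # 1048576 frames, i.e. 1M page frames
```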
The structure of a process's address space:
1 > reserved space
2 > stack (where local variables are stored)
3 > shared libraries
4 > heap (where dynamically allocated data, such as the data stream of an opened file, is stored)
5 > data segment (where global and static variables are stored)
6 > code segment
The storage relationship between a process and memory is as follows:
Every process's address space includes reserved space. When a process finds its current allocation insufficient and needs to open a new file, it must store that data somewhere in its address space. Obviously, the process address space shown above is linear, not real in the physical sense. When a process actually wants to use memory, it must make a system call to the kernel; the kernel finds a piece of physical space in physical memory and tells the process the address it may use. For example, if a process wants to open a file on the heap, it must apply to the operating system (the kernel) for memory space, and within the limits of physical memory (that is, as long as the request is smaller than the free physical memory), the kernel assigns a memory address to the process.
Each process has its own apparent linear addresses, which are virtualized by the operating system and do not really exist; a mapping must be made between the virtual addresses and real physical memory, as shown in the figure "the storage relationship between process and memory". The final storage location of process data is still mapped into memory. This means that when a program runs on the CPU, it gives the CPU its own linear addresses, and the CPU does not look those addresses up directly (a linear address is virtual and does not really exist; the real address is the physical memory address). Instead, it first finds the process's task_struct and loads its page table (the structure recording the mapping from linear addresses to physical memory, each record of which is called a page table entry) to read the real physical memory address corresponding to the process's linear address.
Expand a little knowledge:
When the CPU accesses a process's address, it first obtains the process's linear address. It hands that linear address to its own MMU chip to compute the real physical memory address, thereby accessing the process's memory. In other words, every access to a process's memory address must go through an MMU computation, which is very inefficient, so a buffer was introduced to store frequently used translations: those can then be fetched directly, without an MMU computation, and processed straight away. This buffer is called the TLB, the translation lookaside buffer (it caches page-table query results).
Note: on 32-bit operating systems this describes the mapping of linear addresses to physical memory; on 64-bit operating systems the scheme is just the opposite!
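Address translation with a TLB in front of the page table can be sketched as follows (a Python illustration; the page size, frame numbers, and dict-based structures are invented for the example):

```python
PAGE = 4096   # 4K pages

# page_table maps virtual page number -> physical frame number (invented values)
page_table = {0: 7, 1: 3, 2: 9}
tlb = {}                         # caches recent page-table lookups
stats = {"tlb_hits": 0, "walks": 0}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE)
    if vpn in tlb:
        stats["tlb_hits"] += 1   # fast path: no page-table walk needed
        frame = tlb[vpn]
    else:
        stats["walks"] += 1      # slow path: walk the page table (MMU work)
        frame = page_table[vpn]
        tlb[vpn] = frame         # fill the TLB for next time
    return frame * PAGE + offset
```

Translating the same address twice walks the page table only once; the second lookup is a TLB hit, which is exactly the efficiency gain described above.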
6. User mode and kernel mode
To coordinate multitasking, a running operating system is divided into two parts: the part close to the hardware, which holds privileged permissions, is called kernel space, while processes run in user space. Therefore an application must use privileged instructions, via system calls, to access hardware resources.
Any program developed as an application, rather than as part of the operating system itself, we call a user-space program; its running state is called user mode.
Programs that run in kernel space (which we may think of as the operating system itself) we say run in kernel mode. Note: the kernel is not responsible for the actual productive work; kernel space is simply where all privileged operations are available.
For a program to really get things done, it ultimately either makes system calls to the kernel, or completes its work without the kernel's participation. For example, to compute the result of 2 to the 32nd power, do you need kernel mode? The answer is no: we know the kernel is not responsible for specific work, we just want a result, and no privileged operation is involved, so code that merely computes values can simply be handed to the CPU to run.
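The contrast can be seen from a running program's point of view (a Python sketch; `os.getpid()` is a thin wrapper over the real getpid(2) system call):

```python
import os

# Pure computation: runs entirely in user mode, no kernel involvement needed.
result = 2 ** 32

# Kernel service: getting the process ID requires the getpid(2) system call,
# i.e. a switch into kernel mode and back.
pid = os.getpid()
```

Tools like `strace` on Linux make this visible: the arithmetic produces no syscall trace at all, while the second line shows up as a `getpid` entry.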
When an application needs the kernel's functionality rather than plain user-level functionality, it finds that it must perform a privileged operation that it has no power to carry out itself, so it submits a request asking the kernel to perform the privileged operation on its behalf. The kernel verifies that the application is allowed to use the privileged instructions, runs them, and returns the execution result; the application then takes the result and continues with its subsequent code. This is a mode switch.
So if a programmer wants a program to be productive, the code should run in user space as much as possible. If most of your code runs in kernel space, your application probably won't deliver much productivity, because as we know, kernel space is not responsible for producing it.
Expand a little knowledge:
We know that a computer runs by executing instructions, and instructions are divided into privileged and non-privileged levels. Friends who know computers may know that the x86 CPU architecture is roughly divided into four privilege levels: four rings from the inside out, called ring 0, ring 1, ring 2, and ring 3. Ring 0 holds all the privileged instructions and ring 3 the user instructions. Generally speaking, the privileged instruction level is what operates the hardware, controls the bus, and so on.
The execution of a program must be coordinated by the kernel, possibly switching between user mode and kernel mode, so a program's execution must be scheduled onto the CPU by the kernel. Some applications run as soon as the operatingating system does: to provide basic functions they run automatically in the background, and are called daemons. Other programs run only when the user needs them. So how do we tell the kernel which application we need to run? At this point you need an interpreter that can talk to the operating system and launch the execution of commands; bluntly, something that can submit the user's run request to the kernel, so that the kernel provides the basic conditions the program needs to run. And thus the program is executed. (On Linux this role is played by the shell.)
This is the end of the article "Sample Analysis of the Principles of the Linux Operating System". I hope the content above has been helpful and lets you learn more. If you think the article is good, please share it for more people to see.