

Plain UNIX: Scheduler Prequel

2025-01-15 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Linux's current process scheduler is CFS, which replaced the earlier time-slice round-robin algorithms. CFS smooths the calculation of dynamic priority so that, at any moment, the system can be preempted by whichever execution entity most deserves to run; this is, in fact, the basic principle of a time-sharing system. Imagine every process or thread being able to claim the CPU at any time according to its own priority, much like an interrupt does; the whole system then becomes a genuinely "fair", altruistic one. If we want to understand the essence of this altruism, poring over CFS statistics or its code will not get us far, and studying the Windows NT kernel's scheduler would likewise cost a great deal of time and energy only to leave us drowning in details.

In fact, the earliest and simplest versions of UNIX embody this idea best: their code is very short and their logic is very simple.

0. Time-sharing system

The von Neumann stored-program machine has two main components, CPU and memory; more abstractly, this is the Turing machine. If you want to build a time-sharing system on top of these two components, look at how people once shared any early machine, computers and machine tools alike: they queued up holding numbered tickets, and an administrator called them in turn, much as banks do business today. Inside a computer these arrangements become much simpler, and only two things need to be worked out: how to share the CPU and how to share the memory. These two points sound simple, but the details matter. To share the CPU you could execute tasks strictly one after another, but if one task runs too long, everyone behind it waits too long; time-sharing at the granularity of a whole task's lifetime is too coarse, so a finer-grained time-sharing system becomes a requirement. Finer granularity requires a context: for a task interrupted by the administrator to resume later, its context must be stored somewhere, namely in memory, and that brings us to the question of how to share memory.

As for memory, the entire physical memory is shared by all processes. How? Through an allocation strategy: one region of memory is given to one process, another region to another. Segmentation and paging are easy to understand from this angle. In short, segmentation is generally used to distinguish different kinds of memory within a process, such as code, data, and stack, while paging is generally used to divide memory pages between processes: this page belongs to one process, that page to another. We need a table to record this division. For modern operating systems, that table is the familiar page table; for the ancient PDP-11, it is the contents of the APR register group. The tasks of a time-sharing operating system include managing these tables and managing the whole of physical memory. When one process switches to another, the contents of these tables must be switched too. On modern x86 platforms we know this as switching the CR3 register, which points to a page table residing in physical memory. That piece of physical memory obviously cannot be given to any process, because it stores process metadata. In the PDP-11 era, physical memory was far too small to store metadata this way, so there was no precise paging mechanism, only a simple virtual address space whose mapping to physical pages was defined by the APR register group. Processes were therefore swapped in and out whole; there was no such thing as part of a process residing in memory while the rest sat in swap space, to be brought in on demand by a page fault.

In essence, a simple time-sharing system covers two things: time-sharing the CPU and time-sharing physical memory. Of course, if physical memory is large enough, multiple processes can reside in memory at the same time, which undoubtedly improves efficiency, but that is not the essence of time-sharing; nor is the later demand-paging mechanism, which is merely a technique of MMU management.

Memory management is extremely important in implementing a time-sharing system, because memory is laid out in space rather than driven by the clock tick like the CPU. Connecting the use of space with the clock tick, and establishing a mapping between the two, is the key to the implementation, which is why I will spend so much space on memory-management details. The implementation of time-sharing is recursively fractal, and this is where preemption comes from: just as fine-grained time-sharing preempts unfinished tasks, a switch triggered by dynamic priority recalculation preempts the unfinished time slice pre-allocated to a process.

1. Process scheduling in UNIX V6

We know that UNIX was born in 1969, but that newborn UNIX was not yet the real UNIX: it lacked the famous fork call and could only run a fixed pair of processes in time-sharing fashion. Since those two processes were connected to two terminals, and terminals belong to users, that UNIX was a multi-user system. So although the 1969 UNIX was immature, it was indeed a multi-process, multi-user time-sharing system, and the history of the modern operating system had begun. The first mature UNIX was UNIX V6, the version covered by John Lions' famous commentary, and the one I call plain UNIX.

Before understanding the naive UNIX V6 scheduling mechanism, you must first understand the memory management of the machines that ran early UNIX. The PDP-11 had no sophisticated paging MMU like today's x86 platforms; it implemented the virtual address space through a set of registers called the APR. Note: the translation table lived in a set of registers, not in tables residing in physical memory. Because the number of registers is limited, such a register-resident table cannot be large, so the translation from virtual addresses to physical addresses was bound to be simple. Do not use the concept of multi-level page tables to sneer at the PDP-11's management of virtual address space; even the multi-level page table was developed step by step. I will devote a separate article to how plain UNIX defines the virtual address space; here, so as not to digress, I can only characterize it in a few words: if the C language gives programmers a set of general-purpose tools, the virtual address space gives programmers a continuous space of a fixed, uniform size. With C and the virtual address space, programmers need not attend to machine details: C isolates them from the underlying processor, and the virtual address space isolates them from the size, kind, and other details of physical memory. Programmers believe their data is "in memory", but that memory is only virtual memory; the data may really live in physical memory or on disk.

Returning to the topic: the PDP-11 uses a simple set of registers to hold the mapping from virtual memory addresses to physical memory addresses, the most complete yet simplest mechanism that existed before full demand paging. Measured against the definition of a time-sharing system, the implementation is simple and efficient; only two things need to be done:

Each process has its own set of APR register contents holding its virtual-to-physical mapping; switching processes means switching the APR register group.

Each register pair maps one page, an 8 KB virtual page onto an 8 KB physical page; a group of 8 register pairs maps 8 pages, so the total virtual address space of a process is 8 × 8 KB = 64 KB.

The naive MMU mechanism described above already contains the later multi-level page-table mechanism in embryo. The only difference from the PDP-11's MMU is that multi-level page tables keep their entries in memory. Processes came to need more and more address space, which made multi-level page tables a necessity; that need could be satisfied because memory itself kept growing, while registers became less and less suited to the job, surviving only as the MMU cache, the TLB (also called the "fast table"), a facility that to some extent replaces registers like the APR. It is also worth noting that "multi-level page table" is a term of art used narrowly: even in the PDP-11's APR era the MMU table was effectively multi-level, since a register maps a whole page rather than a single address, and the offset within the page is taken directly from the virtual address.

But in an era when physical memory was tiny, a process neither could nor needed to have a large virtual address space, and building MMU tables as multi-level page tables was out of the question, because storing the metadata would have wasted even more physical memory. The claim that multi-level page tables save memory is a big mistake: they save memory only on the premise that most of the huge 4 GB address space goes unused, so the page tables and page-table entries for it never need to be created. If every page of the virtual address space were actually in use, multi-level page tables would need more memory, not less. Moreover, multi-level page tables only really show their power once precise, demand-paged virtual memory management is in place. I will describe this in detail in the article devoted to plain UNIX memory management.

Well, after so much about the MMU, we can finally talk about process scheduling. In the PDP-11 era the virtual address space was only 64 KB, but the physical memory bus was 18 bits wide, i.e. 256 KB. Physical memory space was much larger than virtual memory space, and machines generally had more than 64 KB installed. It was therefore entirely feasible for a process to reside completely in physical memory while it ran, and to be swapped out to swap space when it had not run for a long time. For the UNIX V6 time-sharing implementation, this hardware architecture gave rise to a great and simple process scheduling system. The scheduling mechanism of UNIX V6 is shown in the following figure:

(I must omit the figure here, because I intend to use it in the next article to compare against other schedulers, where it fits more compactly. If you grasp the points below, you can picture it in your head.)

This scheduling mechanism has the following main points:

1) Process switching needs the assistance of process 0. Unless a process gives up the CPU voluntarily, it must hand execution over to process 0, and process 0 then decides who runs next.

Why must everything go through process 0? If you understood the PDP-11 memory management I described at such length above, the reason is immediate: naive UNIX V6 on the PDP-11 has no demand-paging mechanism, so before a process can run, all of its pages must be resident in memory, and process 0 is what guarantees this. Process 0 is also called the swap process, or the scheduling process. Whenever it is woken, it performs swapping: it swaps the process about to run into memory and swaps out processes that have not run for a long time. When that is done, or if there was nothing to swap in the first place, process 0 hands control to the highest-priority process and goes to sleep itself.

2) Priority is recalculated continuously. Whenever a higher-priority process becomes ready, the current process can be preempted at any time in user mode; but a process running in kernel mode must never be forcibly preempted.

Naive UNIX V6 establishes the principle that a process must be preemptible (it has no concept of a time slice at all), with one restriction: execution flows in kernel mode cannot be preempted. Every process priority in UNIX V6 is recalculated during clock-interrupt handling, once every N ticks; the manual says N is HZ, i.e. once per second, though it can be modified before compiling. The point of this recalculation is to find a process more worth running and let it preempt the current one! This is in fact the idea behind Linux's current CFS and behind the Windows NT scheduler, though the implementation strategies differ. However, to protect kernel data structures, an exception is defined: kernel-mode execution flows cannot be preempted by user processes; call it the policy that "punishment does not reach the ministers". Although this policy was eventually broken by kernel preemption, in the server field, especially where high throughput is needed, it remains the best choice. The system (preemptive scheduling) is unchanged, while the policy (sparing the kernel) keeps pace with the times.

3) Processes run at smooth intervals, and the chance of a process starving to death is very small.

Consider the O(n) scheduler of Linux 2.4 and the O(1) scheduler of Linux 2.6: the biggest problem with both is process starvation, which led to all sorts of complicated and tedious "tricks" for recalculating priorities that ultimately backfired. Fortunately CFS was by then already waiting in the wings. In truth the problem is not so grave, because Windows (Win7 included) faces the same issue, and systems built on Windows NT have a balancer, something like a kernel thread, which periodically scans for starving threads and raises their priority. So while Linux's O(1) scheduler tried to guess at and compensate interactive processes, Windows simply rescued the starved outright instead of compensating interactivity. Windows has always shipped server and home editions; much as Linux decides at compile time whether to enable options such as kernel preemption, the difference between the server and home editions of Windows NT also shows up in the length of the time slice. Plain UNIX has no such problem: process priority is computed in real time, and whenever a higher-priority process appears it preempts at once; that is one reason, and the other is that kernel-mode execution flows have different priorities from user-mode ones.

Again, it is worth noting that plain UNIX process scheduling is not based on time slices: no process has a time slice computed before it runs, and whenever a more suitable process becomes ready it immediately preempts the current one (kernel-mode execution excluded). This smooth scheduling mechanism requires that the scheduling policy be preemptive while the kernel remains non-preemptible (sparing the kernel was only a temporary measure to protect critical data before a better way existed; once better protection mechanisms appeared, the kernel too had to become preemptible!).

4) When a process blocks in kernel mode, the priority it wakes up with depends on the reason it blocked. Because kernel-mode priorities are higher than user-mode ones, kernel mechanisms can get in and out quickly.

On this point I think the Windows NT kernel and the Solaris kernel have done a great job! Both map every execution flow onto a priority, interrupts included, and an ordinary process can even run at interrupt priority while executing certain work; Solaris goes further and schedules interrupts as threads. But did you know this idea was already present in the system as early as UNIX V6 in 1975? UNIX V6 ties the priority a sleeping process wakes up with to the reason it went to sleep. If you have also read "Windows Internals", you will find that the priority boost a thread receives when a wait completes in Windows is likewise tied to the reason for the wait: the boost after disk I/O completes, for example, is smaller than the boost after sound-card I/O completes. How similar to UNIX V6 this is.

2. What is simplicity?

I say UNIX V6 is plain, but what does plain mean?

Plainness means simplicity. In the next article you will find that almost every scheduling idea in modern operating systems can be traced back to the UNIX V6 scheduling mechanism, and some, for example early versions of Linux, even fall short of it. The comparisons will include:

The UNIX V6 scheduler versus all versions of the Windows NT scheduler

The UNIX V6 scheduler versus the Linux CFS scheduler
