2025-04-03 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to understand CPU context switching in Linux, a topic many people are not very familiar with. The content below summarizes the key points.
How to understand CPU context switching in Linux
Linux is a multitasking operating system that can run far more tasks concurrently than it has CPUs. These tasks do not truly run at the same time: the system assigns the CPU to each of them in turn for a very short period, which creates the illusion that they run simultaneously.
Before a task runs, the CPU needs to know where to load the task from and where to start executing it, so the system must set up the CPU registers and the program counter in advance. CPU registers are small, extremely fast storage built into the CPU. The program counter stores the location of the instruction the CPU is executing, or of the next instruction to execute. Together they form the environment the CPU depends on to run any task, and are therefore called the CPU context.
A context switch first saves the CPU context of the previous task, then loads the context of the new task into the registers and program counter, and finally jumps to the location the program counter points to and runs the new task. The saved contexts are kept in the kernel and loaded again when the task is rescheduled. This ensures that the task's state is not affected and that the task appears to run continuously.
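The save, load and jump steps can be sketched with a toy simulation. This is only an illustration under simplifying assumptions: the names Context, Task and run are hypothetical, the "CPU" has a single register, and each "instruction" just adds a number to it. Real kernels save far more state.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Saved CPU state of one task (hypothetical, heavily simplified)."""
    pc: int = 0    # program counter: index of the next instruction
    acc: int = 0   # a single general-purpose register


@dataclass
class Task:
    name: str
    program: list                      # each "instruction" adds a number to acc
    ctx: Context = field(default_factory=Context)


def run(tasks, time_slice=2):
    """Round-robin the tasks on one simulated CPU; return the number of switches."""
    ready = list(tasks)
    switches = 0
    while ready:
        task = ready.pop(0)
        pc, acc = task.ctx.pc, task.ctx.acc        # load the saved context
        for _ in range(time_slice):
            if pc >= len(task.program):
                break
            acc += task.program[pc]                # execute one instruction
            pc += 1
        task.ctx = Context(pc, acc)                # save the context again
        if pc < len(task.program):
            ready.append(task)                     # not finished: back of the queue
        if ready and ready[0] is not task:
            switches += 1                          # the next task differs
    return switches
```

Because every task resumes from its own saved pc and acc, each one computes the same result it would have computed running alone, which is the "appears to run continuously" property described above.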
Depending on the kind of task, CPU context switching falls into three scenarios: process context switching, thread context switching, and interrupt context switching.
Process context switching
1. User space and kernel space
Linux divides the running space of a process into kernel space and user space by privilege level, corresponding to Ring 0 and Ring 3 of the CPU privilege levels, respectively.
Kernel space (Ring 0) has the highest privilege and can access all resources directly.
User space (Ring 3) can access only restricted resources. It cannot directly access hardware such as memory and devices, and must trap into the kernel through system calls to reach these privileged resources.
A process can run in both user space and kernel space. While it runs in user space it is said to be in user mode, and once it traps into kernel space it is in kernel mode.
2. System call
The transition from user mode to kernel mode is made through a system call. For example, viewing a file takes several system calls: open, read, write, close, and so on. A system call proceeds as follows:
First, the user-mode instruction location held in the CPU registers is saved.
Next, to execute kernel code, the CPU registers are updated with the location of the kernel-mode instructions, and execution jumps into kernel mode to run the kernel task.
After the system call finishes, the CPU registers are restored to the saved user-mode state, and execution switches back to user space to continue running the process.
So a single system call actually involves two CPU context switches.
However, a system call does not touch process user-space resources such as virtual memory, nor does it switch processes. This differs from what is usually meant by a process context switch:
A process context switch means switching from one process to another.
During a system call, the same process keeps running.
Therefore, a system call is usually described as a privileged mode switch rather than a context switch.
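To make the open, read, write, close sequence concrete, here is a minimal sketch of a file viewer built on Python's os module, whose functions are thin wrappers around the corresponding system calls. The function name cat and the call counter are illustrative assumptions. The counter tracks wrapper calls made by this code, not an exact trace of what the kernel sees.

```python
import os


def cat(path, out_fd=1, chunk=4096):
    """Copy a file to out_fd and return how many syscall wrappers were invoked."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)        # enter the kernel: open
    calls += 1
    try:
        while True:
            data = os.read(fd, chunk)      # enter the kernel: read
            calls += 1
            if not data:                   # empty result means end of file
                break
            view = memoryview(data)
            while len(view) > 0:           # write may accept only part of the data
                written = os.write(out_fd, view)   # enter the kernel: write
                calls += 1
                view = view[written:]
    finally:
        os.close(fd)                       # enter the kernel: close
        calls += 1
    return calls
```

Each of these calls crosses from user mode into kernel mode and back while the same process keeps running, which is why the text treats it as a mode switch rather than a process context switch.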
3. Process context switching
Processes are managed and scheduled by the kernel, and a process switch can only happen in kernel mode. The context of a process therefore includes not only user-space resources such as virtual memory, the stack and global variables, but also kernel-space state such as the kernel stack and registers.
A process context switch thus has one more step than a system call: before the kernel state and CPU registers of the current process are saved, its virtual memory and stack must be saved; and after the kernel state of the next process is loaded, that process's virtual memory and user stack must be refreshed.
Saving and restoring the context is not free; the kernel needs CPU time to do it. According to tests, each context switch takes from tens of nanoseconds to a few microseconds of CPU time. With a large number of process context switches, the CPU can easily spend much of its time saving and restoring registers, kernel stacks, virtual memory and other resources, which greatly reduces the time left for actually running processes.
Linux uses the TLB (Translation Lookaside Buffer) to manage the mapping from virtual memory to physical memory. When virtual memory is updated, the TLB must be flushed as well, and memory access becomes slower. On multiprocessor systems in particular, caches are shared by multiple processors, so flushing them affects not only the processes on the current processor but also the processes on other processors that share the cache.
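A rough way to get a feel for this cost is a pipe "ping-pong" between two processes: each round trip forces the kernel to hand the CPU back and forth. The sketch below assumes a Unix system with os.fork, and pingpong is a hypothetical helper name. The figure it reports also includes system call and Python interpreter overhead, so it should be read as a loose upper bound, not as the kernel's own switch time.

```python
import os
import time


def pingpong(rounds=2000):
    """Return the average round-trip time in microseconds between two processes."""
    p2c_r, p2c_w = os.pipe()            # parent -> child
    c2p_r, c2p_w = os.pipe()            # child -> parent
    pid = os.fork()
    if pid == 0:                        # child: echo every byte back
        try:
            for _ in range(rounds):
                os.write(c2p_w, os.read(p2c_r, 1))
        finally:
            os._exit(0)                 # never return into the parent's code
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")           # wake the child
        os.read(c2p_r, 1)               # block until the child answers
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    for fd in (p2c_r, p2c_w, c2p_r, c2p_w):
        os.close(fd)
    return elapsed / rounds * 1e6
```

When both processes share one CPU, every round trip contains at least two process context switches. On a multi-core machine the two processes may run on different CPUs, in which case the measurement mostly reflects wake-up latency instead.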
4. When to switch the process context
Linux maintains a ready queue for each CPU, sorts the active processes (those running or waiting for the CPU) by priority and by how long they have waited for the CPU, and then selects the process that needs the CPU most, that is, the one with the highest priority and the longest wait. So when is a process scheduled onto the CPU?
When a process terminates, the CPU it was using is released, and a new process is taken from the ready queue to run.
To ensure that all processes are scheduled fairly, CPU time is divided into time slices that are allocated to the processes in turn. When a process uses up its time slice, the system suspends it and switches to another process waiting for the CPU.
When system resources are insufficient, a process cannot run until the resources become available. The process is suspended in the meantime and the system schedules other processes.
When a process suspends itself voluntarily, for example by calling the sleep function, the CPU is also rescheduled.
When a higher-priority process becomes runnable, the current process is suspended so that the higher-priority process can run.
When a hardware interrupt occurs, the process on the CPU is suspended and the kernel's interrupt service routine runs instead.
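Some of these triggers can be observed from inside a process. The sketch below uses resource.getrusage, whose ru_nvcsw and ru_nivcsw fields report the voluntary and involuntary context switches of the calling process. The helper names switches_during, sleeper and spinner are hypothetical. The actual counts depend on the machine and its load, and some sandboxed environments may not report them at all.

```python
import resource
import time


def switches_during(func):
    """Run func() and return the (voluntary, involuntary) switches it added."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    func()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def sleeper():
    for _ in range(20):
        time.sleep(0.001)       # blocks, so the process gives up the CPU


def spinner():
    total = 0
    for i in range(2_000_000):  # pure computation, never blocks
        total += i
    return total


if __name__ == "__main__":
    print("sleeper:", switches_during(sleeper))
    print("spinner:", switches_during(spinner))
```

On a typical Linux machine one would expect sleeper to add voluntary switches (the sleep case above), while spinner mostly adds involuntary ones when its time slice runs out or another task preempts it.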
Thread context switch
The biggest difference between a thread and a process is that a thread is the basic unit of scheduling, while a process is the basic unit of resource ownership. What the kernel actually schedules are threads; the process only provides its threads with resources such as virtual memory and global variables. Threads and processes can be understood as follows:
When a process has only one thread, the process can be considered equal to the thread.
When a process has multiple threads, these threads share resources such as virtual memory and global variables. These resources do not need to be modified during a context switch.
In addition, threads have their own private data, such as stacks and registers, which also need to be saved during a context switch.
Thread context switching can therefore be divided into two cases:
The two threads belong to different processes. Because resources are not shared, the switch is the same as a process context switch.
The two threads belong to the same process. Because virtual memory is shared, virtual memory resources stay as they are during the switch, and only the threads' private data, registers and other unshared data need to be switched.
Thread switches within the same process therefore consume fewer resources than switches between processes, which is one advantage of using multiple threads instead of multiple processes.
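The split between shared and private data can be seen in a small threading sketch. The names shared, worker and demo are illustrative. The module-level list plays the role of a global variable owned by the process, while each local_sum lives on the stack of the thread that computes it.

```python
import threading

shared = []                       # global data: one copy for the whole process


def worker(n):
    local_sum = sum(range(n))     # local variable: private to this thread's stack
    # get_native_id() is the thread ID assigned by the kernel
    shared.append((threading.get_native_id(), local_sum))


def demo(num_threads=4):
    """Start the workers and return the values they left in the shared list."""
    shared.clear()
    threads = [threading.Thread(target=worker, args=(i + 1,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(value for _, value in shared)
```

All workers append to the same list without any copying, which mirrors why a switch between threads of one process can leave virtual memory untouched.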
Interrupt context switch
To respond quickly to hardware events, interrupt handling interrupts the normal scheduling and execution of processes and calls the interrupt handler to respond to the device event. When another process is interrupted, its current state must be saved so that it can resume from that state after the interrupt ends.
Unlike a process context switch, an interrupt context switch does not involve the user-mode state of the process. Even if the interrupt suspends a process that is running in user mode, there is no need to save and restore user-mode resources such as the process's virtual memory and global variables. The interrupt context only includes the state needed to run the kernel's interrupt service routine, including CPU registers, the kernel stack and hardware interrupt parameters.
On the same CPU, interrupt handling has a higher priority than processes. Because interrupts disrupt the scheduling and execution of normal processes, most interrupt handlers are kept short so that they finish as quickly as possible.
Like process context switching, interrupt context switching consumes CPU time. If you find that there are too many interrupts, check carefully whether they are causing serious performance problems for your system.
Concept summary
To sum up, whichever scenario causes the context switch, you should know:
CPU context switching is one of the core functions that keep a Linux system running normally. In general, we do not need to pay special attention to it.
Too much context switching wastes CPU time on saving and restoring registers, kernel stacks, virtual memory and other data, which shortens the time processes actually run and significantly degrades overall system performance.
How to view the context switch of the system
We can use the vmstat tool to view the context switching of the system as a whole. vmstat is mainly used to analyze memory usage, and is also often used to analyze the number of CPU context switches and interrupts.
    # Output a set of data every 5 seconds
    $ vmstat 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0 7005360  91564 818900    0    0     0     0   25   33  0  0 100  0  0
We need to focus on the following four columns:
cs (context switch) is the number of context switches per second.
in (interrupt) is the number of interrupts per second.
r (Running or Runnable) is the length of the ready queue, that is, the number of processes running or waiting for the CPU.
b (Blocked) is the number of processes in uninterruptible sleep.
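The same four quantities can be approximated directly from /proc/stat, which exposes cumulative ctxt and intr counters along with procs_running and procs_blocked. The sketch below is an approximation, not a reimplementation of vmstat: the function names are hypothetical, and procs_running includes the sampling process itself.

```python
import time

FIELDS = ("ctxt", "intr", "procs_running", "procs_blocked")


def parse_proc_stat(text):
    """Extract the counters related to vmstat's cs, in, r and b columns."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] in FIELDS:
            # For "intr", the first number is the total over all interrupts.
            values[parts[0]] = int(parts[1])
    return values


def rates(before, after, seconds):
    """Turn two snapshots taken `seconds` apart into per-second figures."""
    return {
        "cs": (after["ctxt"] - before["ctxt"]) / seconds,
        "in": (after["intr"] - before["intr"]) / seconds,
        "r": after["procs_running"],
        "b": after["procs_blocked"],
    }


def sample(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice and report the rates over the interval."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return rates(before, after, interval)
```

Calling sample(5.0) on a Linux machine gives figures comparable to one line of vmstat 5. Keeping the parsing separate from the file access makes it easy to run the same logic on saved snapshots.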
To see the details for each process, use pidstat. With the -w option, it shows the context switches of each process.
    # Output a set of data every 5 seconds
    $ pidstat -w 5
    Linux 4.15.0 (ubuntu) ...

    08:18:26      UID       PID   cswch/s nvcswch/s  Command
    08:18:31        0         1      0.20      0.00  systemd
    08:18:31        0         8      5.40      0.00  rcu_sched
Two columns in this output deserve attention: cswch, the number of voluntary context switches per second, and nvcswch, the number of involuntary context switches per second.
A voluntary context switch is one that happens because the process cannot obtain a resource it needs. For example, voluntary context switches occur when I/O, memory or other system resources are insufficient.
An involuntary context switch is one in which the system forcibly reschedules the process, for example because its time slice has expired. Involuntary context switches occur easily when a large number of processes are competing for the CPU.
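The kernel exposes both counters for every process in /proc/<pid>/status, as the fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. A minimal sketch for reading them follows; the function names are hypothetical. The values are cumulative totals, so a per-second rate like the ones pidstat prints would come from the difference between two readings.

```python
def parse_ctxt_switches(status_text):
    """Pick the two context switch counters out of /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value)    # int() ignores the surrounding whitespace
    return counts


def read_ctxt_switches(pid="self"):
    """Read the counters for a process; "self" means the calling process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())
```

For example, read_ctxt_switches(1) would return the totals for PID 1, provided the caller is allowed to read that file.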
Case analysis
1. Prepare the environment
sysbench is a multi-threaded benchmark tool that is generally used to evaluate database load under different system parameters. In this case, we treat it as a misbehaving process in order to simulate excessive context switching.
    # Install sysbench first
    $ yum install sysbench -y
2. Operation and analysis
First, run sysbench in the first terminal to simulate the bottleneck of multithread scheduling in the system:
    # Run a 5-minute benchmark with 10 threads to simulate multithread switching
    $ sysbench --threads=10 --max-time=300 threads run
Then run vmstat on the second terminal to observe the context switch:
    # Output a set of data every 1 second (press Ctrl+C to stop)
    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     6  0      0 6487428 118240 1292772    0    0     0     0 9019 1398830 16 84  0  0  0
     8  0      0 6487428 118240 1292772    0    0     0     0 10191 1392312 16 84  0  0  0
You can see that the number of context switches in the cs column has jumped from around 35 to about 1.39 million. Several other metrics are worth noting as well:
r column: the ready queue length has reached 8, far more than the number of CPUs, so there is bound to be heavy competition for the CPU.
us and sy columns: together these two columns add up to 100%, with the sy column as high as 84%, which shows that the CPU is mostly occupied by the kernel.
in column: the number of interrupts has risen to about 10,000, so interrupt handling is a potential problem as well.
Taken together: the ready queue is too long, that is, too many processes are running or waiting for the CPU. This causes a large number of context switches, which in turn drives up the system CPU usage.
Which process is causing these problems? We can continue the analysis with pidstat:
    # Output a set of data every 1 second (press Ctrl+C to stop)
    # The -w parameter outputs process switching metrics; the -u parameter outputs CPU usage metrics
    $ pidstat -w -u 1
    08:06:33      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
    08:06:34        0     10488   30.00  100.00    0.00    0.00  100.00     0  sysbench
    08:06:34        0     26326    0.00    1.00    0.00    0.00    1.00     0  kworker/u4:2

    08:06:33      UID       PID   cswch/s nvcswch/s  Command
    08:06:34        0         8     11.00      0.00  rcu_sched
    08:06:34        0        16      1.00      0.00  ksoftirqd/1
    08:06:34        0       471      1.00      0.00  hv_balloon
    08:06:34        0      1230      1.00      0.00  iscsid
    08:06:34        0      4089      1.00      0.00  kworker/1:5
    08:06:34        0      4333      1.00      0.00  kworker/0:3
    08:06:34        0     10499      1.00    224.00  pidstat
    08:06:34        0     26326    236.00      0.00  kworker/u4:2
    08:06:34     1000     26784    223.00      0.00  sshd
The pidstat output shows that the rise in CPU usage is caused by sysbench, but the context switches come from other processes: pidstat has the highest rate of involuntary context switches, while the kernel thread kworker and sshd have the highest rates of voluntary context switches.
By default, pidstat displays metrics per process. Thread metrics are only output when the -t parameter is added.
    # Output a set of data every 1 second (press Ctrl+C to stop)
    # The -wt parameter outputs the context switching metrics of threads
    $ pidstat -wt 1
    08:14:05      UID      TGID       TID   cswch/s nvcswch/s  Command
    ...
    08:14:05        0     10551         -      6.00      0.00  sysbench
    08:14:05        0         -     10551      6.00      0.00  |__sysbench
    08:14:05        0         -     10552  18911.00 103740.00  |__sysbench
    08:14:05        0         -     10553  18915.00 100955.00  |__sysbench
    08:14:05        0         -     10554  18827.00 103954.00  |__sysbench
    ...
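Per-thread counters like these are also available under /proc/<pid>/task/<tid>/status, one directory per thread. The sketch below collects them for every thread of a process. The function name and the proc_root parameter are assumptions made here so the code can be pointed at a test directory. It returns cumulative totals, not per-second rates.

```python
import os


def thread_ctxt_switches(pid="self", proc_root="/proc"):
    """Map each thread ID of a process to (voluntary, involuntary) switch totals."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                text = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue                    # the thread exited while we were scanning
        fields = {}
        for line in text.splitlines():
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        result[int(tid)] = (int(fields["voluntary_ctxt_switches"]),
                            int(fields["nonvoluntary_ctxt_switches"]))
    return result
```

Summing the values over all thread IDs gives a total for the whole process, which makes it easier to spot cases where the main thread looks quiet but its sibling threads switch heavily.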
Although the sysbench process itself shows few context switches, its child threads show a very large number, so the culprit behind the context switching is clearly sysbench. The analysis is not finished, though. Recall that the number of interrupts seen in vmstat also rose to about 10,000. Which type of interrupt has increased?
We can read the interrupt statistics from /proc/interrupts by running the following command:
    # The -d parameter highlights the areas that change
    $ watch -d cat /proc/interrupts
               CPU0       CPU1
    ...
    RES:    2450431    5279697   Rescheduling interrupts
    ...
The fastest-changing entry is the rescheduling interrupt (RES). This interrupt type wakes up an idle CPU so that it can schedule new tasks. It is the mechanism the scheduler uses on multiprocessor (SMP) systems to spread tasks across CPUs, and is commonly referred to as an inter-processor interrupt. So the rise in interrupts also comes from scheduling too many tasks, which is consistent with the earlier analysis.
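Instead of watching the screen for highlighted changes, two snapshots of /proc/interrupts can be compared programmatically. The sketch below sums each row over all CPUs and ranks the rows by how much they grew. The function names are hypothetical, and the parser assumes the usual layout of a CPU header row followed by "name: counts... description" rows.

```python
def parse_interrupts(text):
    """Return {interrupt name: total count over all CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    num_cpus = len(lines[0].split())        # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if not rest:
            continue                        # not a "name: counts" row
        total = 0
        for token in rest.split()[:num_cpus]:
            if not token.isdigit():
                break                       # reached the description text
            total += int(token)
        totals[name.strip()] = total
    return totals


def fastest_changing(before, after, top=3):
    """Rank interrupt names by how much their totals grew between two snapshots."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]
```

Reading the file twice a second apart and passing both results to fastest_changing gives the same information as watch -d, in a form that can be logged or alerted on.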
How many context switches per second is normal?
The number depends largely on the CPU performance of the system itself. If the number of context switches is fairly stable, anywhere from a few hundred to under 10,000, it should be considered normal. If it exceeds 10,000, or grows by an order of magnitude, a performance problem is likely.
At this point, you need to analyze the specific type of context switch, for example:
More voluntary context switches indicate that processes are waiting for resources, and that other problems such as I/O may be present.
More involuntary context switches indicate that processes are being forcibly rescheduled, that is, they are all competing for the CPU, and the CPU has indeed become a bottleneck.
More interrupts indicate that the CPU is occupied by interrupt handlers, and you need to look at the /proc/interrupts file to find the specific interrupt types.
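These rules of thumb can be written down as a small helper. This is a sketch only: the function name assess and its parameters are hypothetical, and the thresholds are the article's rough guidelines (10,000 switches per second, or a tenfold rise over a known baseline), not hard limits.

```python
def assess(cs_per_sec, baseline_cs=None, cswch=0.0, nvcswch=0.0):
    """Apply the rules of thumb above and return a list of findings."""
    findings = []
    if cs_per_sec > 10_000:
        findings.append("context switches exceed 10,000 per second")
    if baseline_cs and cs_per_sec >= 10 * baseline_cs:
        findings.append("context switches rose by an order of magnitude")
    if findings:
        # Only look for a cause once something looks abnormal.
        if nvcswch > cswch:
            findings.append("mostly involuntary: processes are competing for CPU")
        elif cswch > nvcswch:
            findings.append("mostly voluntary: processes are waiting for resources such as I/O")
    return findings
```

Fed with the figures from the sysbench case (about 1.39 million switches per second against a baseline of about 35, and threads with far more involuntary than voluntary switches), the helper reports both threshold findings and points at CPU competition.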
© 2024 shulou.com SLNews company. All rights reserved.