2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through a worked example of how a high-performance task can monopolize a CPU on Linux. The explanation is kept simple; follow along step by step.
Part 1 Engineering requirements
An SMP or NUMA system has more than one CPU. In engineering practice we sometimes need a task to monopolize a CPU: that CPU does nothing except run the specified task, which gives low latency and good real-time behavior.
For example, DPDK recommends setting
GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=0-3,5,7"
to isolate CPU0-3, CPU5, and CPU7, so that other tasks do not context-switch with the DPDK task while it runs, which gives the best network performance [1]. In a real-time scenario, CPU2 is isolated with isolcpus=2, and the real-time application is then bound to the isolated core with taskset:
taskset -c 2 pn_dev
This satisfies the low-latency requirement [2].
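The CPU list syntax used by isolcpus (single numbers and ranges separated by commas) is easy to misread, so here is a minimal sketch that expands such a list. The function name parse_cpu_list is our own, and the sketch handles only plain numbers and ranges, not the other list forms the kernel accepts.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "0-3,5,7" into sorted CPU numbers."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0-3,5,7"))  # [0, 1, 2, 3, 5, 7]
```

So isolcpus=0-3,5,7 isolates six CPUs, not four.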
Part 2 User-space isolation
Both examples above rely on the same boot parameter, isolcpus.
Practice is the only test of truth. Let's boot an 8-core ARM64 system running Ubuntu with the boot parameter isolcpus=2.
After the system boots, we run a simple program that starts 8 processes, each spinning in a while loop.
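The original test program is not reproduced here. As a stand-in, the following is a minimal Python sketch of an equivalent load generator: it forks a number of CPU-bound children that spin in a loop. Unlike the original, which loops forever, this sketch stops after a deadline so that it cleans up after itself. The names spin and run_busy_workers are our own.

```python
import os
import time


def spin(seconds: float) -> None:
    """Burn CPU in a tight loop until the deadline passes."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def run_busy_workers(count: int, seconds: float) -> int:
    """Fork `count` CPU-bound children; return how many exited cleanly."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: spin, then exit without falling back into parent code.
            try:
                spin(seconds)
            finally:
                os._exit(0)
        pids.append(pid)
    clean = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            clean += 1
    return clean
```

Calling run_busy_workers(8, 60.0) keeps 8 processes busy for about a minute, which is long enough to watch the per-CPU load in htop.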
We have 8 cores and 8 running processes, so in theory load balancing should spread the 8 processes evenly across the 8 cores. The htop output tells a different story.
The bar that htop labels 3 (that is, CPU2, because htop numbers CPUs from 1) shows 0.0% utilization. This proves that CPU2 has been isolated and that user-space processes are not scheduled on it.
Of course, we can still force one of the a.out processes onto CPU2 with taskset (for example, with a command of the form taskset -cp 2 663).
The output of that command shows that the original affinity list of process 663 was 0,1,3-7, with no CPU2. After we force it to 2, htop shows CPU2 at 100%.
This experiment makes it clear that isolcpus=2 keeps user-space processes off CPU2 unless their affinity is set manually.
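The same affinity change can be made from a program. Below is a minimal sketch using Python's os.sched_getaffinity and os.sched_setaffinity, which wrap the Linux affinity system calls. The helper name pin_to_cpu is our own, and changing the affinity of another user's process requires suitable privileges.

```python
import os


def pin_to_cpu(pid: int, cpu: int) -> tuple[set[int], set[int]]:
    """Pin `pid` (0 means the calling process) to one CPU.

    Returns the affinity sets before and after the change.
    """
    old = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {cpu})
    new = os.sched_getaffinity(pid)
    return old, new
```

For the experiment above, pin_to_cpu(663, 2) would play the role of the taskset call, with the returned pair corresponding to the old and new affinity lists.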
Part 3 Kernel-space isolation
Interrupts
However, user-space tasks are not the only things that can run on CPU2. Kernel threads and interrupts can run there too. Does isolcpus= also isolate kernel threads and interrupts?
For interrupts this is easy to verify: just check the smp_affinity of each IRQ.
For peripheral interrupts such as 44 and 47, the Linux kernel sets smp_affinity to FB (binary 11111011), which clearly avoids CPU2. Peripheral interrupts therefore do not fire on CPU2 unless we bind them there explicitly, for example by binding interrupt 44 to CPU2:
echo 2 > /proc/irq/44/smp_affinity_list
After that, interrupt 44 does fire on CPU2.
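To see why the mask FB avoids CPU2, recall that bit n of smp_affinity stands for CPU n. The following minimal sketch decodes and encodes such masks; the function names are our own.

```python
def mask_to_cpus(mask_hex: str) -> list[int]:
    """Decode an smp_affinity hex mask (bit n set means CPU n is allowed)."""
    # Wide masks are printed as comma-separated groups of hex digits.
    value = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def cpus_to_mask(cpus: list[int]) -> str:
    """Encode CPU numbers as a lowercase hex mask."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "x")


print(mask_to_cpus("fb"))  # [0, 1, 3, 4, 5, 6, 7]
print(cpus_to_mask([2]))   # 4
```

Note that smp_affinity_list takes CPU numbers (2 for CPU2), while smp_affinity takes the mask (4 for CPU2).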
However, the system timer interrupt and IPIs are cornerstones of Linux and still have to run on CPU2. Of these, the timer tick is the most likely source of latency jitter for a task.
Let's focus on the tick. Linux is normally configured with NO_HZ, which is tickless only in the idle state, so when nothing is running on CPU2, timer interrupts hardly occur there.
Next, we run the same 8 a.out processes with isolcpus=2. By default no task occupies CPU2. Running cat /proc/interrupts | head -2 several times shows that the timer interrupt counts on the other cores rise quickly, while the count on CPU2 barely changes. This is the power-saving effect of NO_HZ in the idle state.
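Comparing counters by eye gets tedious, so here is a minimal sketch that extracts the per-CPU timer counts from the text of /proc/interrupts and subtracts two samples. It assumes the timer row can be recognized by the string arch_timer in its description, which is typical on ARM64 but is an assumption here; the function names are our own.

```python
def timer_counts(text: str, keyword: str = "arch_timer") -> list[int]:
    """Sum per-CPU counts over /proc/interrupts rows containing `keyword`."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        if keyword not in line or ":" not in line:
            continue
        fields = line.split(":", 1)[1].split()
        for cpu, field in enumerate(fields[:ncpus]):
            totals[cpu] += int(field)
    return totals


def delta(before: list[int], after: list[int]) -> list[int]:
    """Per-CPU increase between two samples."""
    return [b - a for a, b in zip(before, after)]
```

In practice one would read /proc/interrupts, sleep for a second, read it again, and pass both results through timer_counts and delta. A CPU whose tick has stopped shows a delta at or near 0.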
However, once we put a task on CPU2, even a single one, the timer interrupt count on CPU2 starts to increase.
This shows that with just one thread running on the isolated CPU, the timer tick starts running, and it interrupts that thread frequently, which means frequent context switches. You may think this is silly of Linux: with only one task there is nothing to time-slice or schedule among, so why run the tick at all? The reason is that our kernel enables NO_HZ only for the idle state by default.
Let's rebuild the kernel with NO_HZ_FULL enabled.
With NO_HZ_FULL enabled, Linux can also stop the tick when a CPU runs exactly one task. With two or more tasks it cannot, so this "FULL" is not truly full [3]. That is understandable, because two tasks need time-slice scheduling between them. The kernel documentation, Documentation/timers/no_hz.rst, states clearly when to enable NO_HZ_FULL: only for real-time and HPC workloads and the like. Otherwise the default NO_HZ_IDLE is the best choice.
We rebuilt the kernel with NO_HZ_FULL selected and booted Linux. Note that the parameter nohz_full=2 is added at boot so that CPU2 uses NO_HZ_FULL.
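Before measuring, it is worth confirming that the boot parameters actually took effect. The following minimal sketch pulls a name=value parameter out of a kernel command line such as the contents of /proc/cmdline. The helper name kernel_param and the sample command line are hypothetical.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of the first `name=value` token, or None if absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value
    return None


# Hypothetical command line for illustration only.
sample = "root=/dev/vda rw isolcpus=2 nohz_full=2"
print(kernel_param(sample, "isolcpus"))   # 2
print(kernel_param(sample, "nohz_full"))  # 2
```

On a live system the input would come from open("/proc/cmdline").read(). Kernels that provide them also report the resulting CPU sets in /sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full.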
We rerun the experiment with only one task on CPU2 and watch its timer interrupt count.
The tick count on CPU2 stays stable at 188. That should make you happier, because your exclusive use of the CPU is now more complete.
Next, let's put a second task on CPU2. With two tasks, the timer tick count on CPU2 starts to increase again.
This is probably not a problem, though. The goal was a "monopoly": when a single task has the CPU to itself, the timer tick should not disturb it, and that is exactly the ideal case.
Kernel threads
Kernel threads behave much like user-space threads: unless they are bound to the isolated CPU, they do not run on it. Let's experiment with dma_map_benchmark, which the author added to the kernel [4], and start 16 kernel threads that perform DMA map and unmap (note that we only have 8 cores):
./dma_map_benchmark -s 120 -t 16
The utilization of CPU2 is again 0.
The dma_map_benchmark threads in the kernel occupy CPU0-1 and CPU3-7, but not CPU2.
However, if kernel code binds a thread to the isolated CPU with an API such as kthread_bind_mask(), the effect is the same as binding a user-space task to that CPU with taskset.
Part 4 Best practices
For real-time and high-performance computing scenarios where a task should monopolize a CPU, the ideal approach is:
1. Isolate the CPU with isolcpus.
2. Bind the specified task to the isolated CPU.
3. Take care not to bind interrupts or kernel threads to the isolated CPU by accident, and track down any such "unexpected" intruders.
4. Enable NO_HZ_FULL for an even better result, because then not even the timer tick interrupts the task.
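For step 3, a small audit helps find IRQs that are still allowed on the isolated CPU. The sketch below checks smp_affinity_list values for a given CPU and includes a reader for /proc/irq. The function names are our own, and the sample values in the usage note are hypothetical.

```python
import os


def irqs_touching_cpu(affinity_lists: dict[str, str], cpu: int) -> list[str]:
    """Return the IRQs whose smp_affinity_list includes `cpu`."""
    hits = []
    for irq, spec in affinity_lists.items():
        for part in spec.strip().split(","):
            if not part:
                continue
            first, _, last = part.partition("-")
            if int(first) <= cpu <= int(last or first):
                hits.append(irq)
                break
    return hits


def read_irq_affinities(root: str = "/proc/irq") -> dict[str, str]:
    """Collect smp_affinity_list for each numbered IRQ directory under `root`."""
    result = {}
    names = [name for name in os.listdir(root) if name.isdigit()]
    for name in sorted(names, key=int):
        path = os.path.join(root, name, "smp_affinity_list")
        try:
            with open(path) as handle:
                result[name] = handle.read().strip()
        except OSError:
            continue  # the IRQ vanished or the file is not readable
    return result
```

With made-up values {"44": "2", "47": "0-1,3-7"}, irqs_touching_cpu(..., 2) returns ["44"], which flags interrupt 44 as the one to move away. On a live system, irqs_touching_cpu(read_irq_affinities(), 2) performs the same check against the real settings.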
Thank you for reading. That concludes this example analysis of giving a high-performance Linux task exclusive use of a CPU. The specifics should still be verified in practice on your own system.