2025-02-24 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/01 Report--
This article covers the configuration of a benchmark system for Linux performance evaluation and the principles behind it, a topic that many people find hard to grasp. To make it easier to follow, the editor has summarized the key points below.
Summary
When tuning high-performance systems, developers often run into background noise from many sources, which makes the collected data inaccurate. This article analyzes the sources of that noise, and how to eliminate them, from the point of view of the CPU and the Linux operating system. The ultimate goal is to build a benchmark platform with "0" interference on a specific CPU.
Sources of background noise on a running CPU
1. Scheduler:
The process scheduler affects almost everything in the system. The Linux kernel generally uses a fair time-sharing scheduling policy, the Completely Fair Scheduler (CFS). Specific parameters are needed to adjust the scheduler's behavior and minimize its interference with the measurement.
2. Interrupt:
An interrupt is an event that the system must respond to. It has high priority and can preempt ordinary user processes.
a. Hardware interrupt
Hardware interrupts come mainly from external events, and the CPU must respond to them promptly. The most common are I/O and clock interrupts. The Linux kernel supports a large number of hardware interrupts, so pay attention to their affinity configuration. The response to some special interrupts can be turned off.
b. Soft interrupt (softirq)
Soft interrupts are a subsystem derived from hardware interrupt handling. A Linux hardware interrupt handler only performs the operations that need an immediate response, and hands the operations that can be deferred over to soft interrupts. There are 10 types of soft interrupts in Linux, which we will analyze later.
c. Workqueue
Workqueues are another common mechanism for deferred work in Linux.
3. Power management:
Modern processors usually support advanced power management features to use energy more efficiently. If these features are not configured properly, they also affect performance evaluation.
4. Time source:
Performance evaluation cannot do without timestamps, so collecting timestamps correctly is also very important.
These factors are often intertwined: the process scheduler is driven by the clock interrupt, and the power management subsystem is driven by the scheduler. Timestamp collection is also closely tied to the micro-architecture. We analyze each case below.
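To get a feel for how much noise reaches a timestamp loop, the following minimal sketch reads the clock back to back and reports the smallest, median, and largest gap. The function names are hypothetical, and Python's own interpreter overhead dominates the absolute numbers, so only the spread between median and maximum is meaningful here.

```python
import os
import time


def pin_to_cpu(cpu):
    """Pin the calling process to one CPU; return False if that is not possible."""
    try:
        os.sched_setaffinity(0, {cpu})
        return True
    except (AttributeError, OSError):
        # AttributeError: platform without sched_setaffinity.
        # OSError: the CPU does not exist or is not allowed for this process.
        return False


def measure_gaps(samples=100_000):
    """Return (min, median, max) gap in nanoseconds between consecutive clock reads."""
    gaps = []
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        gaps.append(now - prev)
        prev = now
    gaps.sort()
    return gaps[0], gaps[len(gaps) // 2], gaps[-1]
```

Calling pin_to_cpu(7) before measure_gaps() and comparing the maximum gap with the result on a non-isolated core gives a rough picture of the remaining interference.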
System configuration case
System configuration Information:
CPU: Intel 9900KF, P1 frequency 3.6 GHz, 1-core Turbo 5.0 GHz, HT disabled
RAM: 16 GB DDR4-3200
OS: Ubuntu 19.04, kernel 5.0.0-38-generic, x86_64
Boot parameters: BOOT_IMAGE=/boot/vmlinuz-5.0.0-38-generic root=UUID=697aea9f-2de2-4b9c-921d-5bd5f963c91f ro ipv6.disable=1 isolcpus=7 nohz_full=7 mce=off tsc=reliable no_watchdog irqaffinity=0 hpet=disable quiet splash vt.handoff=1
Benchmark system configuration objectives:
On a bare-metal machine (configuration in a VT-x environment is more complex and harder to control precisely), isolate Core 7 from the scheduler and minimize interference from all other sources on Core 7.
Detailed description of startup parameters:
The isolation-related startup parameters are broken down in detail below.
isolcpus=[managed_irq,]cpulist
isolcpus isolates the target CPU from the scheduler's scheduling algorithm. From the point of view of user processes, the scheduler will not place any process on the target CPU on its own initiative; a process only runs there if it is placed explicitly, for example with taskset. But this parameter alone does not guarantee that soft/hard interrupts and other kernel components stay off the target CPU.
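Whether the isolation took effect can be checked from sysfs. The sketch below parses the kernel's CPU list format (for example "0-2,7") and reads /sys/devices/system/cpu/isolated. The helper names are hypothetical, and the stride form of CPU lists ("0-7:2") is deliberately not handled.

```python
def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-2,7' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpulist(path="/sys/devices/system/cpu/isolated"):
    """Return the CPUs listed in a sysfs cpulist file, or an empty set if unreadable."""
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return set()
```

On the system described above, read_cpulist() would be expected to contain 7 once isolcpus=7 is active.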
nohz_full=cpulist
nohz is a weaker variant of this parameter. With nohz, when the runqueue of the target CPU has no schedulable entity, the CPU enters the idle state and stops the clock tick (by default one tick every 10 ms; the period depends on CONFIG_HZ). nohz_full goes a step further and stops the clock tick when there is only one active entity on the runqueue. This greatly reduces interference with the single running process, though it does not eliminate it completely. Note that nohz_full is generally not enabled in non-server kernel builds, in which case the kernel must be recompiled. Check the kernel build option CONFIG_NO_HZ_FULL=y. If it is not enabled, a warning appears in the boot log. nohz_full also implies rcu_nocbs=cpulist.
The following figure shows the boot log when the option is enabled successfully.
The following figure shows the error message when the build option is not enabled.
To enable it, modify the option under "Timers subsystem" in the kernel configuration.
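Before rebooting with nohz_full, the build option can be checked in the installed kernel configuration. This is a minimal sketch: the helper names are hypothetical, and the path /boot/config-<release> is an assumption that holds on Ubuntu-style installs but not on every distribution.

```python
import platform


def config_value(config_text, option):
    """Return the value of `option` in kernel .config text, or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None  # covers both "# CONFIG_X is not set" and a missing option


def running_kernel_config_path():
    """Assumed location of the running kernel's config file (distribution dependent)."""
    return "/boot/config-" + platform.release()
```

Reading the file at running_kernel_config_path() and calling config_value(text, "CONFIG_NO_HZ_FULL") should return "y" on a kernel where nohz_full can work.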
nowatchdog
Turns off all software/hardware lockup detection (the soft-lockup and hard-lockup watchdogs).
hpet=disable, tsc=reliable
These parameters concern the time subsystem. hpet=disable keeps the HPET from generating excessive interrupts that disturb the system. tsc=reliable marks the TSC as reliable, which skips the runtime verification of the clock source. In our verification, this parameter helped considerably in reducing jitter.
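The clock source actually in use can be confirmed from sysfs after boot. The sketch below reads the standard clocksource0 entries; the function name is hypothetical and the base path is a parameter so the code can be exercised against a copy of the directory.

```python
import os


def clocksource_status(base="/sys/devices/system/clocksource/clocksource0"):
    """Return (current clock source, list of available clock sources)."""
    def read(name):
        try:
            with open(os.path.join(base, name)) as f:
                return f.read().strip()
        except OSError:
            return None  # entry missing or not readable

    current = read("current_clocksource")
    available = (read("available_clocksource") or "").split()
    return current, available
```

With hpet=disable and tsc=reliable, one would expect the current clock source to be tsc and hpet to be absent from the available list.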
mce=off: disable machine check to avoid its interrupts
Machine check is an advanced RAS feature that is very important in production environments, but for evaluation we disable it.
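Since a mistyped boot parameter is silently ignored, it helps to verify the running command line. The following sketch splits a command line into parameters and reports which expected names are absent. The helper names are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline, wanted):
    """Return the names in `wanted` that do not appear on the command line."""
    params = parse_cmdline(cmdline)
    return [name for name in wanted if name not in params]


def read_cmdline(path="/proc/cmdline"):
    """Return the command line of the running kernel."""
    with open(path) as f:
        return f.read()
```

For example, missing_params(read_cmdline(), ["isolcpus", "nohz_full", "hpet", "tsc", "mce"]) should return an empty list on a correctly configured benchmark host.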
Isolation of soft and hard interrupts
Disable the irqbalance service
We do not want any hardware interrupt to be delivered to Core 7, which is why the irqbalance service must be disabled.
Take care of IRQ affinity
The affinity of hardware interrupts also needs attention, again so that no hardware interrupt is delivered to Core 7.
Set /sys/devices/virtual/workqueue/cpumask to 1, so that unbound workqueues run only on CPU 0.
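The affinity masks used above are hexadecimal bitmaps with one bit per CPU. The sketch below builds a mask that excludes the isolated cores and tries to write it to every IRQ under /proc/irq. The function names are hypothetical, the mask format assumes at most 32 CPUs, and writes need root; some IRQs refuse a new affinity, which is why failures are collected rather than raised.

```python
import os


def mask_excluding(n_cpus, isolated):
    """Return a hex CPU mask with every CPU in range(n_cpus) except `isolated`."""
    mask = 0
    for cpu in range(n_cpus):
        if cpu not in isolated:
            mask |= 1 << cpu
    return format(mask, "x")


def steer_irqs(mask_hex, proc_irq="/proc/irq"):
    """Write mask_hex to each IRQ's smp_affinity; return (ok, failed) IRQ numbers."""
    ok, failed = [], []
    irqs = sorted(int(e) for e in os.listdir(proc_irq) if e.isdigit())
    for irq in irqs:
        path = os.path.join(proc_irq, str(irq), "smp_affinity")
        try:
            with open(path, "w") as f:
                f.write(mask_hex)
            ok.append(irq)
        except OSError:
            # Not root, or the kernel rejected the mask for this IRQ.
            failed.append(irq)
    return ok, failed
```

On the 8-core system above, mask_excluding(8, {7}) gives "7f", and the workqueue cpumask value 1 corresponds to mask_excluding(8, set(range(1, 8))).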
Effect comparison screenshots
The following figure shows /proc/interrupts.
The following figure shows /proc/softirqs.
The following figure shows the htop output. The schedulable entities on Core 7 have been reduced to a minimum.
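The same comparison can be made numerically by sampling /proc/interrupts or /proc/softirqs twice and looking at what changed on the isolated core. This is a minimal sketch with hypothetical function names; it relies only on the common layout of both files, a header row of CPU names followed by "label: count count ..." rows.

```python
def per_cpu_counts(text, cpu):
    """Return {row label: count} for one CPU from /proc/interrupts or /proc/softirqs text."""
    lines = text.splitlines()
    header = lines[0].split()
    col = header.index("CPU%d" % cpu)  # raises ValueError if the CPU is not listed
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Rows with fewer columns than CPUs (such as ERR) are global counters; skip them.
        if sep and len(fields) >= len(header) and fields[col].isdigit():
            counts[label.strip()] = int(fields[col])
    return counts


def changed(before, after):
    """Return {label: increase} for every row whose count differs between two samples."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}
```

Taking one sample before and one after the benchmark run on Core 7 and passing both to changed() lists exactly the interrupt sources that still reached the core.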
MSR
The MSR (Model-Specific Register) is a key interface for configuring the processor and reading its status. MSRs fall into two main categories.
Per-Core MSR
Read and write instructions for this type of MSR must execute on the local core, so avoid accessing them from other cores. For example, if Core 7 reads or writes an MSR of Core 3, the Linux kernel has to dispatch the operation to the target Core 3, which causes unnecessary delay. Likewise, an MSR access from user space (ring 3) must switch into the kernel, and a remote access interrupts the target core with an IPI (counted as CAL), which also disturbs the application. For performance evaluation, the most typical examples are APERF/MPERF, the MSRs for HWP, and the PMU configuration MSRs, all of which are per-core. The delay of accessing per-core MSRs cannot be avoided completely, so watch the sampling frequency to prevent oversampling.
Un-Core MSR
This type of MSR does not belong to any specific core; it is a shared resource. The most typical one is the UNCORE_RATIO_LIMIT MSR. Un-core MSRs can be read and written from any core, as long as you avoid doing so from the core under evaluation.
In general, MSRs are accessed by loading the msr kernel module (which provides /dev/cpu/*/msr) and then using the rdmsr/wrmsr tools.
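As an illustration of the APERF/MPERF pair, the sketch below reads an MSR through the msr driver and derives the average effective frequency between two samples. The function names are hypothetical. The register addresses 0xE7 (IA32_MPERF) and 0xE8 (IA32_APERF) are taken from the Intel SDM, and the nominal frequency passed in (3600 MHz for the CPU above) is an input the caller must supply. Reading requires root and a loaded msr module.

```python
import os
import struct

IA32_MPERF = 0xE7  # counts at the nominal frequency while the core is in C0
IA32_APERF = 0xE8  # counts at the actual frequency while the core is in C0
_MASK64 = (1 << 64) - 1


def read_msr(cpu, reg):
    """Read one 64-bit MSR of `cpu` via /dev/cpu/<cpu>/msr."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def effective_mhz(base_mhz, aperf0, mperf0, aperf1, mperf1):
    """Average frequency between two samples; None if the core never left idle."""
    d_aperf = (aperf1 - aperf0) & _MASK64  # tolerate counter wraparound
    d_mperf = (mperf1 - mperf0) & _MASK64
    if d_mperf == 0:
        return None
    return base_mhz * d_aperf / d_mperf
```

Keep in mind that calling read_msr(7, ...) from a process running on another core is itself a remote access and interrupts Core 7, so sample sparingly.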
Power management
Power management in the Linux kernel is mainly handled by the following two subsystems. Since kernel 4.10, power management is driven by the scheduler.
CPUFreq
The CPUFreq subsystem manages processor frequency scaling in the C0 state. It consists of two parts.
CPUFreq driver
Frequency-scaling drivers adapted to the different hardware
CPUFreq governor
The different frequency-scaling policies
There are two main choices on x86.
acpi_cpufreq driver and its 7 governors
See the reference link:
https://www.kernel.org/doc/html/v4.14/admin-guide/pm/cpufreq.html
intel_pstate driver and its two governors
(this is the default configuration of the system)
Compared with the drivers for other platforms, intel_pstate is a special driver. It mainly relies on the HWP hardware feature of x86 processors to adjust the frequency, and offers a limited set of customizable policies. In return it is better automated and has less overhead.
Sysfs entries
See the reference link:
https://www.kernel.org/doc/html/v4.14/admin-guide/pm/cpufreq.html
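The sysfs entries described in the linked documentation can be read per core to confirm which driver and governor are active. This is a minimal sketch with a hypothetical function name; the root path is a parameter so the code can be tried against a copy of the tree.

```python
import os

CPUFREQ_FIELDS = (
    "scaling_driver",
    "scaling_governor",
    "scaling_min_freq",
    "scaling_max_freq",
    "scaling_cur_freq",
)


def cpufreq_policy(cpu, root="/sys/devices/system/cpu"):
    """Return the cpufreq sysfs values for one CPU; unreadable entries map to None."""
    base = os.path.join(root, "cpu%d" % cpu, "cpufreq")
    info = {}
    for name in CPUFREQ_FIELDS:
        try:
            with open(os.path.join(base, name)) as f:
                info[name] = f.read().strip()
        except OSError:
            info[name] = None
    return info
```

For a locked core, scaling_min_freq and scaling_max_freq would be expected to hold the same value (both are reported in kHz).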
CPUIdle
The CPUIdle subsystem manages processor idle states in the C1-C7 range. It consists of two parts.
CPUIdle driver
Idle drivers adapted to the different hardware
CPUIdle governor
The different idle-duration policies
There are two main choices on x86.
acpi_idle driver
The default governor is menu.
intel_idle driver
The default governor is menu (this is the default configuration of the system; the ladder governor requires recompiling the kernel).
Sysfs entries (see reference link)
Reference link:
https://www.kernel.org/doc/html/latest/admin-guide/pm/cpuidle.html
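The idle states exposed to a core can be listed from the cpuidle sysfs directory, which is a quick way to confirm the effect of the max_cstate parameters. The function name below is hypothetical; the entries name, latency, and disable are the standard per-state files.

```python
import os
import re


def idle_states(cpu, root="/sys/devices/system/cpu"):
    """Return [(index, {name, latency, disable})] for each cpuidle state of one CPU."""
    base = os.path.join(root, "cpu%d" % cpu, "cpuidle")
    try:
        entries = os.listdir(base)
    except OSError:
        return []  # cpuidle not available for this CPU
    matches = (re.fullmatch(r"state(\d+)", e) for e in entries)
    indices = sorted(int(m.group(1)) for m in matches if m)
    states = []
    for idx in indices:
        fields = {}
        for name in ("name", "latency", "disable"):
            try:
                with open(os.path.join(base, "state%d" % idx, name)) as f:
                    fields[name] = f.read().strip()
            except OSError:
                fields[name] = None
        states.append((idx, fields))
    return states
```

Each state's latency is its exit latency in microseconds, so the list also shows the worst-case wake-up cost the benchmark core can incur.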
Recommended configuration methods:
Disable Turbo in BIOS
Use the power.py [2] script to lock the frequency of the target core (or disable P-states in BIOS)
Set the kernel parameter intel_idle.max_cstate=1
To disable idle states completely, processor.max_cstate=0 and idle=poll are recommended
Note that intel_idle.max_cstate=0 only disables the intel_idle driver, so the system switches to the acpi_idle driver
Adjust the UNCORE_RATIO_LIMIT min/max ratio according to the characteristics of the workload
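For the last item, the value written to UNCORE_RATIO_LIMIT packs both ratios into one register. The sketch below encodes and decodes it under the following assumptions, which should be checked against the Intel SDM for the specific CPU model: the register is MSR 0x620, bits 6:0 hold the maximum ratio, bits 14:8 hold the minimum ratio, and each ratio step is 100 MHz. The function names are hypothetical.

```python
MSR_UNCORE_RATIO_LIMIT = 0x620  # assumed address; verify for the CPU model in use


def decode_uncore_ratio_limit(value):
    """Return (min_ratio, max_ratio) from a raw register value (assumed layout)."""
    max_ratio = value & 0x7F         # bits 6:0
    min_ratio = (value >> 8) & 0x7F  # bits 14:8
    return min_ratio, max_ratio


def encode_uncore_ratio_limit(min_ratio, max_ratio):
    """Build a raw register value from the two ratios (assumed layout)."""
    if not 0 <= min_ratio <= max_ratio <= 0x7F:
        raise ValueError("ratios must satisfy 0 <= min <= max <= 127")
    return (min_ratio << 8) | max_ratio
```

Setting both ratios to the same value pins the uncore frequency, which removes one more source of run-to-run variation.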
IPI, TLB shootdown optimization
Process isolation reduces shootdowns, but the kernel part of the address space cannot be isolated, so a certain amount of TLB shootdown remains. Disabling VT-x reduces IPIs.
MSR: don't oversample! When a per-core MSR is read or written from a non-local core, Linux dispatches the operation to the target core through an IPI.
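One simple way to guard against oversampling is to put a minimum interval between real reads and serve the cached value in between. This is a minimal sketch of that idea; the class name and its parameters are hypothetical, and read_fn stands for whatever expensive read (such as a remote MSR access) is being protected.

```python
import time


class ThrottledSampler:
    """Call read_fn at most once per min_interval_s; otherwise return the last value."""

    def __init__(self, read_fn, min_interval_s, clock=time.monotonic):
        self._read = read_fn
        self._min_interval = min_interval_s
        self._clock = clock
        self._last_time = None
        self._last_value = None

    def sample(self):
        now = self._clock()
        due = (self._last_time is None
               or now - self._last_time >= self._min_interval)
        if due:
            # Only this branch touches the hardware and can trigger an IPI.
            self._last_value = self._read()
            self._last_time = now
        return self._last_value
```

The injectable clock exists only to make the behaviour easy to check; in normal use the default time.monotonic is sufficient.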
In addition, the scheduling algorithm, NUMA awareness, L3 cache QoS (RDT), SMM, BMC, SmartEngine, and other modules also add noise to system performance tests. These will be covered in a follow-up.
Reference
[1] Intel SDM
[2] power.py: https://github.com/intel/CommsPowerManagement
© 2024 shulou.com SLNews company. All rights reserved.