Benchmark system configuration and principles for Linux system performance evaluation

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article covers how to configure a benchmark system for Linux performance evaluation and the principles behind that configuration, a topic many people find hard to grasp. The main points are summarized below.

Summary

When tuning high-performance systems, developers often run into background noise from many sources, which makes the collected data inaccurate. This article analyzes the sources of that noise, and the ways to eliminate it, from the point of view of the CPU and the Linux operating system. The ultimate goal is to build a benchmark platform that achieves "zero" interference on a specific CPU.

Sources of background noise during CPU operation

1. Scheduler:

The process scheduler affects almost everything on the system. By default the Linux kernel uses a fair time-sharing scheduling policy, the Completely Fair Scheduler (CFS). Specific parameters are needed to adjust the scheduler's behavior so that it interferes with the measured process as little as possible.

2. Interrupts:

An interrupt is an event that the system must respond to. It has high priority and can preempt ordinary user processes.

a. Hardware interrupts

These come mainly from external events that the CPU must respond to promptly, most commonly I/O and the clock. The Linux kernel supports a large number of hardware interrupts, so pay attention to their affinity configuration. Responses to some of the more unusual interrupts can also be disabled.

b. Soft interrupts (softirq)

Softirqs are a subsystem derived from hardware interrupt handling. A Linux hardware interrupt handler only performs the operations that must be handled immediately; work that can be deferred is handed over to softirqs. Linux has 10 types of softirq, which are analyzed later.

c. Workqueue

Workqueues are another common mechanism for deferred work in Linux.

3. Power management:

Modern processors usually support advanced power management features in order to use energy more efficiently. If these features are not configured properly, they can also affect performance evaluation.

4. Time source:

Performance evaluation cannot be done without timestamps, so collecting timestamps correctly is also very important.

The factors above are often intertwined: the process scheduler is driven by the clock interrupt, and the power management subsystem is driven by the scheduler. Timestamp collection is also closely tied to the microarchitecture. Each case is analyzed below.
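As a rough illustration of why the time source matters, the sketch below measures the gap between back-to-back reads of a monotonic clock. It uses Python's time.perf_counter_ns purely for illustration; the function name timestamp_deltas and the sample count are assumptions made here and are not part of the original setup.

```python
import statistics
import time


def timestamp_deltas(samples=10000):
    """Return min/median/max gaps (ns) between back-to-back clock reads."""
    stamps = [time.perf_counter_ns() for _ in range(samples)]
    deltas = [b - a for a, b in zip(stamps, stamps[1:])]
    return {
        "min": min(deltas),
        "median": statistics.median(deltas),
        "max": max(deltas),
    }
```

On a noisy core the maximum gap is typically far larger than the median, because an interrupt or a context switch landed between two reads. Comparing the result on an ordinary core and on an isolated core gives a first impression of the remaining jitter.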

System configuration case

System configuration information:

CPU: Intel 9900KF, P1 frequency 3.6 GHz, 1-core Turbo 5.0 GHz, HT disabled

RAM: 16 GB DDR4-3200

OS: Ubuntu 19.04, kernel 5.0.0-38-generic, x86_64

Boot parameters: BOOT_IMAGE=/boot/vmlinuz-5.0.0-38-generic root=UUID=697aea9f-2de2-4b9c-921d-5bd5f963c91f ro ipv6.disable=1 isolcpus=7 nohz_full=7 mce=off tsc=reliable no_watchdog irqaffinity=0 hpet=disable quiet splash vt.handoff=1
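Before trusting a measurement, it can help to confirm that the running kernel was really booted with the intended parameters. The following is a minimal sketch, assuming the command line is read from /proc/cmdline; the helper names parse_cmdline and missing_params are hypothetical.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the expected parameter names that are absent from cmdline."""
    present = parse_cmdline(cmdline)
    return [name for name in expected if name not in present]


# Usage on a live Linux system:
#   with open("/proc/cmdline") as f:
#       print(missing_params(f.read(), ["isolcpus", "nohz_full", "irqaffinity"]))
```

Only the first "=" in a token is treated as the separator, so values such as root=UUID=... are kept whole.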

Benchmark system configuration objectives:

On a bare-metal machine (configuration under VT-x virtualization is more complex and harder to control precisely), isolate core 7 from the scheduler and minimize the interference of every other factor on core 7.

Detailed description of startup parameters:

The boot parameters relevant to isolation are broken down in detail below.

isolcpus=managed_irq,cpulist

isolcpus removes the target CPU from the scheduler's scheduling algorithm. In other words, from the point of view of user processes, the scheduler will never place a process on the target CPU on its own initiative. This parameter alone, however, does not guarantee that soft/hard interrupts and some other kernel components will not run on the target CPU.
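Because the scheduler no longer places anything on an isolated CPU by itself, the benchmark process has to be pinned there explicitly. The sketch below shows one way to do that from Python, together with a small parser for the cpulist format. Both helper names are hypothetical, and the parser only handles plain ids and ranges such as "0-3,7", not the kernel's stride syntax.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist such as "0-3,7" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def pin_current_process(cpu):
    """Pin the calling process to one CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})
```

On the system described here, the content of /sys/devices/system/cpu/isolated would be expected to parse to [7], after which pin_current_process(7) moves the benchmark onto the isolated core.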

nohz_full=cpulist

There is also a weaker form of this parameter, nohz. With nohz, when the runqueue of the target CPU has no schedulable entity, the CPU enters the idle state and the periodic clock tick (once every 10 ms at HZ=100) is stopped. nohz_full goes a step further and stops the clock tick when there is exactly one active entity on the runqueue. This greatly reduces interference with the single running process, although it does not eliminate it completely. Note that nohz_full is generally not enabled in non-server kernel builds, in which case the kernel must be recompiled. Check the corresponding kernel build option, CONFIG_NO_HZ_FULL=y. If it is not enabled, a warning appears in the boot log. nohz_full also implies rcu_nocbs=cpulist.
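The build option can be checked without rebooting. The following sketch scans kernel config text for an option set to "y". The helper name config_enabled is hypothetical, and the location of the config file (/boot/config-<release> on Ubuntu-style systems) is an assumption that may not hold on other distributions.

```python
def config_enabled(config_text, option):
    """Return True if the kernel config text contains "<option>=y"."""
    wanted = option + "=y"
    return any(line.strip() == wanted for line in config_text.splitlines())


# Usage on a live system (path is an assumption, see above):
#   import platform
#   with open("/boot/config-" + platform.release()) as f:
#       print(config_enabled(f.read(), "CONFIG_NO_HZ_FULL"))
```

Matching the whole line avoids false positives from commented-out entries such as "# CONFIG_NO_HZ_FULL is not set".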

[Figure: boot log when the option is enabled successfully]

[Figure: error message when the build option is not enabled]

To enable it, modify the options under the timer subsystem in the kernel configuration.

nowatchdog

Turns off all soft and hard lockup detection.

hpet=disable, tsc=reliable

These parameters target the time subsystem. hpet=disable prevents the HPET from generating excessive interrupts that disturb the system. tsc=reliable marks the TSC as reliable, which turns off runtime verification of the clocksource. In our validation, this parameter helped considerably in reducing jitter.
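Whether the TSC actually ended up as the active time source can be read back from sysfs. This is a minimal sketch, assuming the standard clocksource0 directory; the function name clocksource_status is hypothetical.

```python
from pathlib import Path

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def clocksource_status(base=CLOCKSOURCE_DIR):
    """Return the active clocksource and the list of available ones."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {"current": current, "available": available}
```

With the boot parameters above, one would expect "current" to be tsc and hpet to be missing from the available list, but that expectation should be verified on the actual machine.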

mce=off

Disables machine check to avoid its interrupts. Machine check is an advanced RAS feature that is very important in production environments, but for evaluation purposes we disable it for now.

Isolating soft and hard interrupts

Disable the irqbalance service

We do not want any hardware interrupt to be delivered to core 7, which is why the irqbalance service has to be disabled.

Take care of IRQ affinity

The affinity of hardware interrupts also needs attention, again so that no hardware interrupt is delivered to core 7.

Set /sys/devices/virtual/workqueue/cpumask to 1
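The value 1 written to the workqueue cpumask is a hexadecimal CPU bitmask in which bit N stands for CPU N, so 1 means "CPU 0 only". The same mask format is used by /proc/irq/<N>/smp_affinity. The sketch below builds such masks; the helper names are hypothetical, and it assumes a machine with fewer than 32 CPUs so that a single hex word is enough.

```python
def cpumask_hex(cpus):
    """Build the hex bitmask for a set of CPU ids (only ids 0-31 handled)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPU ids 0-31")
        mask |= 1 << cpu
    return format(mask, "x")


def write_mask(path, cpus):
    """Write the mask to a sysfs/procfs file; needs root on a real system."""
    with open(path, "w") as f:
        f.write(cpumask_hex(cpus))


# Usage (as root), keeping core 7 out of the mask on an 8-core machine:
#   write_mask("/sys/devices/virtual/workqueue/cpumask", [0])
#   write_mask("/proc/irq/<N>/smp_affinity", range(7))   # <N> is an IRQ number
```

Some interrupts cannot be moved, and the kernel rejects the write for them, so a real script would need to tolerate errors on individual IRQs.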

Effect comparison

[Figure: /proc/interrupts]

[Figure: /proc/softirqs]

[Figure: htop output, showing that the schedulable entities on core 7 have been reduced to a minimum]
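The same comparison can be made numerically by taking two snapshots of /proc/interrupts (or /proc/softirqs, which has the same column layout) and looking at what changed on core 7. The following is a sketch with hypothetical helper names, not the tooling used for the screenshots.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_name: [count per CPU]}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break
            counts.append(int(token))
        table[name.strip()] = counts
    return table


def cpu_delta(before, after, cpu):
    """Interrupts that arrived on one CPU between two parsed snapshots."""
    delta = {}
    for name, counts in after.items():
        old = before.get(name, [])
        if cpu < len(counts):
            diff = counts[cpu] - (old[cpu] if cpu < len(old) else 0)
            if diff:
                delta[name] = diff
    return delta
```

Reading the file once before and once after a benchmark run and calling cpu_delta(before, after, 7) lists only the interrupt sources that still reached the isolated core.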

MSR

The MSR (Model-Specific Register) is a key interface for configuring the processor and reading its status. MSRs fall into two main categories.

Per-core MSR

Read and write instructions for this type of MSR must be executed locally on the owning core, so avoid accessing them from another core, for example accessing core 3's MSRs from core 7. In that case the Linux kernel has to dispatch the access to the target core 3, which causes unnecessary delay. Likewise, reading or writing an MSR from user space (ring 3) requires a switch into the kernel to perform the access (via an IPI, counted as CAL), which also disturbs the application. The most typical per-core MSRs for performance evaluation are APERF/MPERF, the MSRs for HWP, and the PMU configuration MSRs. The delay of accessing a per-core MSR cannot be avoided entirely, so watch the sampling frequency to prevent oversampling.

Uncore MSR

This type of MSR does not belong to any specific core; it is a shared resource. The most typical example is the UNCORE_RATIO_LIMIT MSR. An uncore MSR can be read and written from any core, as long as the access is not initiated from the core under evaluation.

In general, MSRs are accessed by loading the msr kernel module (which provides /dev/cpu/N/msr) and then using the rdmsr/wrmsr tools.
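The device files provided by the msr module can also be read directly: the register number is used as the file offset, and each read returns 8 bytes. The sketch below assumes the module is loaded and the caller has root privileges; the function name read_msr and its path_template parameter are hypothetical. The register numbers 0xE7 and 0xE8 are the architectural IA32_MPERF and IA32_APERF addresses.

```python
import os
import struct

IA32_MPERF = 0xE7  # counts at a fixed reference frequency while in C0
IA32_APERF = 0xE8  # counts at the actual frequency while in C0


def read_msr(cpu, reg, path_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR; the register number is used as the file offset."""
    fd = os.open(path_template.format(cpu=cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]
```

In line with the advice above, a sampler built on this would read a per-core MSR only from a process pinned to that same core, and at a low sampling rate.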

Power management

Power management in the Linux kernel is handled mainly by the following two subsystems. Since kernel 4.10, the power management subsystem is triggered by the scheduler.

cpufreq

The cpufreq subsystem manages processor frequency scaling in the C0 state. It consists of two main parts.

cpufreq driver

Frequency-scaling drivers adapted to the various kinds of hardware.

cpufreq governor

The various frequency-scaling policies.

There are two main choices on x86.

The acpi_cpufreq driver and its 7 governors

See the reference link:

https://www.kernel.org/doc/html/v4.14/admin-guide/pm/cpufreq.html

The intel_pstate driver and its two governors

(this is the system default)

intel_pstate is a special driver compared with those of other platforms. It mainly uses the HWP hardware feature of x86 processors to adjust the frequency. It offers a limited number of customizable policies, with better automation and lower overhead.

sysfs entries

See the reference link:

https://www.kernel.org/doc/html/v4.14/admin-guide/pm/cpufreq.html
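The active driver and governor for a core can be read from the cpufreq sysfs entries. The following is a minimal sketch; the function name cpufreq_status is hypothetical, and it assumes the standard per-CPU cpufreq directory is present.

```python
from pathlib import Path


def cpufreq_status(cpu, root="/sys/devices/system/cpu"):
    """Read driver, governor and current frequency (kHz) for one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    fields = ("scaling_driver", "scaling_governor", "scaling_cur_freq")
    return {name: (base / name).read_text().strip() for name in fields}
```

Calling cpufreq_status(7) before a run is a cheap way to record which frequency policy the measured core was under.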

cpuidle

The cpuidle subsystem manages the processor idle states C1-C7. It consists of two main parts.

cpuidle driver

Idle drivers adapted to the various kinds of hardware.

cpuidle governor

The various idle-duration policies.

There are two main choices on x86.

acpi_idle driver

The default governor is menu.

intel_idle driver

The default governor is menu (this is the system default; using ladder requires recompiling the kernel).

sysfs entries (see the reference link)

Reference link:

https://www.kernel.org/doc/html/latest/admin-guide/pm/cpuidle.html
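The idle states that a core may enter, and their exit latencies, are exposed per CPU under the cpuidle sysfs directory. The sketch below lists them; the function name idle_states is hypothetical, and it assumes the stateN directories with their name, latency and disable files are present.

```python
from pathlib import Path


def idle_states(cpu, root="/sys/devices/system/cpu"):
    """List (name, exit latency in us, disabled flag) for each idle state."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in dirs:
        name = (state_dir / "name").read_text().strip()
        latency = int((state_dir / "latency").read_text())
        disabled = (state_dir / "disable").read_text().strip() != "0"
        states.append((name, latency, disabled))
    return states
```

The exit latency column shows directly why deep idle states hurt latency-sensitive measurements: the deeper the state, the longer the core takes to resume work.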

Recommended configuration methods:

Disable Turbo in the BIOS.

Use the power.py [2] script to lock the frequency of the target core (or disable P-states in the BIOS).

Set the kernel parameter intel_idle.max_cstate=1.

To disable idle states completely, processor.max_cstate=0 and idle=poll are recommended.

Note that intel_idle.max_cstate=0 only disables the intel_idle driver, so the system switches to the acpi_idle driver.

Adjust the UNCORE_RATIO_LIMIT min/max ratio according to the characteristics of the workload.

IPI and TLB shootdown optimization

Process isolation reduces TLB shootdowns, but the kernel part of the address space cannot be isolated, so a certain number of TLB shootdowns still occur. Disabling VT-x reduces IPIs.

For MSRs, do not oversample. When a per-core MSR is read or written from a non-local core, Linux uses an IPI to schedule the access onto the target core.

In addition, the scheduling algorithm, NUMA awareness, L3 cache QoS (RDT), SMM-BMC, SmartEngine and other modules also introduce noise into system performance tests. These will be covered in a follow-up.

References

[1] Intel SDM

[2] power.py: https://github.com/intel/CommsPowerManagement
