
How to use NUMA Technology

2025-01-29 Update From: SLTechnology News & Howtos


This article shows you how to use NUMA technology. The content is concise and easy to follow; I hope you find something useful in the detailed introduction below.

With NUMA disabled at the OS layer, enabling NUMA at the BIOS layer hurts performance: QPS drops by 15-30%.

With NUMA disabled at the BIOS layer, performance is unaffected regardless of whether NUMA is enabled at the OS layer.

Install numactl:

# yum install numactl -y

# numastat is equivalent to cat /sys/devices/system/node/node0/numastat; the /sys/devices/system/node/ directory records the details of every memory node in the system.

# numactl --hardware enumerates the NUMA nodes on the system

# numactl --show shows the current binding information

In Red Hat or CentOS systems, you can use the following command to determine whether NUMA is enabled at the BIOS layer:

# grep -i numa /var/log/dmesg

If the output is "No NUMA configuration found", NUMA is disabled. Otherwise it is enabled, for example: "NUMA: Using 30 for the hash shift."

You can view the NUMA topology of the machine through the lscpu command.

A high numa_miss value indicates that the allocation policy needs adjusting: for example, bind the process to specific CPUs to improve the memory hit ratio.
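As a sketch of that check, the snippet below computes the miss ratio from numastat-style counters. The sample figures are invented for illustration; on a real host you would feed in the output of numastat itself.

```shell
#!/bin/sh
# Sketch: estimate the NUMA miss ratio from numastat-style counters.
# The sample figures below are invented for illustration; on a real
# host, replace the here-string with the output of `numastat`.
sample='                           node0           node1
numa_hit                 4500000         4500000
numa_miss                 500000          500000
numa_foreign              500000          500000
interleave_hit                38              39
local_node               4400000         4400000
other_node                600000          600000'

ratio=$(printf '%s\n' "$sample" | awk '
/^numa_hit/  { for (i = 2; i <= NF; i++) hit  += $i }
/^numa_miss/ { for (i = 2; i <= NF; i++) miss += $i }
END { printf "%.4f", (hit + miss) ? miss / (hit + miss) : 0 }')
echo "numa_miss ratio: $ratio"   # a persistently high value suggests rebinding
```

With the sample counters this prints a ratio of 0.1000, i.e. 10% of allocations missed their preferred node.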


Today's machines have multiple CPUs and multiple memory blocks. We used to think of memory as one large block that every CPU accessed at the same cost; this is the SMP model that was common in the past. However, as processor counts grew, contention for the shared memory became more and more severe, and once memory access became the bottleneck, performance could no longer scale. NUMA (Non-Uniform Memory Access) is the model introduced for this environment. For example, a machine has 2 processors and 4 memory blocks. We pair one processor with two memory blocks and call the pair a NUMA node, so the machine has two NUMA nodes. Physically, the processor and the memory blocks inside a NUMA node are closer to each other, so access between them is faster. For example, with processors cpu1 and cpu2 and memory blocks memory1.1, memory1.2, memory2.1 and memory2.2, cpu1 in NUMA node1 accesses memory1.1 and memory1.2 faster than memory2.1 and memory2.2. Therefore, if NUMA can be used in such a way that each CPU accesses only the memory blocks within its own node, efficiency is highest.

When running a program, you can determine which CPUs and which memory it uses with numactl -m (memory) and --physcpubind (cores). A cpu-topology benchmark write-up gives a comparison between a program using only one node's resources and using multiple nodes' resources (roughly 28s versus 38s), so restricting a program to a single NUMA node makes practical sense.

But then again, is pinning a process to one NUMA node always a good idea? This is NUMA's trap. The "SWAP's crime and punishment" article describes it: your server still has free memory, yet it is already swapping, and the machine may even stall. This can be caused by the NUMA restriction: if a process is limited to its own NUMA node's memory, then when that node's memory runs out it will not use memory from other nodes; instead it starts to swap, or worse, on a machine with no swap configured, it may simply crash. You can use numactl --interleave=all to lift the per-node restriction.

To sum up: whether to use NUMA should be decided by the specific workload.

If your program consumes a lot of memory, you should usually lift the NUMA node restriction (or disable NUMA in hardware), because such a program is very likely to fall into the NUMA trap.

If your program does not consume much memory but needs to run as fast as possible, you should usually restrict it to a single NUMA node.

Kernel parameter overcommit_memory:

This is the memory overcommit (allocation) strategy.

Optional values: 0, 1, 2.

0: heuristic overcommit; the kernel checks whether enough memory is available for the request; if so, the request is allowed, otherwise it fails and an error is returned to the application process.

1: the kernel allows any allocation, regardless of the current memory state.

2: the kernel refuses to commit more than swap space plus overcommit_ratio percent of physical memory (the CommitLimit); requests beyond that fail.
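For mode 2, the ceiling the kernel enforces can be computed by hand. The sketch below uses invented memory sizes; on a real host MemTotal and SwapTotal come from /proc/meminfo and the ratio from /proc/sys/vm/overcommit_ratio.

```shell
#!/bin/sh
# Sketch: the commit ceiling enforced by overcommit_memory=2.
#   CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# The sizes below are illustrative assumptions.
mem_total_kb=16384000     # 16 GB RAM (assumed)
swap_total_kb=8192000     # 8 GB swap (assumed)
overcommit_ratio=50       # the kernel default

commit_limit_kb=$(( swap_total_kb + mem_total_kb * overcommit_ratio / 100 ))
echo "CommitLimit: ${commit_limit_kb} kB"
```

With these figures the limit is 16384000 kB: half of RAM plus all of swap.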

Kernel parameter zone_reclaim_mode:

Optional values: 0, 1.

A. When a node runs out of available memory:

1. If it is 0, the system tends to allocate memory from other nodes.

2. If it is 1, the system tends to first reclaim cache memory from the local node.

B. Cache is important for performance, so 0 is usually the better choice.

NUMA problem of mongodb

The mongodb log is shown as follows:

WARNING: You are running on a NUMA machine.

We suggest launching mongod like this to avoid performance problems:

numactl --interleave=all mongod [other options]

Solution: temporarily change the NUMA memory allocation policy to interleave=all (interleave allocations across all nodes):

1. Prepend numactl --interleave=all to the original startup command,

e.g. # numactl --interleave=all ${MONGODB_HOME}/bin/mongod --config conf/mongodb.conf

2. Modify the kernel parameter:

echo 0 > /proc/sys/vm/zone_reclaim_mode; echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf

1. NUMA and SMP

NUMA and SMP are two CPU-related hardware architectures. In the SMP architecture, all CPUs compete for one bus to access all of memory. The advantage is resource sharing; the disadvantage is fierce bus contention. As the number of CPUs on PC servers grows (not just the number of cores), the drawbacks of bus contention become more and more obvious, so Intel introduced the NUMA architecture with its Nehalem CPUs, and AMD's Opteron CPUs are based on the same idea.

The most important feature of NUMA is the introduction of the concepts of node and distance. For the two most valuable hardware resources, CPU and memory, NUMA divides them into resource groups (nodes) in an almost strict way, with roughly equal CPU and memory in each group. The number of resource groups depends on the number of physical CPUs (most current PC servers have two physical CPUs, each with 4 cores); the concept of distance defines the cost of using resources across nodes and provides data for resource-scheduling optimization algorithms.

2. Strategies related to NUMA

1. Each process (or thread) inherits its NUMA policy from its parent and is assigned a preferred node. If the NUMA policy allows, the process can also use resources on other nodes.

2. NUMA's CPU binding strategies are cpunodebind and physcpubind. cpunodebind binds a process to the CPUs of the given nodes, while physcpubind specifies exactly which cores it may run on.

3. NUMA's memory allocation strategies are localalloc, preferred, membind and interleave.

localalloc makes the process request memory from the node it is currently running on;

preferred loosely nominates a recommended node; if that node lacks memory, the process may fall back to other nodes;

membind restricts the process to requesting memory only from the listed nodes;

interleave makes the process allocate memory round-robin (RR scheduling) across the listed nodes.

Because NUMA's default memory allocation policy prefers the local memory of the node where the process runs, memory usage becomes unbalanced across CPU nodes. When one node runs out of memory, the kernel swaps instead of allocating from remote nodes. This is the so-called swap insanity phenomenon.

MySQL uses a threaded model, and its support for the NUMA feature is poor. If only one MySQL instance runs on the machine, we can choose to disable NUMA. There are three ways to disable it:

1. Hardware layer: disable it in the BIOS.

2. OS kernel: boot with numa=off.

3. Use the numactl command to change the memory allocation policy to interleave.

If we run multiple MySQL instances on a single machine, we can bind each instance to a different CPU node and use a bound memory allocation policy to force its memory to be allocated within that node. This both exploits the hardware's NUMA characteristics and avoids the poor multi-core utilization of a single MySQL instance.
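A minimal dry-run sketch of that multi-instance binding, assuming two nodes and consecutive ports; the node count, ports, and everything apart from the numactl flags themselves are illustrative assumptions, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: pin one MySQL instance per NUMA node (dry run: the commands
# are printed rather than run). Node count and ports are assumptions.
nodes=2
cmds=""
node=0
while [ "$node" -lt "$nodes" ]; do
    port=$((3306 + node))
    # --cpunodebind pins the threads, --membind pins the allocations
    cmds="${cmds}numactl --cpunodebind=${node} --membind=${node} mysqld --port=${port}
"
    node=$((node + 1))
done
printf '%s' "$cmds"
```

Remove the dry run (execute instead of accumulating into cmds) once the generated lines match your instances' real startup commands.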

3. The relationship between NUMA and swap

As you may have noticed, NUMA's memory allocation strategy is not fair between processes (or threads). In current Red Hat Linux, localalloc is the default NUMA memory allocation policy, and this option makes it easy for a memory-hungry process to exhaust one node's memory. When that happens, Linux swaps out the memory of that heavy consumer on the node, even though there may still be plenty of page cache to release, and even plenty of free memory on other nodes.

4. Solving the swap problem

Although the principle behind NUMA is relatively complex, solving the swap problem is simple: just use numactl --interleave to modify the NUMA policy before starting MySQL.

It is worth noting that the command numactl can not only adjust the NUMA policy, but also be used to view the current resource usage of each node. It is a command worth studying.

1. CPU

Let's start with CPU.

If you look carefully, there is an interesting phenomenon on some servers: when you cat /proc/cpuinfo, you will find that a CPU's frequency does not match its nominal frequency:

# cat /proc/cpuinfo

processor : 5

model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

cpu MHz : 1200.000

This is an Intel E5-2620, nominally a 2.00 GHz part, yet processor 5 is running at 1.2 GHz.

What is the reason for this?

All of this comes from a modern CPU feature: the energy-saving mode. The operating system and the CPU hardware cooperate so that when the system is not busy, the CPU frequency is lowered to save power and reduce heat. This is a boon for environmentalists and the fight against global warming, but it can be a disaster for MySQL.

In order to ensure that MySQL can make full use of the resources of CPU, it is recommended to set CPU to the maximum performance mode. This setting can be set in BIOS and the operating system, but of course it is better and more thorough to set this option in BIOS. Due to the differences between different BIOS types, setting CPU to maximum performance mode varies greatly, so we won't show you how to set it here.

Then let's look at what we can optimize in terms of memory.

i) Let's take a look at NUMA first.

Non-Uniform Memory Access (NUMA) is a newer memory management architecture, as opposed to the Symmetric Multi-Processor (SMP) architecture. Put simply: under SMP, the cost of accessing memory is the same for every CPU, while under the NUMA architecture, the cost of accessing local memory differs from that of non-local memory. Accordingly, the operating system lets us set the memory allocation mode of a process. The currently supported modes include:

--interleave=nodes

--membind=nodes

--cpunodebind=nodes

--physcpubind=cpus

--localalloc

--preferred=node

In short, you can specify local allocation, allocation on certain CPU nodes, or round-robin allocation. Unless --interleave=nodes is used, so that memory can be allocated on any NUMA node, Linux will not give a process the leftover memory on other NUMA nodes even when memory remains there; it will use SWAP instead. Experienced system administrators and DBAs all know how crippling SWAP is to database performance.

So the easiest way is to turn off this feature.

The ways to turn off the feature are: from BIOS, the operating system, you can temporarily turn off this feature when you start the process.

A) Due to the differences between BIOS types, how to disable NUMA varies greatly, so we won't show how to set it here.

B) To turn it off in the operating system, add numa=off at the end of the kernel line in /etc/grub.conf, as shown below:

kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/VolGroup-root rd_NO_LUKS.UTF-8 rd_LVM_LV=VolGroup/root rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto rd_LVM_LV=VolGroup/swap rhgb crashkernel=auto quiet KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM numa=off
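The edit can be scripted. The sketch below works on a temporary copy so it is safe to run, appending numa=off only to kernel lines that do not already carry it; the grub snippet and paths are illustrative, and GNU sed's -i flag is assumed.

```shell
#!/bin/sh
# Sketch: append numa=off to every grub kernel line that lacks it.
# Runs against a temporary copy; on a real system you would back up
# and edit /etc/grub.conf instead. GNU sed's -i is assumed.
grub=$(mktemp)
cat > "$grub" <<'EOF'
title CentOS (2.6.32-220.el6.x86_64)
        kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/VolGroup-root quiet
        initrd /initramfs-2.6.32-220.el6.x86_64.img
EOF

# Match "kernel " lines; the inner /numa=off/! guard keeps the edit idempotent.
sed -i '/^[[:space:]]*kernel /{/numa=off/!s/$/ numa=off/}' "$grub"
grep 'numa=off' "$grub"
```

Running the script twice leaves a single numa=off on the line, which is why the guard matters.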

In addition, you can set vm.zone_reclaim_mode=0 to reclaim memory as much as possible.

C) When starting MySQL, turn off the NUMA feature:

numactl --interleave=all mysqld

Of course, the best way is to turn it off in BIOS.

ii) Let's take a look at vm.swappiness.

vm.swappiness is the strategy the operating system uses to control swapping physical memory out. Its value is a percentage from 0 to 100, defaulting to 60. Setting vm.swappiness to 0 means swap as little as possible, while 100 means swap out inactive memory pages as aggressively as possible.

Specifically: when memory is nearly full, the system uses this parameter to decide whether to swap out the rarely used inactive memory, or to release the data cache. The cache holds data read from disk which, by the program's locality principle, may be read again soon; inactive memory, as the name implies, is memory that is mapped by applications but has not been used for a long time.

We can use vmstat to see the amount of memory in inactive:

# vmstat -an 1

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd     free   inact   active  si  so   bi   bo    in    cs us sy id wa st
 1  0      0 27522384  326928  1704644   0   0    0  153    11    10  0  0 100  0  0
 0  0      0 27523300  326936  1704164   0   0    0   74   784   590  0  0 100  0  0
 0  0      0 27523656  326936  1704692   0   0    8    8   439  1686  0  0 100  0  0
 0  0      0 27524300  326916  1703412   0   0    4   52   198   262  0  0 100  0  0

You can see more detailed information through / proc/meminfo:

# cat /proc/meminfo | grep -i inact

Inactive:         326972 kB
Inactive(anon):      248 kB
Inactive(file):   326724 kB
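As a sanity check, Inactive is simply the sum of the anon and file parts. The snippet below verifies that against the figures quoted above; on a real host you would pipe in grep -i inact /proc/meminfo instead of the sample.

```shell
#!/bin/sh
# Sketch: Inactive = Inactive(anon) + Inactive(file).
# The sample mirrors the /proc/meminfo figures quoted in the text.
sample='Inactive:         326972 kB
Inactive(anon):      248 kB
Inactive(file):   326724 kB'

# Sum only the parenthesised lines; field 2 is the size in kB.
total=$(printf '%s\n' "$sample" | awk '/Inactive\(/ { sum += $2 } END { print sum }')
echo "Inactive(anon) + Inactive(file) = ${total} kB"
```

248 + 326724 = 326972, matching the Inactive total.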

Let's dig a little further into inactive memory. In Linux, a memory page can be in one of three states: free, active, and inactive. As is well known, the Linux kernel internally maintains a number of LRU lists to manage memory, such as LRU_INACTIVE_ANON, LRU_ACTIVE_ANON, LRU_INACTIVE_FILE, LRU_ACTIVE_FILE and LRU_UNEVICTABLE. LRU_INACTIVE_ANON and LRU_ACTIVE_ANON manage anonymous pages, while LRU_INACTIVE_FILE and LRU_ACTIVE_FILE manage the page cache. The kernel periodically moves active memory to the inactive list based on memory page accesses, and inactive memory is what can be swapped out to swap.

Generally speaking, MySQL, and InnoDB in particular, manages its own memory cache and occupies a lot of memory, much of which is accessed infrequently. If Linux mistakenly swaps that memory out, a lot of CPU and IO resources are wasted. And since InnoDB manages its own cache, letting file data occupy the page cache brings little benefit to InnoDB.

Therefore, on a MySQL server we had best set vm.swappiness=1 or 0.
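A sketch of making that setting persistent. It writes to a temporary file so it is safe to run unprivileged; on a real server you would target /etc/sysctl.conf as root and also apply the value immediately with sysctl -w vm.swappiness=1.

```shell
#!/bin/sh
# Sketch: persist vm.swappiness=1. A temporary file stands in for
# /etc/sysctl.conf so the script can run unprivileged.
conf=$(mktemp)
# Append the setting only if no vm.swappiness line exists yet.
if ! grep -q '^vm.swappiness' "$conf"; then
    echo 'vm.swappiness = 1' >> "$conf"
fi
grep '^vm.swappiness' "$conf"
```

The existence check keeps repeated runs from stacking duplicate lines in the config file.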

The above is how to use NUMA technology. I hope you have picked up some knowledge or skills; if you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.
