Case Analysis of NUMA Architecture on Linux


This article walks through a hands-on case analysis of the NUMA architecture on Linux. The content is detailed and the logic is clear; I hope you get something out of it. Let's take a look.

The following case is based on Ubuntu 16.04 and also applies to other Linux systems. The environment used in this case is as follows:

Machine configuration: 32 CPUs, 64 GB of memory

The concept of the storage hierarchy in NUMA:

1) Processor layer: a single physical core is called the processor layer.

2) Local node layer: for all the processors within a node, that node is called their local node.

3) Home node layer: a node adjacent to the local node is called a home node.

4) Remote node layer: a node that is neither the local node nor a neighbor node is called a remote node.

The speed at which a CPU accesses memory differs by node type: access to the local node is fastest and access to a remote node is slowest; that is, access speed is related to the distance to the node. The farther the distance, the slower the access. This distance is called the Node Distance. Applications should minimize interaction between different CPU modules; if an application can be pinned to a single CPU module, its performance will improve significantly, for example:
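As a minimal sketch of such pinning (./myapp is a placeholder for your own binary), numactl can bind a process and its memory to a single node so that all of its accesses stay local:

numactl --cpunodebind=0 --membind=0 ./myapp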

Taking the Kunpeng 920 processor as an example of CPU chip composition: each super core cluster of the Kunpeng 920 system-on-chip consists of 6 core clusters, 2 I/O clusters, and 4 DDR controllers. Each super core cluster is packaged as a CPU chip. Each chip integrates four 72-bit (64-bit data plus 8-bit ECC) high-speed DDR4 channels with a maximum data transfer rate of 3200 MT/s, and a single chip can support up to 512 GB × 4 of DDR memory space. The L3 cache is physically split into two parts: L3 Cache TAG and L3 Cache DATA. The L3 Cache TAG is integrated into each core cluster to reduce snoop latency, while the L3 Cache DATA is connected directly to the on-chip bus. The Hydra Home Agent (HHA) is the module that handles the cache coherence protocol in a multi-chip system. POE_ICL is a system-configured hardware accelerator that can serve as a packet sequencer, message queue, or message distributor, or take over specific tasks from a processor core. In addition, each super core cluster is physically configured with a Generic Interrupt Controller Distributor (GICD) module compatible with ARM's GICv4 specification. When there are multiple super core clusters in a single-chip or multi-chip system, only one GICD is visible to system software.

The use of numactl

Linux provides numactl, a command for manual NUMA tuning (not installed by default). On Ubuntu it can be installed as follows:

sudo apt install numactl -y

First, you can learn the meaning of each parameter and the format of the output through man numactl or numactl --help. To view the system's NUMA status:

numactl --hardware

The result of running is as follows:

available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 16047 MB
node 0 free: 3937 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 16126 MB
node 1 free: 4554 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 16126 MB
node 2 free: 8403 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 16126 MB
node 3 free: 7774 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

From the output of this command, you can see that the system has 4 nodes, each with 8 CPUs and roughly 16 GB of memory. Note also that the L3 cache shared by the CPUs occupies its own corresponding space. You can view NUMA statistics through the numastat command; the meanings of the returned values are as follows:

numa_hit: the number of times memory was intended to be allocated on this node and was ultimately allocated from this node

numa_miss: the number of times memory was intended to be allocated on this node but was ultimately allocated from another node

numa_foreign: the number of times memory was intended to be allocated on another node but was ultimately allocated from this node

interleave_hit: the number of times the interleave policy allocated from this node as intended

local_node: the number of times a process on this node was allocated memory on this node

other_node: the number of times a process on another node was allocated memory on this node

Note: if the numa_miss value is high, the allocation policy needs to be adjusted, for example by binding the process to specified CPUs to improve the memory hit rate.
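One way to check this for a specific process is numastat's per-process mode (the PID 1234 is a placeholder); it shows how much of that process's memory sits on each node:

numastat -p 1234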

root@ubuntu:~# numastat
                           node0           node1           node2           node3
numa_hit             19480355292     11164752760     12401311900     12980472384
numa_miss                5122680       122652623        88449951            7058
numa_foreign           122652643        88449935            7055         5122679
interleave_hit             12619           13942           14010           13924
local_node           19480308881     11164721296     12401264089     12980411641
other_node               5169091       122684087        88497762           67801

NUMA memory allocation policy

--localalloc or -l: the process requests memory allocation from the local node.

--membind=nodes or -m nodes: the process can request memory allocation only from the specified nodes.

--preferred=node: a preferred node from which to acquire memory; if allocation there fails, another node is tried.

--interleave=nodes or -i nodes: the process requests memory allocation from the specified nodes in an interleaved, round-robin fashion.

For example, to start mongod with memory interleaved across all nodes:

numactl --interleave=all mongod -f /etc/mongod.conf
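To verify which policy is currently in effect for a shell (and hence for the processes it launches), numactl can print its own policy settings:

numactl --show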

Because the default NUMA memory allocation policy is to prefer local memory on the node where the process is running, memory allocation can become unbalanced across CPU nodes. When swap is enabled, a CPU node that runs out of memory triggers swapping rather than allocating from a remote node; this is known as the swap insanity phenomenon, and it can also cause a sharp drop in performance. Therefore, at the operations level, we also need to watch memory usage under a NUMA architecture (usage across the memory nodes may be uneven) and configure system parameters sensibly (the memory reclaim policy and the tendency to use swap) to avoid swapping as much as possible.
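As a hedged sketch of that tuning (the values are illustrative, not universal recommendations): setting vm.zone_reclaim_mode to 0 lets the kernel allocate from remote nodes instead of aggressively reclaiming local pages, and a low vm.swappiness reduces the tendency to swap:

sudo sysctl vm.zone_reclaim_mode=0
sudo sysctl vm.swappiness=10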

Node -> Socket -> Core -> Processor

With the development of multi-core technology, multiple cores are packaged together; this package mounts in a slot called a Socket. A Core is an independent hardware unit on the socket. Intel's Hyper-Threading (HT) technology further improves CPU processing power, and the logical cores seen by the OS are called Processors.

Socket = Node

Socket is a physical concept, which refers to the CPU slot on the motherboard; Node is a logical concept, corresponding to Socket.

Core = physical CPU

Core is a physical concept: an independent hardware execution unit corresponding to a physical CPU.

Thread = logical CPU = Processor

Thread is a logical CPU, also called a Processor.
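A quick way to see how many logical CPUs (Processors) the OS exposes is the nproc command; on the machine in this case it prints 32:

nproc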

The use of lscpu

Display format:

Architecture: the CPU architecture

CPU(s): the number of logical CPUs

Thread(s) per core: threads per core, i.e., hyperthreading

Core(s) per socket: cores per CPU socket / cores per physical CPU

Socket(s): the number of CPU sockets

L1d cache: the L1 data cache

L1i cache: the L1 instruction cache

L2 cache: the level-2 cache

L3 cache: the level-3 cache

NUMA node0 CPU(s): the logical cores on NUMA node 0, including hyperthreads

Execute lscpu; part of the result is as follows:

root@ubuntu:~# lscpu
Architecture:          x86_64
CPU(s):                32
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             4
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
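As a sanity check on these numbers: 4 sockets × 8 cores per socket × 1 thread per core = 32 logical CPUs, which matches both the CPU(s): 32 line here and the 32 CPUs spread over 4 nodes reported earlier by numactl --hardware.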
