Original article published 2016-09-12 by Xiong Jun
[Yunhe Enmo] Performance optimization: properly configuring large memory pages (HugePages) in a Linux environment
Xiong Jun (Lao Xiong)
Regional General Manager, Yunhe Enmo
Oracle ACE Director, core member of ACOUG
With the development of the PC server, great progress has been made in performance. 64-bit CPUs entered ordinary household PCs years ago, let alone higher-end PC servers. Thanks to the efforts of the two processor giants, Intel and AMD, x86 CPUs keep improving in processing power; at the same time, advances in manufacturing mean that the memory capacity that can be installed in a PC server keeps growing, and PC servers with tens of gigabytes of memory are everywhere. It is this hardware development that has made the PC server ever more powerful and ever higher in performance. In terms of stability, a PC server running a Linux operating system can also meet the stability and reliability required by important business systems. As for cost, to quote a netizen from a software vendor in the industry: "if we don't use PC servers and switch to minicomputers, how would we make any money?" Whether counting initial purchase, runtime energy consumption, or maintenance, a PC server is much cheaper than a minicomputer of the same processing capacity. Under the influence of these two factors, performance and cost, more and more databases run on PC servers. Some customers served by the author even virtualize high-end PC servers into multiple machines and run an Oracle database on each virtual machine, and these databases carry important production systems.
There is no doubt that the most suitable operating system for running Oracle databases on a PC server is Linux. As an operating system very similar to UNIX, it matches UNIX in stability, reliability and performance. However, compared with AIX, HP-UX and other operating systems, Linux has an obvious defect in its memory paging mechanism, and this defect is particularly visible for Oracle databases using a larger SGA. In serious cases it has a significant negative impact on database performance, and can even cause the database to stop responding completely. This article elaborates on this defect through a case study, and uses large memory pages under Linux to solve the problem.
Case introduction
One of the customer's systems had severe performance problems. When the problem occurred, the system was essentially unavailable, and all business operations through the application were completely unresponsive. The system's database was Oracle Database 10.2.0.4 running on RHEL 5.2 (Red Hat Enterprise Linux Server release 5 (Tikanga)); the server had four quad-core Intel Xeon processors (Intel(R) Xeon(R) CPU E7430 @ 2.13GHz), i.e. 16 logical CPUs, and 32GB of memory. During the failure, the database server's CPU stayed at 100% for a long time. Even after all of the application's WebLogic Servers were shut down, the database server's CPU utilization remained at 100% for several minutes, then decreased gradually, taking about 20 minutes to fall back to the normal idle level; by that point all applications had been shut down, so only a very low CPU utilization would be normal. According to the system's database maintenance staff, this had happened many times; even after restarting the database, the failure would recur within a day or two. Meanwhile, the system had not changed much recently.
After receiving the fault report, the author connected to the database server via SSH, which was very slow, taking about a minute. A quick look at server performance showed that disk I/O was very low, there was still plenty of free memory, at least more than 1GB, and there was no page-in/page-out. The most obvious symptom was CPU utilization: it stayed at 100%, with the SYS portion above 95%, and the operating system run queue stayed above 200. The server memory usage was as follows:
On the surface, the high SYS CPU was an important clue for analyzing the problem.
After getting a quick picture of performance at the operating system level, the author immediately connected to the database through SQL*Plus to view performance information inside the database. (Note: the SQL text, server names, database names and other details below have been sanitized.)
... (Part of the output is omitted to save space.)
Run waitevent to view the wait events:
Judging from the activity and wait events in the database, there was nothing particularly abnormal. It is worth noting that when a database server's CPU utilization stays at 100% for a long time, or when physical memory is exhausted and accompanied by heavy swapping, you need to diagnose the performance symptoms inside the database carefully: are certain elevated wait events the consequence of CPU or memory shortage, or are they the root cause of the excessive CPU consumption or memory exhaustion?
Judging from the above data, there were not very many active sessions, fewer than 50, which together with the background processes is far short of the run queue of 200 seen in the operating system. There were mainly three types of non-idle wait events in the database: I/O-related waits such as db file sequential read, database-link-related waits such as SQL*Net more data from dblink, and latch-related waits. Of these three, usually only latch-type waits drive CPU utilization up.
By analyzing and comparing the AWR report, there is no particularly significant difference in database activity between the failure period and the normal period. But in terms of system statistics, there is a big difference:
The above data compares one hour of AWR from the failure period (1st) with one hour from a normal period (2nd). For fault analysis, especially when the fault window is short, a one-hour AWR report will not accurately reflect performance during the fault. But in troubleshooting, the first task is to pick a direction from the available data. As mentioned earlier, the high SYS portion of CPU utilization was an important clue, and since the other performance data inside the database did not differ much, we could start with CPU.
Analysis of CPU usage in operating system
So what do the two utilization figures, SYS and USER, represent in the operating system? And what is the difference between them?
Simply put, the SYS part of CPU utilization is the CPU used by the operating system kernel, that is, CPU consumed by code running in kernel mode; the most common source is system calls (SYS CALLs). The USER part is the CPU used by the application's own code, that is, CPU consumed by code running in user mode. For example, when Oracle executes SQL and reads data from disk into the db buffer cache, it issues a read call; that read call runs mostly operating system kernel code, including device driver code, so the CPU it consumes is counted toward the SYS part. When Oracle then parses the data read from disk, only Oracle's own code is running, so that CPU is counted toward the USER part.
So which operations or system calls mainly generate SYS CPU?
1. I/O operations, such as reading and writing files, accessing peripherals, and transferring data over the network. These operations generally do not consume much CPU, because most of the time is spent on the device performing the I/O. For example, when reading a file from disk, most of the time goes to the disk's internal operation, and the CPU time consumed is only a small part of the I/O response time. Only very high concurrent I/O is likely to drive SYS CPU up.
2. Memory management, such as processes requesting memory from the operating system, the operating system maintaining its pool of available memory, swap-space paging, and so on. Much as in Oracle, the larger the memory and the more frequent the memory-management operations, the higher the CPU consumption.
3. Process scheduling. The CPU consumed here depends on the length of the operating system's run queue: the longer the run queue, the more processes need to be scheduled and the heavier the kernel's burden.
4. Others include inter-process communication, semaphore processing, some activities within the device driver, and so on.
From the performance data at the time of failure, memory management and process scheduling looked like possible causes of the high SYS CPU. However, the long run queue was more likely a result of the high CPU utilization than its cause, and from the database side the number of active sessions was not high. So the next step was to examine whether system memory management was driving the CPU utilization up.
Reviewing the /proc/meminfo data collected at the beginning of this article, one important figure stands out:
PageTables: 4749076 kB
As you can see from the data, PageTables memory reaches 4637MB. PageTables literally means "page table". To put it simply, it is a table that the operating system kernel uses to maintain the correspondence between the linear virtual address and the actual physical memory address of the process.
Modern computers usually manage and allocate physical memory in Page Frame units. On x86 processor architecture, the page size is 4K. For processes running on the operating system, the accessible address space is called the virtual address space, which is related to the number of processor bits. For 32-bit x86 processors, the accessible address space of the process is 4GB. Each process running in the operating system has its own independent virtual address space or linear address space, and this address space is also managed by Page, which is usually 4KB in Linux. When a process accesses memory, it is responsible for converting the virtual address of the process into a physical address with the cooperation of the operating system and hardware. Two different processes may have the same virtual linear address and point to the same physical memory, such as shared memory, or different, such as the private memory of the process.
The following is a schematic diagram of the correspondence between virtual addresses and physical memory:
Suppose there are two processes A and B, each with a memory pointer pointing at 0x12345 (0x denotes hexadecimal), for example because one process was forked or cloned from the other; then both processes have pointers to the same virtual memory address. When a process accesses the memory at address 0x12345, the operating system translates this address into a physical address, say 0x23456 for process A and 0x34567 for process B, without the two interfering with each other. So when is this physical address obtained? For process-private memory, which is the majority case, it is obtained when the process requests a memory allocation from the operating system. The operating system allocates free physical memory to the process in units of Pages, generates a virtual linear address for the process, establishes the mapping between that virtual address and the physical memory address, and returns the virtual address to the process as the result.
Page Table (page table) is the data structure used by the operating system to maintain the correspondence between the virtual address and physical memory of the process. The following figure is a schematic diagram of Page Table in a relatively simple case:
The following briefly describes how the operating system translates between the virtual address and the actual physical address of a process when the page size is 4K on a 32-bit system.
1. The page directory is a data structure used to index page tables. Each directory entry occupies 32 bits, i.e. 4 bytes, and stores the location of one page table. The page directory occupies exactly one 4KB page of memory and can hold 1024 directory entries, i.e. the locations of 1024 page tables.
2. A page table entry (Page Table Entry) is 4 bytes in size and stores the start address of one physical memory page. Each page table also occupies 4KB of memory and can hold 1024 physical page start addresses. Because physical page start addresses are aligned to 4KB, only 20 of the 32 bits are needed for the address; the other 12 bits serve other purposes, such as flagging whether the memory page is read-only or writable.
3. There are up to 1024 page tables, each holding 1024 physical page start addresses, for a total of 1M addresses. Each address points to a 4KB physical memory page, covering 4GB in total.
4. When the operating system and hardware map a virtual address to a physical address, bits 31-22 of the virtual address (10 bits) index one of up to 1024 page tables via the directory entry; bits 21-12 (10 bits) index one of the 1024 page table entries within that page table. The entry yields the physical page's start address, and bits 11-0 of the virtual address serve as the offset within the 4KB page. The physical page start address plus the offset is the physical memory address the process needs to access.
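To make step 4 concrete, here is a minimal Python sketch (illustrative only, not kernel code) that splits a 32-bit virtual address into the two indexes and the page offset:

```python
def split_vaddr(vaddr):
    """Split a 32-bit virtual address under 4KB-page two-level paging."""
    dir_idx = (vaddr >> 22) & 0x3FF   # bits 31-22: index into the page directory
    tbl_idx = (vaddr >> 12) & 0x3FF   # bits 21-12: index into the page table
    offset  = vaddr & 0xFFF           # bits 11-0 : offset inside the 4KB page
    return dir_idx, tbl_idx, offset

# The 0x12345 address from the earlier example:
print(split_vaddr(0x12345))  # (0, 18, 837): directory entry 0, table entry 18, offset 0x345
```

Looking up directory entry 0 and then table entry 18 yields a physical page start address; adding the 0x345 offset gives the final physical address.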
Now consider how much space these two data structures, the page directory and the page tables, occupy. The page directory is fixed at 4KB. And the page tables? Since there are at most 1024 page tables and each consumes 4KB, the page tables consume at most 4MB of memory. In practice, a process on 32-bit Linux usually has much smaller page tables: a process rarely uses the entire 4GB address space, 1GB of which is in any case reserved for the kernel, and Linux does not build the full page table for a process up front; the operating system establishes address mappings for a process only as it allocates and accesses memory.
Only the simplest paging scheme is described here. In reality, the page directory and page tables span up to four levels, and with PAE enabled on 32-bit, or on 64-bit systems, the paging structure is more complex than the diagram above. But in every case, the structure of the last level, the page table itself, is the same. On a 64-bit system, the size of each entry in a Page Table grows from 32 bits to 64 bits. How much impact does that have? If a process accesses 1GB of physical memory, i.e. 262,144 memory pages, its page tables need 262144*4/1024/1024 = 1MB on a 32-bit system, and twice that, 2MB, on a 64-bit system. Now consider an Oracle database running on Linux. The SGA of the database in this case is 12GB; if an Oracle process accesses all of the SGA, its page tables alone reach 24MB, an astonishing number. The PGA is ignored here, because at an average of no more than 2MB per process it is negligible next to the SGA. The AWR report showed roughly 300 sessions, so the page tables of those 300 connections could reach 7200MB, though not every process accesses all of the SGA. From meminfo we saw that Page Tables reached 4637MB; such a large page table space is precisely the result of some 300 sessions against a 12GB SGA.
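The arithmetic above can be checked with a few lines of Python (a sketch assuming 4KB pages and 8-byte page table entries on 64-bit, counting last-level page tables only):

```python
PAGE_SIZE = 4 * 1024   # 4KB memory page
PTE_BYTES = 8          # page table entry size on a 64-bit system
GB = 1024 ** 3

def pagetable_mb(mapped_bytes):
    """Last-level page table size, in MB, to map the given amount of memory."""
    pages = mapped_bytes // PAGE_SIZE
    return pages * PTE_BYTES / (1024 * 1024)

print(pagetable_mb(1 * GB))         # 1GB mapped  -> 2.0 MB of page tables
print(pagetable_mb(12 * GB))        # 12GB SGA    -> 24.0 MB per process
print(pagetable_mb(12 * GB) * 300)  # 300 sessions -> 7200.0 MB worst case
```

The worst case assumes every process touches the entire SGA, which is why the observed figure (4637MB) is lower than 7200MB.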
Of course, page tables are not the only memory-management data structure in the system; there are others. All these oversized management structures undoubtedly add greatly to the kernel's burden and to CPU consumption. On top of that, large swings in memory demand caused by load changes or other factors, such as many processes requesting large amounts of memory at once, can spike the CPU in a short time and thereby trigger the problem.
Use large memory pages to solve the problem
Although there was no hard evidence, and gathering conclusive proof would have required reproducing a failure that leaves the system unavailable for more than half an hour, the oversized page tables were by far the biggest suspect. It was therefore decided to first use large memory pages to tune the system's memory usage.
Large memory pages are collectively referred to as Large Page in lower versions of Linux and Huge Page in current mainstream Linux versions. Let's take Huge Page as an example to illustrate the advantages of Huge Page and how to use it.
What are the benefits of using large memory pages:
1. Reduced page table (Page Table) size. Each Huge Page maps 2MB of contiguous physical memory, so mapping 12GB of physical memory needs only 48KB of page table, far less than the original 24MB.
2. Huge Page memory is locked in physical memory and can never be swapped out to the swap area, avoiding the performance impact of swapping.
3. Due to the reduction of the number of page tables, the hit rate of TLB in CPU (which can be understood as CACHE of CPU to page tables) is greatly improved.
4. The page tables for Huge Pages can be shared among processes, shrinking the Page Table size further. This in fact highlights a shortcoming of Linux's paging mechanism: other operating systems, such as AIX, avoid the problem by sharing one page table for memory such as shared memory segments. For example, in one system the author maintains, the connection count usually exceeds 5000 and the instance's SGA is about 60GB; with Linux's per-process paging approach, most of the system's memory would be consumed by page tables.
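The 48KB figure from point 1 can be verified with quick arithmetic (again assuming 8-byte page table entries, a 64-bit assumption on my part):

```python
GB = 1024 ** 3
MB = 1024 ** 2

huge_pages = (12 * GB) // (2 * MB)  # 2MB huge pages needed to map a 12GB SGA
print(huge_pages)                   # 6144 pages
print(huge_pages * 8 / 1024)        # page table size in KB -> 48.0
```

With 4KB pages the same 12GB needs 3,145,728 entries; the 512x reduction in entries is also what drives the TLB hit-rate improvement in point 3.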
So how do you enable large memory pages (HugePages) for Oracle? The implementation steps follow. Since the database in this case had its SGA adjusted to 18GB after a period of time, 18GB is used as the example:
1. Check / proc/meminfo to confirm that the system supports HugePage:
HugePages_Total is the number of large memory pages configured in the system. HugePages_Free is the number of large memory pages not yet touched; "free" here is easily misunderstood, as explained below. HugePages_Rsvd is the number of pages that have been committed but not yet used. Hugepagesize is the large memory page size, 2MB here; note that in some kernel configurations it may be 4MB.
For example, suppose HugePages_Total amounts to 11GB, SGA_MAX_SIZE is 10GB and SGA_TARGET is 8GB. After the database starts, HugePage memory is committed according to SGA_MAX_SIZE, here 10GB, so the truly free HugePage memory is 11-10 = 1GB. But since SGA_TARGET is only 8GB, 2GB of the SGA will not be touched; HugePages_Free is therefore 2+1 = 3GB, with 2GB counted in HugePages_Rsvd. What can actually be used by other instances is only 1GB, i.e. the truly free memory is just 1GB.
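The accounting in this example can be expressed as a small Python sketch (sizes in GB; the function is illustrative, not an Oracle or kernel API):

```python
def hugepages_accounting(total_gb, sga_max_gb, sga_target_gb):
    """Model HugePages_Free / HugePages_Rsvd after an instance starts."""
    rsvd = sga_max_gb - sga_target_gb      # committed to the SGA but never touched
    free = (total_gb - sga_max_gb) + rsvd  # 'Free' counts every untouched page
    truly_free = free - rsvd               # what other instances can actually get
    return free, rsvd, truly_free

print(hugepages_accounting(11, 10, 8))  # (3, 2, 1): Free 3GB, Rsvd 2GB, truly free 1GB
```

The key point is that HugePages_Free includes the reserved pages, so the usable amount is always Free minus Rsvd.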
2. Plan the number of large memory pages to configure. So far, large memory pages can only be used for a few kinds of memory, such as shared memory segments. Once physical memory is set aside as large memory pages, it cannot be used for anything else, such as process-private memory, so do not configure too much. Since we normally use large memory pages only for the Oracle SGA, the number of large pages is:
HugePages_Total = ceil(SGA_MAX_SIZE / Hugepagesize) + N
For example, with SGA_MAX_SIZE set to 18GB, the number of pages is ceil(18*1024/2) + 2 = 9218. N is added because the HugePage space needs to be slightly larger than SGA_MAX_SIZE, usually by 1-2 pages; viewing the shared memory segments with the ipcs -m command shows that a segment is in fact slightly larger than SGA_MAX_SIZE. If there are multiple Oracle instances on the server, allow for each instance's extra shared memory, i.e. use a larger N. Also note that an Oracle database either uses large memory pages entirely or not at all, so an inappropriate HugePages_Total simply wastes memory.
Besides calculating from SGA_MAX_SIZE, a more accurate HugePages_Total can be derived from the shared memory segment sizes reported by ipcs -m:
HugePages_Total = sum(ceil(shared_segment_size / Hugepagesize))
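Both sizing formulas are easy to compute; here is a minimal Python sketch (2MB huge pages assumed, sizes in MB):

```python
import math

HUGEPAGE_MB = 2  # Hugepagesize, 2MB in this case

def pages_from_sga(sga_max_mb, n=2):
    """HugePages_Total = ceil(SGA_MAX_SIZE / Hugepagesize) + N"""
    return math.ceil(sga_max_mb / HUGEPAGE_MB) + n

def pages_from_segments(segment_sizes_mb):
    """HugePages_Total = sum(ceil(segment_size / Hugepagesize)), sizes from ipcs -m"""
    return sum(math.ceil(s / HUGEPAGE_MB) for s in segment_sizes_mb)

print(pages_from_sga(18 * 1024))  # 18GB SGA -> 9216 + 2 = 9218 pages
```

The second function is the more accurate one when multiple instances (i.e. multiple shared memory segments) share the server.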
3. Modify the /etc/sysctl.conf file, adding the following line:
vm.nr_hugepages=9218
Then execute the sysctl -p command to make the setting take effect.
Here the parameter vm.nr_hugepages is the number of large memory pages calculated in step 2. Afterwards check /proc/meminfo; if HugePages_Total is lower than the configured number, there is not enough contiguous physical memory to allocate the large pages, and the server needs to be restarted.
4. Add the following lines to the /etc/security/limits.conf file:
oracle soft memlock 18878464
oracle hard memlock 18878464
This sets the amount of memory, in KB, that the oracle user is allowed to lock.
Then reconnect to the database server as the oracle user and run the ulimit -a command; you should see:
max locked memory (kbytes, -l) 18878464
It is also possible to configure memlock as unlimited here.
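As a cross-check, the memlock figure above appears to equal the full huge-page allocation expressed in KB (9218 pages of 2MB each; this correspondence is my inference from the numbers shown, not stated by the author):

```python
pages = 9218            # vm.nr_hugepages from the earlier step
hugepage_kb = 2 * 1024  # 2MB huge page size, in KB
print(pages * hugepage_kb)  # 18878464, the memlock value set in limits.conf
```

Sizing memlock this way guarantees the oracle user can lock every configured huge page; setting it to unlimited achieves the same effect with less precision.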
5. If the database manages the SGA in MANUAL mode, change it to AUTO, i.e. set SGA_TARGET to a value greater than 0. For 11g, because HugePages can only be used for shared memory and not for the PGA, AMM cannot be used: MEMORY_TARGET cannot be set greater than 0, SGA and PGA must be managed separately, and the SGA can only be managed in AUTO mode.
6. Finally, start the database and check /proc/meminfo to see whether HugePages_Free has decreased; if it has, HugePage memory is being used. However, when the author examined /proc/meminfo on the failed database server, there was no HugePage-related information at all, and sysctl -a showed no vm.nr_hugepages parameter among the system parameters. This was because HugePage support was not compiled into the running Linux kernel; another kernel had to be used to enable HugePages.
View / boot/grub/grub.conf:
It turned out the kernel in use had "xen" in its name. We modified this file, changing default=0 to default=2 (alternatively, commenting out the first two kernel entries with #), then restarted the database server, and found that the new kernel supported HugePages.
After large memory pages were enabled for the database, the performance problem described in this article did not recur, even after the SGA was increased. Looking at the /proc/meminfo data, the memory consumed by PageTables stayed below 120MB, a reduction of more than 4500MB. Observed CPU utilization was also lower than before HugePages were used, and the system ran quite stably; at the very least, no bugs attributable to HugePages appeared.
Testing shows that for OLTP systems, enabling HugePages on Linux hosts running Oracle databases improves both database throughput and response time to varying degrees, by up to more than 10%.
Summary
This article has used a case study to introduce the role large memory pages play in performance on the Linux operating system, and how to set the relevant parameters to enable them. The author recommends that when running an Oracle database on Linux, you enable large memory pages to avoid the performance problem seen in this case, or simply to improve system performance further. It is fair to say HugePages is one of the few features that improves performance at no extra cost. Also encouraging is that newer Linux kernels provide Transparent Huge Pages, allowing applications on Linux to use large memory pages more widely and easily, not just for shared memory. Let's wait and see what changes this feature brings.
-- the end