The principle of SWAP in LINUX system programming and its relationship with MYSQL ORACLE 07/13 Update SLTechnology News&Howtos

The principle of SWAP in LINUX system programming and its relationship with MYSQL ORACLE

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

As a DBA, there have been some questions about SWAP usage for a long time, such as the following:

1. For the sake of my free, there is still room for buffer/cache. Why did SWAP use it?

2. Which process (s) uses the most SWAP?

3. If you avoid using SWAP in ORACLE or MYSQL?

4. Why use direct paths to bypass operating system buffers (O_DIRECT)?

5. ORACLE 11G uses tmpfs virtual file system, can memory be SWAP out?

6. What is the risk after setting / proc/sys/vm/swappiness to 0?

In this paper, with reference to the database, we hope to give

The answer is relatively clear, but because it involves more in-depth content, especially systematic knowledge, and the author does not have a detailed answer.

I have studied the LINUX kernel and implementation, but from the point of view of API callers and DBA, it is hard to avoid misconceptions.

If there is something wrong, please point out that we will discuss it together. If you are particularly interested, please refer to it yourself.

Similar books.

1. The relationship between LINUX high-speed page buffering and database BUFFER

To clarify these issues, we have to start with the LINUX page cache, which we all know in the database

A very large buffer is used to store recently accessed data, such as ORACLE's BUFFER_CACHE and INNODB

Innodb_buffer_pool_size, this area is critical and directly affects the possibility of physical reading of the database.

So the general database guidance is more than 60% of the system's physical memory (of course, ORACLE and share_pool

In general, ORACLE recommends that memory_target contain more than 60% of the system memory for SGA+PGA, while in LINUX OS's view

Data software is only an anonymous page space in user mode, since it is so important, so is LINUX OS, and he also

With such a part of the memory being kernel-state page cache, LINUX uses a rough allocation method, as long as

Data read (and pre-read-locality principle) it will be cached, it is only limited to two aspects

-- physical memory size

-- the amount of memory occupied by user mode space

(chapter 13 of LINUX system programming)

This is done because it greatly speeds up the call to READ (), WIRTE (), if the user-mode program does not have its own

When caching, the hot data is cached in the LINUX page cache, so the next time it is read will be much faster.

Then as far as the database is concerned, the data block may exist in:

DISK DATA PAGE-- > KERNAL BUFFER/CACHE PAGE--- > USER (DATABASE) BUFFER PAGE

In such an order, here is my own understanding, because I often allocate a memory area myself when I am programming.

(malloc family functions)

If we think about it a little bit at this time, we can see that the data page may exist in two places, one is KERNEL and the other is

USER (DATABASE) BUFFER, the former at the LINUX OS level and the latter at the DATABASE level. In LINUX

We can see it through FREE.

[root@testmy ~] # free

Total used free shared buffers cached

Mem: 4052856 2009000 2043856 0 146404 1327484

-/ + buffers/cache: 535112 3517744

Swap: 8388600 0 8388600

First line: memory usage from the perspective of OS

TOTAL: total memory size

USED: LINUX looks at the space used, user mode memory (such as database memory) + kernel mode high-speed buffer

FREE: remaining memory size 4052856-2009000

BUFFERS and CACHED: together explain the size of the kernel footprint

SHRED: shared memory size, which should exist in user mode

Second line: memory usage from the perspective of user mode

Used: user mode memory (such as database memory)

Free: the size that the application may be able to use, the first line of UESD-(buffers+cached)

In fact, if you want to look at the remaining memory space, both FREE have only partial reference value.

In addition, if you want to talk about the difference between buffers and cache, it has something to do with going deep into the LINUX kernel P607.

In fact, there is no difference after kernel 2.4.10, so describe it.

For more information about kernel state and user state, please see:

Http://blog.itpub.net/7728585/viewspace-2129073/

Part of the point of view and matching pictures

2. Why the database can use O_DIRECT open file mode (linux open () api flag)

-- Database software has its own method of managing memory blocks and does not need kernel high-speed buffers, so memory is wasted.

The risk of space. As above:

DISK DATA PAGE-- > KERNAL BUFFER/CACHE PAGE--- > USER (DATABASE) BUFFER PAGE

We removed > KERNAL BUFFER/CACHE PAGE and went directly from DIS DATA PAGE-- > USER BUFFER PAGE

The database software has its own method of managing memory blocks and does not need kernel-state high-speed buffers, which makes READ ()

The WRITE () operation has the extra overhead of KERNAL BUFFER/CACHE PAGE.

You can refer to this point.

Http://blog.itpub.net/7728585/viewspace-2129558/

To understand some of the viewpoints and drawings of

3. Collection of pages with insufficient memory

We know that LINUX memory allocation uses the partner algorithm. Under the crude memory allocation method of LINUX, generally speaking,

In this case, the physical memory will be completely mapped sooner or later. At this time, a system or algorithm is needed to carry out the page.

Recycling, this is LINUX's PFRA algorithm

For information about which pages can be recycled, take a look at Chapter 17 of the LINUX kernel:

Swappable Anonymous pages in User Mode address spaces

Mapped pages of tmpfs filesystem (e.g.dural pages of IPC sharedmemory)

Syncable Mapped pages in User Mode address spaces Pages included in the

Page cache and containing data of disk files Block device buffer

Pages Pages of some disk caches (e.g., the inode cache)

Discardable Unused pages included in memory caches (e.g., slab allocator caches)

Unused pages of the dentry cache

In fact, here we obviously see an anonymous page in key user mode, which contains the process heap and stack space.

Then the memory allocated by the database itself is obviously heap memory, which can be swap, while the kernel

Pages that are too cached are generally synchronizable and discarded.

Next, the principle of freeing memory:

1. First release kernel cache pages that are not used by any process

2. All pages of the user-mode process can be recycled and exchanged

3. Recycle shared pages

4. Reclaim the pages in the kernel cache according to the LRU algorithm, and replace the clean pages first, because there is no need to

Write to disk.

It is also revealed that a virtual file system using tmpfs in ORACLE 11G will also be SWAP to swap partitions.

4. Swap and exchange tendency

The swap partition is used to expand the actual memory size, but it exists on the physical disk, and its speed can be imagined.

It has also become a very concerned part of DBA, because once you use SWAP, then the performance of the database can be imagined.

We have described earlier that user-mode memory space is interchangeable, including the tmpfs virtual file system.

On the contrary, the kernel cache page will not enter the SWAP space, it is only in the case of insufficient memory.

Synchronize and recycle.

So let's talk about the swapping tendency (deep into the LINUX kernel P688):

Exchange tendency = mapping ratio / 2 + load value + exchange value

Mapping ratio: the proportion of the total physical memory occupied by the actual physical memory in user mode memory

It's obviously:

-/ + buffers/cache: 535112 3517744 here 535112 Universe 4052856

If the proportion is large, it means that the user mode memory takes up more physical memory, and if it is relatively small, it means the kernel state cache page.

It takes up more.

Obviously here / 2, even if it is 50% at most, that is, 50 is certainly impossible.

-- load value: refer to the LINUX kernel for details. Generally, it is impossible to be 50. Knowing this is enough to make inferences.

Yes.

-- Exchange value (SWAPPINESS): it can be set by the user. The default is 60Die Sysctl-a | grep swappiness can see it.

If the swap tendency is greater than or equal to 100, when the memory is tight, the page will be swapped out from the user-mode memory.

Then release, if the exchange value (SWAPPINESS) is set to 0, then the mapping ratio / 2 + load value cannot be greater than or equal to

100, which makes it impossible to swap out user-mode memory. For databases, of course, there are all kinds of buffer_cache, which is

It's good for the database.

How to modify:

-- permanent modification

Modify / etc/sysctl.conf plus:

Vm.swappiness=0

Sysctl-p takes effect

-- temporary modification

Echo 0 > / proc/sys/vm/swappiness

We have to take a look here, too.

[root@testmy ~] # free

Total used free shared buffers cached

Mem: 4052856 2009000 2043856 0 146404 1327484

-/ + buffers/cache: 535112 3517744

The two free values here, as mentioned earlier, the first free is viewed by OS, and the second is viewed by the app app. In fact,

They have only reference value and no absolute meaning, because the actual memory free should be 2043856, but this value

In the general LINUX operating system that installs database applications, it is 0, the second column 3517744, which contains buffer+cached

Kernel-state cache pages, although kernel-state cache pages can be recovered and synchronized, due to the existence of exchange tendency

And not all kernel-state high-speed buffer pages can be recycled and synchronized, and LINUX operations with database applications installed

Generally speaking, this column value is very large in the system, but there are occasional cases where SWAP is used. In fact, using SWAP if not

If there is a swap in and out of SWAP, the database will not be slow.

5. OOM (out of memory)

If the swap area is full and there is no kernel state cache to reclaim and synchronize, then OOM is enabled

To delete the process and enter the LINUX kernel is a metaphor for a surgeon to amputate a patient, although not

The situation, but there is no turning back.

So if we set swappiness=0, then the swap area will not be full, because user-mode memory such as data

Using up memory, you can't use the swap area, so it feels like swapping is useless at this time. This is a potential risk, but

For database software, the frequent use of SWAP swap in and out is no different from the down machine. I have encountered it.

ORACLE (10GR2) databases that use a large number of SWAP swap areas to swap out swapped pages are basically immobile. The waiting event is

Shared pool, libarycache, and finally use giant pages to solve.

I also found such an article on ORACLE METALINK.

The confusion about vm.swappiness comes from the fact that in older Red Hat kernels

A value of 0 for vm.swappiness resulted in the minimal amount of swapping to avoid

An out of memory condition. In newer kernels (as of RHEL kernel 2.6.32-303), a value

Of 0 will completely disable swap, but a value of 1 will provide the minimal amount

Of swapping to avoid an out of memory condition.

My general recommendation is to set vm.swappiness to a value of 1 for a dedicated

MySQL server. There is not much practical difference between a value of 0 and a

Value of 1 on older kernels, but 1 is the safe setting to avoid an OOM condition

On newer kernels (assuming that some swap space is available, of course).

We can see that setting to swappiness=0 does increase the risk of OOM, which recommends minimizing swap to 1

6. Exchange settings of the database server

-- setting O_DIRECT, relying on the memory of the database itself, reducing the redundancy and waste of memory of data in memory, and reducing the possibility of using SWAP

MYSQL applies innodb_flush_method = O_DIRECT

ORACLE is careful to use filesystemio_options=SETALL,DIRECTIO. There may be BUG. Be sure to check it before using it.

Metalink, see if there is bug in the version

In practice, ORACLE 11G uses many full table scans of large tables.

The Direct Path Read approach reduces the impact on the database itself and kernel cache pages

Set vm.swappiness=0 (or 1), according to the exchange tendency so that the user-mode memory is not swapped, if so, the author thinks that if not set

O_DIRECT should ensure that there is enough memory, then the database memory had better not exceed 50% of the physical memory (please give us some advice)

Avoid OOM problems.

MYSQL is applicable

ORACLE query METALINK has no relevant instructions, and can be used in principle

-- set the giant page

ORACLE applies. Note that there is a conflict between 11gMEMORY_TARGET 's use of tmpfs and giant pages. I set it in ORACLE10G.

MYSQL can be used, but has not been used

-- whether other memory-related parameters except buffer are too large

Mysql:sort_buffer tmp_table_size join_buffer_size et al.

Oracle: is it appropriate to separate SGA from PGA,PGA

-- the simplest colleagues, buy more physical memory, physical memory is not expensive.

Finally, let's answer the initial question.

1. For the sake of my free, there is still room for buffer/cache. Why did SWAP use it?

Reference section 4

2. Which process (s) uses the most SWAP?

After top, press F to select p, then enter will show SWAP for query.

/ proc/pid/stats contains swap usage, and you can use the program to traverse sorting

3. If you avoid using SWAP in ORACLE or MYSQL?

Reference section 6

4. Why use direct paths to bypass operating system buffers (O_DIRECT)?

Reference section 2

5. ORACLE 11G uses tmpfs virtual file system, can memory be SWAP out?

Reference section 3

6. What is the risk after setting / proc/sys/vm/swappiness to 0?

Reference section 5

Reference:

15 16 17

thirteen

ORACLE METALINK

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.