2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Many people who are new to Linux memory tuning are unsure what the kernel parameters min_free_kbytes and lowmem_reserve_ratio do. This article explains how each parameter works and how the two interact, then walks through a real page allocation failure to show how to tune them.
1. min_free_kbytes
Let's first take a look at the official explanation:
This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
The explanation is very clear, and there are the following key points:
1. It represents the minimum amount of free memory that the system keeps in reserve.
A default value is calculated from the memory size when the system initializes. The rule is:
min_free_kbytes = sqrt(lowmem_kbytes * 16) = 4 * sqrt(lowmem_kbytes) (Note: lowmem_kbytes can be treated as the system memory size)
The calculated value is also clamped: the minimum is 128 KB and the maximum is 64 MB.
As you can see, min_free_kbytes does not grow linearly with memory. The kernel source comment gives the reason: "because network bandwidth does not increase linearly with machine size". As memory grows, there is no need to reserve a linearly increasing amount; the reserve only has to be large enough for emergencies.
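The default rule can be sketched in a few lines of Python. This is only an illustration of the formula and clamps quoted above, not kernel code; the function name default_min_free_kbytes is made up for this example.

```python
import math


def default_min_free_kbytes(lowmem_kbytes: int) -> int:
    """Model the default: sqrt(lowmem_kbytes * 16), clamped to [128 KB, 64 MB].

    Hypothetical helper for illustration only; the kernel computes this in C
    during initialization.
    """
    value = math.isqrt(lowmem_kbytes * 16)  # integer square root
    return max(128, min(value, 65536))      # 64 MB = 65536 KB


# Show how slowly the reserve grows as memory increases.
for gib in (1, 4, 16, 64):
    kb = default_min_free_kbytes(gib * 1024 * 1024)
    print(f"{gib:>3} GiB lowmem -> min_free_kbytes = {kb}")
```

Each time memory is quadrupled the result only doubles, which is the sub-linear growth described above.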
2. The main purpose of min_free_kbytes is to calculate the three watermarks, watermark[min/low/high], that drive memory reclaim.
1) watermark[high] > watermark[low] > watermark[min]; each zone has its own set.
2) When a zone's free memory falls below watermark[low], the kernel wakes the kswapd thread (one per NUMA node) to reclaim memory in the background until the zone's free memory reaches watermark[high]. If allocations arrive too fast and free memory drops to watermark[min], the kernel performs direct reclaim: it reclaims pages in the context of the allocating process and then uses the freed pages to satisfy the request. This blocks the application, adds response latency, and may trigger the OOM killer. The memory below watermark[min] is the system's own reserve for special uses, so it is not handed out to ordinary user-mode allocations.
3) How the three watermarks are calculated:
watermark[min] = min_free_kbytes converted to pages, call it min_free_pages. (Because each zone has its own set of watermarks, the value actually used is per zone: min_free_pages is split among the zones in proportion to each zone's share of total memory, giving per_zone_min_free_pages.)
watermark[low] = watermark[min] * 5 / 4
watermark[high] = watermark[min] * 3 / 2
So the buffer between adjacent watermarks is high - low = low - min = per_zone_min_free_pages * 1/4. Because min_free_kbytes = 4 * sqrt(lowmem_kbytes), this buffer also grows with the square root of the memory size.
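The relationship between the three watermarks can be checked with a short Python sketch. The helper name zone_watermarks is invented for this example, and the rounding (integer division, rounding down) is an assumption about how the kernel does the arithmetic. The input of 65 pages is the min value from the /proc/zoneinfo sample in this article.

```python
def zone_watermarks(min_pages: int) -> dict:
    """Derive low and high from a zone's min watermark (all values in pages).

    Hypothetical helper; assumes integer arithmetic that rounds down.
    """
    return {
        "min": min_pages,
        "low": min_pages + min_pages // 4,   # min * 5/4
        "high": min_pages + min_pages // 2,  # min * 3/2
    }


wm = zone_watermarks(65)
print(wm)
print("buffer between watermarks:", wm["low"] - wm["min"], wm["high"] - wm["low"])
```

The same arithmetic applied to 9758 pages (39032 kB) gives 12197 and 14637 pages, which matches the Normal zone's low:48788kB and high:58548kB in the failure log analyzed later in this article.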
4) You can view the watermarks of each zone in /proc/zoneinfo.
For example:
Node 0, zone      DMA
  pages free     3960
        min      65
        low      81
        high     97
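To read these values programmatically, something like the following Python sketch could be used. It parses the sample text embedded in the script; the function name parse_watermarks is made up, and the regular expressions assume the layout shown in this article. The real /proc/zoneinfo contains many more fields and its layout varies between kernel versions, so treat this as a starting point only.

```python
import re

# Sample text in the layout shown in this article.
SAMPLE = """\
Node 0, zone      DMA
  pages free     3960
        min      65
        low      81
        high     97
"""


def parse_watermarks(text: str) -> dict:
    """Return {(node, zone): {"free": n, "min": n, "low": n, "high": n}}."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = {}
            continue
        if current is None:
            continue
        # Matches "pages free N", "min N", "low N", "high N" only.
        field = re.match(r"\s*(?:pages\s+)?(free|min|low|high)\s+(\d+)\s*$", line)
        if field and field.group(1) not in zones[current]:
            zones[current][field.group(1)] = int(field.group(2))
    return zones


print(parse_watermarks(SAMPLE))
```

On a live system the same function could be given the contents of /proc/zoneinfo instead of SAMPLE.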
3. Effect of the min_free_kbytes value
The larger min_free_kbytes is, the higher the watermarks are, and the buffers between the three watermarks grow accordingly. kswapd then starts reclaiming earlier and reclaims more memory (it stops only at watermark[high]), so the system keeps more memory free and less is available to applications. In the extreme case where min_free_kbytes is set close to the total memory size, too little is left for applications and OOM may occur frequently.
If min_free_kbytes is too small, the system's reserve is too small. kswapd itself allocates a small amount of memory while reclaiming (with PF_MEMALLOC set), and it is allowed to use the reserve; likewise, a process selected and killed by the OOM killer may use the reserve if it needs to allocate memory while exiting. In both cases, access to the reserve is what keeps the system from deadlocking, so a reserve that is too small puts that guarantee at risk.
2. lowmem_reserve_ratio
Official explanation:
For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone. This is because that memory could then be pinned via the mlock() system call, or by unavailability of swapspace.
And on large highmem machines this lack of reclaimable lowmem memory can be fatal.
So the Linux page allocator has a mechanism which prevents allocations which _could_ use highmem from using too much lowmem. This means that a certain amount of lowmem is defended from the possibility of being captured into pinned user memory.
The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is in defending these lower zones.
If you have a machine which uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap then you probably should change the lowmem_reserve_ratio setting.
1. Purpose
Whereas min_free_kbytes reserves some memory within each zone, lowmem_reserve_ratio sets up a defensive reserve between zones, mainly to prevent allocations aimed at a higher zone from overusing the memory of the lower zones when the higher zone runs out.
For example, a typical single-node machine today has three zones: DMA, DMA32 and NORMAL. DMA and DMA32 are the low zones and are relatively small; on a machine with 96 GB of memory, the two together are only about 1 GB. NORMAL is the high zone (nowadays there is generally no HIGHMEM zone) and is much larger (> 90 GB). Low memory has special uses; for example, some DMA transfers can only use memory from the DMA zone. Allocations should therefore be served from the higher zone whenever possible, and the low zones should be protected from being consumed when the higher zone runs short.
2. Calculation method
cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
The kernel uses the ratio array above to calculate the number of reserved pages for each zone. The result is also an array, shown as "protection" in /proc/zoneinfo:
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        :
        :
    numa_other   0
        protection: (0, 2004, 2004, 2004)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  pagesets
    cpu: 0 pcp: 0
        :
When allocating memory, these reserved page counts are added to the watermark to decide whether the request can be satisfied now, or whether free memory is considered too low and reclaim should start.
For example, suppose a page request for the Normal zone (index = 2) falls back to the DMA zone and the watermark in use is watermark[low]. The kernel sees page_free = 1355 and watermark + protection[2] = 3 + 2004 = 2007 > page_free, so it considers the zone's free memory too low and does not allocate from it. If the request had targeted the DMA zone in the first place, protection[0] = 0 would be used and the request would be satisfied.
The protection[j] value for zone[i] is calculated as follows:
(i < j):
  zone[i]->protection[j]
    = (total sums of present_pages from zone[i+1] to zone[j] on the node) / lowmem_reserve_ratio[i];
(i = j):
  (should not be protected) = 0;
(i > j):
  (not necessary, but looks 0)
The default lowmem_reserve_ratio[i] values are:
256 (if zone[i] is the DMA or DMA32 zone)
32 (others)
As the calculation rule shows, the reserved amount is inversely proportional to the ratio: 256 means 1/256, that is, about 0.39% of the size of the higher zones.
To reserve more pages, set a smaller value; the minimum is 1 (1/1 -> 100%).
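The rule can be tried out with a small Python sketch. The function name protection_table is made up for this example. The zone sizes are the "present" values from the failure log analyzed in the next section of this article, converted to pages under the assumption of a 4 KB page size; the fourth, empty zone is also an assumption, added only so that each row has four entries like the lowmem_reserve[] lines in that log.

```python
def protection_table(present_pages, ratios):
    """Compute protection[i][j] for every pair of zones on one node.

    Hypothetical model of the rule above: for i < j, sum present_pages of
    zone[i+1]..zone[j] and divide by lowmem_reserve_ratio[i]; otherwise 0.
    """
    n = len(present_pages)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        higher = 0
        for j in range(i + 1, n):
            higher += present_pages[j]
            table[i][j] = higher // ratios[i]
    return table


PAGE_KB = 4                                   # assumed page size
present_kb = [15332, 1998016, 97218560, 0]    # DMA, DMA32, Normal, (empty zone)
present = [kb // PAGE_KB for kb in present_kb]
ratios = [256, 256, 32]                       # from /proc/sys/vm/lowmem_reserve_ratio

names = ["DMA", "DMA32", "Normal", "(empty)"]
for name, row in zip(names, protection_table(present, ratios)):
    print(f"{name:8} {row}")
```

Changing an entry in ratios shows the inverse relationship directly: halving a ratio doubles that zone's protection against requests for higher zones.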
3. Example: interaction with min_free_kbytes (watermarks)
The following log was printed when a production server (96 GB) failed to allocate memory:
[38905.295014] java: page allocation failure. order:1, mode:0x20, zone 2
[38905.295020] Pid: 25174, comm: java Not tainted 2.6.32-220.23.1.tb750.el5.x86_64 #1
...
[38905.295348] active_anon:5730961 inactive_anon:216708 isolated_anon:0
[38905.295349]  active_file:2251981 inactive_file:15562505 isolated_file:0
[38905.295350]  unevictable:1256 dirty:790255 writeback:0 unstable:0
[38905.295351]  free:113095 slab_reclaimable:577285 slab_unreclaimable:31941
[38905.295352]  mapped:7816 shmem:4 pagetables:13911 bounce:0
[38905.295355] Node 0 DMA free:15796kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[38905.295365] lowmem_reserve[]: 0 1951 96891 96891
[38905.295369] Node 0 DMA32 free:380032kB min:800kB low:1000kB high:1200kB active_anon:46056kB inactive_anon:10876kB active_file:15968kB inactive_file:129772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1998016kB mlocked:0kB dirty:20416kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:11716kB slab_unreclaimable:160kB kernel_stack:176kB pagetables:112kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:576 all_unreclaimable? no
[38905.295379] lowmem_reserve[]: 0 0 94940 94940
[38905.295383] Node 0 Normal free:56552kB min:39032kB low:48788kB high:58548kB active_anon:22877788kB inactive_anon:855956kB active_file:8991956kB inactive_file:62120248kB unevictable:5024kB isolated(anon):0kB isolated(file):0kB present:97218560kB mlocked:5024kB dirty:3140604kB writeback:0kB mapped:31264kB shmem:16kB slab_reclaimable:2297424kB slab_unreclaimable:127604kB kernel_stack:12528kB pagetables:55532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[38905.295393] lowmem_reserve[]: 0 0 0 0
[38905.295396] Node 0 DMA: 1*4kB 2*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15796kB
[38905.295405] Node 0 DMA32: 130*4kB 65*8kB 75*16kB 72*32kB 95*64kB 22*128kB 10*256kB 7*512kB 4*1024kB 2*2048kB 86*4096kB = 380032kB
[38905.295414] Node 0 Normal: 12544*4kB 68*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54816kB
[38905.295423] 17816926 total pagecache pages
1) From the first log line, "order:1, mode:0x20", you can see that this is a GFP_ATOMIC allocation with order = 1 (2 contiguous pages).
2) The first allocation attempt
In __alloc_pages_nodemask(), the kernel first calls get_page_from_freelist() for the first attempt, using the flags ALLOC_WMARK_LOW | ALLOC_CPUSET. It checks each zone with zone_watermark_ok(), passing watermark[low] as the threshold.
zone_watermark_ok() takes z->lowmem_reserve[] into account so that allocations aimed at Normal do not spill into the low zones. For example, for DMA32:
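free pages = 380032 KB = 95008 pages < low (1000 KB = 250 pages) + lowmem_reserve[normal] (94940) = 95190, so DMA32 is judged not OK; by the same logic, DMA memory cannot be used either.
For the Normal zone itself, free pages = 56552 KB = 14138 pages, and lowmem_reserve is 0, so no reserve applies. However, the order of the request (1) is also taken into account: after subtracting the 12544 free order-0 pages, only 14138 - 12544 = 1594 pages remain, which is less than low / 2 = (48788 KB = 12197 pages) / 2 = 6098 pages.
So the first attempt fails, and the kernel enters __alloc_pages_slowpath() to try a more aggressive allocation.
3) The second allocation attempt
__alloc_pages_slowpath() first calls gfp_to_alloc_flags() to recompute alloc_flags with more aggressive flag bits. For the original GFP_ATOMIC request this sets ALLOC_WMARK_MIN | ALLOC_HARDER | ALLOC_HIGH. Note that it does not set ALLOC_NO_WATERMARKS. That flag skips the zone watermark checks entirely; it marks the highest-priority requests, which may use all of the reserved memory. It is set only if (!in_interrupt() && ((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))), that is, the caller must not be in interrupt context and must be a process that is doing reclaim (for example, kswapd) or is exiting.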
The kernel then calls get_page_from_freelist() again with the new alloc_flags for the second attempt. Despite ALLOC_HARDER and ALLOC_HIGH, the zone_watermark_ok() check still fails for all three zones. For example, for DMA32:
free pages = 380032 KB = 95008 pages. Because ALLOC_HIGH is set, watermark[min] is halved: min = min / 2 = 800 KB / 2 = 400 KB = 100 pages. Because of ALLOC_HARDER, min is reduced by a further 1/4: min = 3 * min / 4 = 100 pages * 3 / 4 = 75 pages. Even so, min (75 pages) + lowmem_reserve[normal] (94940) = 95015, which is still larger than free pages, so the zone is still judged unable to satisfy the allocation. DMA fails in the same way. In Normal, there are too few free contiguous 8 KB blocks to satisfy the allocation.
After the second failure, since ALLOC_NO_WATERMARKS is not set, the kernel does not enter __alloc_pages_high_priority() for a highest-priority attempt. And because a GFP_ATOMIC allocation cannot block, it can neither perform reclaim nor invoke the OOM killer, so the allocation fails.
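Both attempts can be replayed with a simplified Python model of the watermark check. This is a sketch of the logic as described in this walkthrough for the 2.6.32 kernel, not the kernel source: the function signature and the ZONES table are made up for illustration, the flag values are only used as distinct bits, and a 4 KB page size is assumed when converting the log's kB figures to pages.

```python
ALLOC_HARDER = 0x10  # illustrative bit values
ALLOC_HIGH = 0x20


def zone_watermark_ok(free_pages, free_by_order, order, mark, reserve, flags=0):
    """Simplified model: True if the zone may serve an allocation of 2**order pages."""
    min_pages = mark
    free = free_pages - (1 << order) + 1
    if flags & ALLOC_HIGH:
        min_pages -= min_pages // 2   # halve the watermark
    if flags & ALLOC_HARDER:
        min_pages -= min_pages // 4   # cut a further quarter
    if free <= min_pages + reserve:
        return False
    # For higher orders, blocks smaller than the request do not count.
    for o in range(order):
        free -= free_by_order[o] << o
        min_pages >>= 1
        if free <= min_pages:
            return False
    return True


# Per zone: free pages, free blocks per order (first two orders), min, low,
# and lowmem_reserve against a Normal-zone request. Values come from the log.
ZONES = {
    "DMA":    dict(free=3949,  blocks=[1, 2],      min=1,    low=1,     reserve=96891),
    "DMA32":  dict(free=95008, blocks=[130, 65],   min=200,  low=250,   reserve=94940),
    "Normal": dict(free=14138, blocks=[12544, 68], min=9758, low=12197, reserve=0),
}

for name, z in ZONES.items():
    first = zone_watermark_ok(z["free"], z["blocks"], 1, z["low"], z["reserve"])
    second = zone_watermark_ok(z["free"], z["blocks"], 1, z["min"], z["reserve"],
                               ALLOC_HIGH | ALLOC_HARDER)
    print(f"{name:7} first attempt: {first}, second attempt: {second}")
```

In this model an order-0 request against Normal with the relaxed flags would pass, which suggests the failure comes from the shortage of contiguous 8 KB blocks rather than from the total amount of free memory.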
In this case, you can moderately increase min_free_kbytes so that kswapd starts reclaiming earlier and the system always keeps more memory free. Optionally, you can also moderately lower the lowmem reserve so that DMA32/DMA memory can be borrowed in an emergency when memory (mainly the Normal zone) runs short. Because the reserve is inversely proportional to the ratio, lowering the reserve means raising the lowmem_reserve_ratio values. Note that the reserve must not be reduced too far.