2025-01-19 Update From: SLTechnology News&Howtos
This article explains how block mapping works in the ARM64 Linux kernel page tables.
The kernel document Documentation/arm64/memory.rst describes the memory layout of the ARM64 Linux kernel address space and is the most authoritative reference on the subject.
Taking a typical 4K page and 48-bit virtual address as an example, the distribution of virtual addresses in the entire kernel space is as follows:
The range from ffff000000000000 to ffff7fffffffffff is the linear mapping of physical memory. It can cover up to 128TB of physical address space and is very similar to the low memory mapping area on ARM32.
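The 128TB figure follows directly from the two bounds of the range. A minimal check in Python (the variable names are illustrative only):

```python
# Bounds of the linear mapping area quoted above (4K pages, 48-bit VA).
LINEAR_START = 0xffff000000000000
LINEAR_END = 0xffff7fffffffffff  # inclusive upper bound

size_bytes = LINEAR_END - LINEAR_START + 1
size_tib = size_bytes // (1 << 40)

print(f"linear map covers {size_tib} TiB (2^{size_bytes.bit_length() - 1} bytes)")
```

The range spans 2^47 bytes, that is, half of the 48-bit kernel address space.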
Now consider the page tables for this configuration. We can use a PTE entry, indexed by bits [20:12] of the virtual address, to map virtual addresses to physical addresses in 4KB units, or we can use a PMD entry, indexed by bits [29:21], to map them in 2MB units.
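To make the bit ranges concrete, here is a small sketch that splits a virtual address into its per-level table indices, assuming 4K pages, a 48-bit virtual address and 9 index bits per level. The helper names and the sample address are hypothetical and are not taken from the kernel source.

```python
# Lowest bit of each level's index field (4K pages, 48-bit VA, 9 bits per level).
LEVELS = [("PGD", 39), ("PUD", 30), ("PMD", 21), ("PTE", 12)]

def split_va(va):
    """Return the table index used at each level plus the in-page offset."""
    fields = {name: (va >> shift) & 0x1ff for name, shift in LEVELS}
    fields["offset"] = va & 0xfff
    return fields

def block_size(level):
    """Bytes covered by a single entry at the given level."""
    return 1 << dict(LEVELS)[level]

va = 0xffff000040201000  # hypothetical address inside the linear map
print(split_va(va))
print(block_size("PTE"), block_size("PMD"), block_size("PUD"))
```

An entry that terminates the walk at the PMD level therefore covers 2MB, and one that terminates at the PUD level covers 1GB.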
For user-space virtual addresses, a PMD mapping gives us a huge page: on ARM64 with 4K pages, a 2MB huge page that is contiguous both virtually and physically. Its practical benefit is fewer TLB misses: once the whole 2MB is mapped by a single PMD entry, no PTEs are needed for it, and the number of translations drops sharply.
In kernel space, mapping the virtual addresses from ffff000000000000 to ffff7fffffffffff to physical addresses with PMD entries clearly achieves the same effect. However, this does not make them huge pages. When the kernel linearly maps physical memory at boot, it does not thereby claim that memory. It only sets up a mapping, so that memory later obtained through APIs such as kmalloc() and get_free_pages() already has a direct virtual-to-physical translation. So even if the kernel uses PMD mappings, memory can still be allocated in 4KB units:
Even with PMD mappings in kernel space, each blue circle (a 4KB page) inside can still be allocated individually, whether to kmalloc, vmalloc, a user-space malloc, and so on. A PMD mapping in kernel space does not turn the corresponding 2MB into a huge page. Its only purpose is to reduce TLB misses when the kernel accesses physical memory through the linearly mapped virtual addresses (which we assume the kernel does most of the time).
Going further, the kernel can, and where possible should, map directly with PUD entries indexed by bits [38:30]. Each mapping then covers 1GB, so once the whole 1GB is cached in the TLB it occupies only one entry.
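A rough way to see the effect on the TLB is to count how many translations are needed to cover the same 1GB at each granule. This is a simplified sketch: it assumes one TLB entry per translation and ignores contiguous hints and the details of real TLB hardware.

```python
PAGE = 4 << 10        # 4KB PTE mapping
PMD_BLOCK = 2 << 20   # 2MB PMD block mapping
PUD_BLOCK = 1 << 30   # 1GB PUD block mapping

def entries_needed(region, granule):
    """Number of translations needed to cover `region` bytes at one granule."""
    return -(-region // granule)  # ceiling division

for name, granule in [("PTE", PAGE), ("PMD", PMD_BLOCK), ("PUD", PUD_BLOCK)]:
    print(name, entries_needed(PUD_BLOCK, granule))
```

Covering 1GB takes 262144 PTE translations, 512 PMD translations, or a single PUD translation.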
If a user-space mapping is built this way, the user really does get a 1GB huge page. For the kernel's linear mapping, however, even with a 1GB PUD mapping the 1GB can still be carved up internally into 4KB pages or 2MB huge pages. Remember: the kernel's linear mapping is only a mapping relationship, not an allocation relationship. For example, the 1GB region below, covered by a 1GB linear mapping, can still be allocated in 4KB units or handed to users as 2MB huge pages:
To verify this we need a real debugging tool: PTDUMP (Page Table Dump). The related code in the ARM64 kernel is:
arch/arm64/mm/ptdump.c and ptdump_debugfs.c
Enabling both gives us a debugfs interface:
/sys/kernel/debug/kernel_page_tables
which exposes the kernel page tables.
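The dump can be summarized with a short script that adds up how much address space is covered at each level. The sketch below assumes each mapping line has the shape "start-end size LEVEL attributes"; the sample lines are hypothetical and were not captured from a real machine, so check the format against your own kernel's output before relying on it.

```python
import re
from collections import Counter

# Assumed line shape: "0x<start>-0x<end>   <size><unit> <LEVEL>   <attributes...>"
LINE_RE = re.compile(r"^0x([0-9a-f]{16})-0x([0-9a-f]{16})\s+\d+[KMGTPE]\s+(\w+)")

def bytes_per_level(lines):
    """Sum the address range covered by entries of each page table level."""
    totals = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # section markers and blank lines do not match and are skipped
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            totals[m.group(3)] += end - start
    return totals

# Hypothetical sample lines, for illustration only.
sample = [
    "---[ Linear Mapping start ]---",
    "0xffff000000000000-0xffff000000200000           2M PMD       RW NX SHD AF BLK UXN MEM/NORMAL",
    "0xffff000000200000-0xffff000000210000          64K PTE       ro NX SHD AF UXN MEM/NORMAL",
    "0xffff000040000000-0xffff000100000000           3G PUD       RW NX SHD AF BLK UXN MEM/NORMAL",
]
for level, size in bytes_per_level(sample).items():
    print(f"{level}: {size >> 10} KiB")
```

On a real system the same function can be fed the lines of /sys/kernel/debug/kernel_page_tables, which normally requires root and a mounted debugfs.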
I started an ARM64 virtual machine with 4GB of memory under QEMU. The first 1GB of the address range is mostly PMD and PTE mappings, and the remaining 3GB is entirely PUD mappings:
I added rodata=0 to my kernel startup parameters:
$ cat /proc/cmdline
root=/dev/vda2 rw console=ttyAMA0 ip=dhcp rodata=0
The reason is that the kernel does not create PMD and PUD block mappings in several cases. The relevant code is:
rodata_full is set by default and corresponds to the kernel config option CONFIG_RODATA_FULL_DEFAULT_ENABLED, "Apply r/o permissions of VM areas also to their linear aliases", which improves kernel security at some cost in performance.
The rodata=0 I added to the boot parameters makes rodata_full false. Without this option the kernel page table looks completely different, and the whole linear mapping uses PTE mappings:
Finally, it is worth mentioning that PMD mappings are not limited to the linear mapping: with 4K pages, the vmemmap area also uses PMD mappings by default:
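As a back-of-the-envelope illustration of what one PMD block in vmemmap covers, the sketch below assumes the 64-byte struct page size used in this article. The actual size depends on the kernel configuration.

```python
PAGE_SIZE = 4 << 10       # 4KB base page
PMD_BLOCK = 2 << 20       # one 2MB PMD block in the vmemmap area
STRUCT_PAGE_SIZE = 64     # bytes per struct page, the figure assumed in this article

pages_described = PMD_BLOCK // STRUCT_PAGE_SIZE
memory_described = pages_described * PAGE_SIZE

print(pages_described, "struct pages describing", memory_described >> 20, "MiB")
```

Under these assumptions a single 2MB vmemmap block holds the struct pages for 128MB of physical memory.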
Muchun Song of ByteDance sent a patchset that tries to free the memory consumed by the struct page entries of the 4KB small pages inside a huge page once a user has obtained that huge page. The patchset reached v11 before Christmas:
https://lore.kernel.org/linux-mm/20201222142440.28930-1-songmuchun@bytedance.com/
To do this, the patchset needs to split the PMD mapping of vmemmap into PTE mappings:
The idea behind the patchset is that when the kernel manages memory in 4KB pages, every page needs a 64-byte struct page. Once the memory is allocated to a user as a huge page, however, we no longer need a separate struct page for each 4KB. For such a compound page, the memory holding the later struct pages can be released directly because their contents are all the same, which saves a lot of memory.
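The amount of struct page memory behind one huge page can be worked out from the numbers above. This is a sketch that assumes 4KB base pages and 64-byte struct pages. How much of this memory can actually be released depends on the patchset and is not computed here.

```python
PAGE_SIZE = 4 << 10
STRUCT_PAGE_SIZE = 64  # bytes, the figure used in the text

def struct_page_overhead(huge_page_size):
    """Return (struct page count, total bytes, 4KB pages) backing one huge page."""
    count = huge_page_size // PAGE_SIZE
    total = count * STRUCT_PAGE_SIZE
    return count, total, total // PAGE_SIZE

for label, size in [("2MB", 2 << 20), ("1GB", 1 << 30)]:
    print(label, struct_page_overhead(size))
```

A 2MB huge page is described by 512 struct pages occupying 32KB (eight 4KB pages), and a 1GB huge page by 262144 struct pages occupying 16MB.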