
A Complete Analysis of the HugePages Implementation

2025-03-28 Update From: SLTechnology News&Howtos


This article analyzes how HugePages is implemented in the Linux kernel, from the initialization of the HugePages allocator to the page fault path that maps huge pages into a process.

This article is based on version 2.6.23 of the Linux kernel.

HugePages allocator initialization

During kernel initialization, the hugetlb_init function is called to initialize the HugePages allocator, which is implemented as follows:

    static int __init hugetlb_init(void)
    {
        unsigned long i;

        // 1. Initialize the free huge page lists, hugepage_freelists.
        //    The kernel links free huge pages together on these lists.
        //    To keep the analysis simple, we can treat MAX_NUMNODES as 1.
        for (i = 0; i < MAX_NUMNODES; ++i)
            INIT_LIST_HEAD(&hugepage_freelists[i]);

        // 2. max_huge_pages is the number of huge pages the system may use,
        //    set by the hugepages boot parameter. This loop allocates the
        //    huge pages and stores them on hugepage_freelists.
        for (i = 0; i < max_huge_pages; ++i) {
            if (!alloc_fresh_huge_page())
                break;
        }
        max_huge_pages = free_huge_pages = nr_huge_pages = i;

        return 0;
    }

The hugetlb_init function does two main things:

It initializes the free huge page lists, hugepage_freelists, which hold the huge pages available to the system.

It allocates free huge pages for the system and stores them on hugepage_freelists.

Next, let's look at how the alloc_fresh_huge_page function allocates a huge page. Its implementation is as follows:

    static int alloc_fresh_huge_page(void)
    {
        static int prev_nid;
        struct page *page;
        int nid;
        ...
        // 1. Allocate one huge physical page
        page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
                                HUGETLB_PAGE_ORDER);
        if (page) {
            // 2. Set free_huge_page as the callback that releases the huge page
            set_compound_page_dtor(page, free_huge_page);
            ...
            // 3. put_page calls the free_huge_page callback set above,
            //    which places the page on the free list
            put_page(page);
            return 1;
        }
        return 0;
    }

So the alloc_fresh_huge_page function does three main things:

It calls alloc_pages_node to allocate one huge page (2 MB).

It sets free_huge_page as the release callback of the huge page; this function is called whenever the huge page is released.

It calls put_page to release the huge page, which in turn calls free_huge_page.

Now let's see how the free_huge_page function releases a huge page. Its implementation is as follows:

    static void free_huge_page(struct page *page)
    {
        ...
        enqueue_huge_page(page); // put the huge page on the free huge page list
        ...
    }

The free_huge_page function mainly calls enqueue_huge_page to add the huge page to the free huge page list. Its implementation is as follows:

    static void enqueue_huge_page(struct page *page)
    {
        int nid = page_to_nid(page); // we assume this always returns 0

        // add the huge page to the free list hugepage_freelists
        list_add(&page->lru, &hugepage_freelists[nid]);

        // increase the counters
        free_huge_pages++;
        free_huge_pages_node[nid]++;
    }

As the implementation above shows, the enqueue_huge_page function simply adds a huge page to the free list hugepage_freelists and increments the counters.
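The initialization path described above can be modeled in user space. The following is a minimal sketch, not kernel code: the model_* names, the simplified page structure, and the use of malloc in place of alloc_pages_node are all assumptions made for illustration. It shows the same pattern: allocate a page, attach a destructor, and let the final put hand the page to the free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 1   /* the article treats MAX_NUMNODES as 1 */

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
    struct model_page *next;             /* free-list link (the kernel uses page->lru) */
    int refcount;                        /* dropped by model_put_page() */
    void (*dtor)(struct model_page *);   /* runs when refcount reaches zero */
};

static struct model_page *model_freelists[MODEL_MAX_NODES];
static unsigned long model_free_pages;

/* Mirrors enqueue_huge_page(): push onto the free list, bump the counter. */
static void model_enqueue(struct model_page *page)
{
    assert(page != NULL);
    page->next = model_freelists[0];
    model_freelists[0] = page;
    model_free_pages++;
}

/* Mirrors put_page(): drop one reference, run the destructor at zero. */
static void model_put_page(struct model_page *page)
{
    assert(page != NULL && page->refcount > 0);
    if (--page->refcount == 0)
        page->dtor(page);
}

/* Mirrors alloc_fresh_huge_page(): allocate, set destructor, release to the pool. */
static int model_alloc_fresh(void)
{
    struct model_page *page = malloc(sizeof(*page));

    if (!page)
        return 0;
    page->next = NULL;
    page->refcount = 1;
    page->dtor = model_enqueue;
    model_put_page(page);
    return 1;
}

/* Mirrors hugetlb_init(): returns how many pages ended up in the pool. */
static unsigned long model_init(unsigned long max_pages)
{
    unsigned long i;

    for (i = 0; i < MODEL_MAX_NODES; ++i)
        model_freelists[i] = NULL;
    model_free_pages = 0;

    for (i = 0; i < max_pages; ++i) {
        if (!model_alloc_fresh())
            break;
    }
    return i;
}
```

Calling model_init(100) leaves 100 pages on the list, provided every allocation succeeds. Routing the release through a destructor means the same put operation that frees an ordinary page returns a huge page to its pool instead.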

If we set the number of huge pages available to the system to 100, the structure of the free huge page list hugepage_freelists is as shown in the following figure:

The call chain for HugePages allocator initialization is therefore:

    hugetlb_init()
      |
      +--> alloc_fresh_huge_page()
             |
             |--> alloc_pages_node()
             |--> set_compound_page_dtor()
             +--> put_page()
                    |
                    +--> free_huge_page()
                           |
                           +--> enqueue_huge_page()

hugetlbfs file system

Now that the system has a pool of free huge pages, let's look at how huge pages are allocated. As described in the article "Understanding the Principles of HugePages", to request huge pages a process must use the mmap system call to map virtual memory onto a file in the hugetlbfs file system.
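From user space, that request might look like the sketch below. It is an illustration under stated assumptions: the helper name map_hugetlbfs_file is hypothetical, the 2 MB page size follows the article, and the length check is a conservative choice rather than something taken from the kernel source shown here. The file path must be on a mounted hugetlbfs file system for the mapping to be backed by huge pages.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* 2 MB, as in the article */

/*
 * Map `len` bytes of a file that lives on a hugetlbfs mount.
 * Returns the mapping, or NULL if the file cannot be created or mapped
 * (for example when the path is not on hugetlbfs or no huge pages are free).
 */
static void *map_hugetlbfs_file(const char *path, size_t len)
{
    int fd;
    void *addr;

    assert(path != NULL);

    /* Conservative assumption: only accept whole huge pages. */
    if (len == 0 || len % HUGE_PAGE_SIZE != 0)
        return NULL;

    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */

    return addr == MAP_FAILED ? NULL : addr;
}
```

With a hypothetical mount point such as /mnt/huge, a caller would use map_hugetlbfs_file("/mnt/huge/example", HUGE_PAGE_SIZE) and release the mapping with munmap. No physical huge page is assigned until the returned memory is first touched, which is the page fault path covered later in this article.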

We will skip the tedious details of mounting the file system and look directly at what happens when the mmap system call maps virtual memory onto a file in the hugetlbfs file system.

Each file object has an mmap method, which is invoked when the mmap system call maps that file. The mmap method of a hugetlbfs file is defined as follows:

    const struct file_operations hugetlbfs_file_operations = {
        .mmap              = hugetlbfs_file_mmap,
        .fsync             = simple_sync_file,
        .get_unmapped_area = hugetlb_get_unmapped_area,
    };

As you can see from the above code, the mmap method of the hugetlbfs file is set to the hugetlbfs_file_mmap function. So when the mmap function is called to map the hugetlbfs file, the hugetlbfs_file_mmap function will be called to handle it.

The main task of the hugetlbfs_file_mmap function is to set the VM_HUGETLB flag in the vm_flags field of the virtual memory area (VMA) object, as shown in the following code:

    static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
    {
        ...
        vma->vm_flags |= VM_HUGETLB | VM_RESERVED; // set the VM_HUGETLB flag on the virtual memory area
        ...
        return ret;
    }

The VM_HUGETLB flag marks the virtual memory area so that it receives special handling when it is later mapped to physical memory, as described below.
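The dispatch through the file's mmap method and the flag it leaves behind can be sketched in plain C. This is a user-space model with hypothetical model_* names and illustrative flag values; it is not the kernel's definition of these structures.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's actual bit assignments may differ. */
#define MODEL_VM_HUGETLB  0x1UL
#define MODEL_VM_RESERVED 0x2UL

struct model_vma {
    unsigned long vm_flags;
};

/* Per-file-type method table, in the spirit of struct file_operations. */
struct model_file_ops {
    int (*mmap)(struct model_vma *vma);
};

struct model_file {
    const struct model_file_ops *f_op;
};

/* Mirrors hugetlbfs_file_mmap(): tag the area for the huge page fault path. */
static int model_hugetlbfs_mmap(struct model_vma *vma)
{
    vma->vm_flags |= MODEL_VM_HUGETLB | MODEL_VM_RESERVED;
    return 0;
}

static const struct model_file_ops model_hugetlbfs_ops = {
    .mmap = model_hugetlbfs_mmap,
};

/* What the generic mmap path does conceptually: call the file's own method. */
static int model_do_mmap(struct model_file *file, struct model_vma *vma)
{
    assert(file != NULL && file->f_op != NULL && vma != NULL);
    if (!file->f_op->mmap)
        return -1;   /* this file type cannot be mapped */
    return file->f_op->mmap(vma);
}

/* Mirrors is_vm_hugetlb_page(): test the flag set by the mmap method. */
static int model_is_hugetlb(const struct model_vma *vma)
{
    return (vma->vm_flags & MODEL_VM_HUGETLB) != 0;
}
```

The generic code never needs to know which file system it is dealing with. It calls whatever mmap method the file carries, and the only trace left on the area is the flag that the fault handler checks later.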

Virtual memory and physical memory mapping

After mmap maps a hugetlbfs file, it returns a virtual memory address. The first access (read or write) to that address triggers a page fault, because the address has not yet been mapped to physical memory, and the kernel calls the do_page_fault function to handle the fault.

Let's take a look at the whole process, as shown in the following figure:

In the end, the do_page_fault function is called to handle the page fault. Let's take a look at what do_page_fault does:

    asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
                                            unsigned long error_code)
    {
        ...
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        unsigned long address;
        ...
        mm = tsk->mm;                 // 1. get the memory management object of the current process
        address = read_cr2();         // 2. get the virtual address that triggered the page fault
        ...
        vma = find_vma(mm, address);  // 3. find the virtual memory area for that address
        ...
        // 4. call handle_mm_fault to handle the fault
        fault = handle_mm_fault(mm, vma, address, write);
        ...
        return;
    }

The code above is a simplified version of do_page_fault. It does four main things:

It gets the memory management object of the current process.

It calls read_cr2 to get the virtual memory address that triggered the page fault.

It uses that address to find the corresponding virtual memory area object.

It calls the handle_mm_fault function to handle the page fault.

Let's move on to the implementation of the handle_mm_fault function, which is as follows:

    int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long address, int write_access)
    {
        ...
        if (unlikely(is_vm_hugetlb_page(vma))) // does this virtual memory area use HugePages?
            return hugetlb_fault(mm, vma, address, write_access); // if so, hugetlb_fault handles it
        ...
    }

Once simplified, the logic of handle_mm_fault is clear: if the virtual memory area uses HugePages, the hugetlb_fault function is called to handle the fault. Since we are analyzing HugePages, this is the branch we follow.
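The lookup and dispatch steps can be modeled as follows. This is a simplified sketch with hypothetical model_* names: it keeps the areas in a sorted array instead of the kernel's own data structures, and it treats any address outside an area as bad, whereas the real fault handler has further cases such as stack growth.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_VM_HUGETLB 0x1UL   /* illustrative value, not the kernel's */

struct model_vma {
    unsigned long vm_start;   /* first address inside the area */
    unsigned long vm_end;     /* first address past the area */
    unsigned long vm_flags;
};

enum model_fault_path {
    MODEL_FAULT_BAD_ADDRESS,
    MODEL_FAULT_NORMAL,
    MODEL_FAULT_HUGETLB
};

/* Like find_vma(): first area whose end lies above addr (areas sorted by address). */
static const struct model_vma *model_find_vma(const struct model_vma *vmas,
                                              size_t count, unsigned long addr)
{
    assert(vmas != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (addr < vmas[i].vm_end)
            return &vmas[i];
    }
    return NULL;
}

/* Decide which handler a fault at addr would be routed to. */
static enum model_fault_path model_page_fault(const struct model_vma *vmas,
                                              size_t count, unsigned long addr)
{
    const struct model_vma *vma = model_find_vma(vmas, count, addr);

    /* The area found may start above addr, so the start must be checked too. */
    if (vma == NULL || addr < vma->vm_start)
        return MODEL_FAULT_BAD_ADDRESS;

    if (vma->vm_flags & MODEL_VM_HUGETLB)
        return MODEL_FAULT_HUGETLB;   /* the hugetlb_fault() branch */

    return MODEL_FAULT_NORMAL;
}
```

The flag test is the only point where the huge page path diverges from the ordinary one, which is why setting VM_HUGETLB at mmap time is enough to steer every later fault in that area.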

The hugetlb_fault function mainly populates the page table of the process, so let's review the page table structure corresponding to HugePages, as shown in the following figure:

As the figure above shows, with HugePages the page middle directory entry points directly to the physical huge page rather than to a page table. The main job of the hugetlb_fault function is therefore to fill in that page middle directory entry. Its implementation is as follows:

    int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                      unsigned long address, int write_access)
    {
        pte_t *ptep;
        pte_t entry;
        int ret;

        ptep = huge_pte_alloc(mm, address); // 1. find the page middle directory entry for the virtual address
        ...
        entry = *ptep;
        if (pte_none(entry)) { // the page middle directory entry has not been mapped yet
            // 2. call hugetlb_no_page to map it
            ret = hugetlb_no_page(mm, vma, address, ptep, write_access);
            ...
            return ret;
        }
        ...
    }

Once simplified, the hugetlb_fault function does two main things:

It uses the virtual memory address that triggered the page fault to find the corresponding page middle directory entry.

It calls the hugetlb_no_page function to map that page middle directory entry.

Let's take a look at how the hugetlb_no_page function fills in the page middle directory entry:

    static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
                               unsigned long address, pte_t *ptep, int write_access)
    {
        ...
        page = find_lock_page(mapping, idx);
        if (!page) {
            ...
            // 1. allocate a huge page from the free huge page list hugepage_freelists
            page = alloc_huge_page(vma, address);
            ...
        }
        ...
        // 2. build the value of the page middle directory entry
        new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
                                && (vma->vm_flags & VM_SHARED)));

        // 3. write the value built above into the page middle directory entry
        set_huge_pte_at(mm, address, ptep, new_pte);
        ...
        return ret;
    }

Once simplified, the hugetlb_no_page function does three main things:

It calls the alloc_huge_page function to take a huge page from the free huge page list hugepage_freelists.

It builds the value of the page middle directory entry from the physical address of the huge page.

It writes that value into the page middle directory entry.

At this point, the mapping process for HugePages is complete.

One question remains: how does the CPU know whether a page middle directory entry points to a page table or to a huge page?

The answer is a flag bit in the page middle directory entry called PSE. If the bit is set to 1, the entry points to a huge page; otherwise it points to a page table.
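The effect of that flag on address translation can be shown with a few lines of arithmetic. The sketch below assumes the x86-64 four-level layout with 2 MB huge pages, where bit 7 of a directory entry is the page size flag and the low 21 bits of the virtual address are the offset inside the huge page. The model_* names and the 52-bit physical address mask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_PRESENT (1ULL << 0)
#define MODEL_PAGE_PSE     (1ULL << 7)    /* page size flag in a directory entry */
#define MODEL_HPAGE_SHIFT  21             /* 2 MB = 1 << 21 */
#define MODEL_HPAGE_MASK   (~((1ULL << MODEL_HPAGE_SHIFT) - 1))
#define MODEL_PHYS_MASK    ((1ULL << 52) - 1)

/* Does this directory entry map a huge page directly? */
static int model_pmd_is_huge(uint64_t pmd)
{
    return (pmd & MODEL_PAGE_PRESENT) && (pmd & MODEL_PAGE_PSE);
}

/* Index of the page middle directory entry selected by a virtual address. */
static unsigned int model_pmd_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> MODEL_HPAGE_SHIFT) & 0x1FF);
}

/* Physical address for vaddr when the entry maps a 2 MB huge page. */
static uint64_t model_huge_translate(uint64_t pmd, uint64_t vaddr)
{
    assert(model_pmd_is_huge(pmd));

    uint64_t base = pmd & MODEL_PHYS_MASK & MODEL_HPAGE_MASK;   /* strip flag bits */
    return base | (vaddr & ~MODEL_HPAGE_MASK);                  /* add 21-bit offset */
}
```

When the flag is set, the page table level is skipped entirely: the entry supplies the 2 MB aligned base and the virtual address supplies the rest, so one entry covers what would otherwise take 512 page table entries of 4 KB each.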

That concludes this analysis of how HugePages is implemented. If you have similar questions, the walkthrough above may serve as a reference.
