This article is about how to optimize the Linux file read and write mechanism. The editor finds it quite practical, so it is shared here for your reference. Let's take a look together.
Linux is a highly controllable, secure, and efficient operating system. This article only discusses the file read and write mechanism under Linux and does not compare different reading methods such as read, fread, and cin; these methods all ultimately invoke the read system call, just with different wrappers. All of the tests below use the open, read, and write system calls.
Cache
A cache is a component used to reduce the average time a high-speed device needs to access a low-speed device. File reads and writes involve memory and disk, and memory operations are much faster than disk operations. If every read and write call operated on the disk directly, speed would be limited and the disk's service life would be shortened. Therefore, whether for read operations or write operations on the disk, the operating system caches the data.
Page Cache
The page cache is a buffer between memory and files; it is actually an area of memory. All file IO (including network file IO) interacts directly with the page cache. The operating system maps a file to pages through a series of data structures, such as inode, address_space, and struct page. We will not discuss these data structures and their relationships here; we only need to know that the page cache exists and plays an important role in file IO. To a large extent, optimizing file reads and writes means optimizing the use of the page cache.
Dirty Page
A page in the page cache corresponds to an area of a file. If the content of a page is inconsistent with the corresponding area of the file, the page is called a dirty page. Modifying or creating page cache content produces dirty pages until they are flushed to disk.
View page cache size
There are two ways to view the page cache size on Linux; one is the free command.
$ free
             total       used       free     shared    buffers     cached
Mem:      20470840    1973416   18497424        164     270208    1202864
-/+ buffers/cache:     500344   19970496
Swap:            0          0          0
The cached column is the page cache size, in kB.
The other is to look at /proc/meminfo directly, where we focus on only two fields:
Cached:        1202872 kB
Dirty:              52 kB
Cached is the page cache size, and Dirty is the dirty page size.
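If you need these numbers inside a program rather than at the shell, they can be read straight from /proc/meminfo. Below is a minimal C sketch (not from the original article's tests) that prints the two fields we care about:

#include <stdio.h>
#include <string.h>

/* Print the Cached and Dirty fields from /proc/meminfo (values are in kB). */
int main(void)
{
    FILE* fp = fopen("/proc/meminfo", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "Cached:", 7) == 0 || strncmp(line, "Dirty:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}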
Dirty Page Writeback Parameters
Linux has parameters that change the operating system's writeback behavior for dirty pages
$ sysctl -a 2>/dev/null | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.dirty_background_ratio is the percentage of memory that may be filled with dirty pages before the system's background process starts flushing them to disk (vm.dirty_background_bytes is the same limit expressed in bytes).
vm.dirty_ratio is the hard limit on dirty data, as a percentage of memory. Once dirty data exceeds this value, new IO requests block until dirty data has been written back to disk (vm.dirty_bytes is the byte equivalent).
vm.dirty_writeback_centisecs specifies how often dirty data is written back, in hundredths of a second; the default 500 means every 5 seconds.
vm.dirty_expire_centisecs specifies how long data may stay dirty, in hundredths of a second. The default 3000 means 30 seconds: dirty data that has been in memory longer than that is written back to disk during the operating system's next writeback pass.
These parameters can be modified with sudo sysctl -w vm.dirty_background_ratio=5, which requires root privileges, or by executing echo 5 > /proc/sys/vm/dirty_background_ratio as root.
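The /proc/sys interface behind sysctl can also be used directly from a program. Below is a minimal sketch that reads the current vm.dirty_background_ratio; writing a new value works the same way with write mode, but requires root, just like the echo command above:

#include <stdio.h>

/* Read the current vm.dirty_background_ratio through the /proc/sys
   interface. Writing a new value works the same way with fopen(..., "w"),
   but requires root, just like the echo command above. */
int main(void)
{
    FILE* fp = fopen("/proc/sys/vm/dirty_background_ratio", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    int ratio;
    if (fscanf(fp, "%d", &ratio) == 1)
        printf("vm.dirty_background_ratio = %d\n", ratio);
    fclose(fp);
    return 0;
}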
File reading and writing process
Now that we have the concepts of the page cache and dirty pages, let's look at the flow of reading and writing files.
read file
1. The user initiates a read operation
2. The operating system looks up the page cache
a. On a miss, a page fault occurs; the kernel creates a page cache entry and reads the corresponding page from disk to fill it
b. On a hit, the requested content is returned directly from the page cache
3. The user's read call completes
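To make the flow concrete, here is a minimal sketch of the read path using the same open/read API the tests use; data.bin is a placeholder file name. Whether each read is served from disk or straight from the page cache is invisible to the caller:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal read path: each read() is served from the page cache;
   the kernel only touches the disk on a cache miss. */
int main(void)
{
    int fd = open("data.bin", O_RDONLY);   /* placeholder file name */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* process n bytes in buf; reading the same file again would
           normally hit the page cache and avoid disk IO entirely */
    }
    close(fd);
    return 0;
}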
write file
1. The user initiates a write operation
2. The operating system looks up the page cache
a. On a miss, a page fault occurs; the kernel creates a page cache entry and writes the data passed in by the user into it
b. On a hit, the data passed in by the user is written directly into the page cache
3. The user's write call completes
4. The modified pages become dirty pages, and the operating system has two mechanisms for writing them back to disk:
a. The user manually calls fsync()
b. The kernel's flusher threads (the pdflush process in older kernels) periodically write dirty pages back to disk
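A minimal sketch of the write path, showing the manual mechanism from step a: write only fills the page cache and marks pages dirty, while fsync forces this file's dirty pages to disk before returning (data.bin is again a placeholder name):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal write path: write() only fills the page cache and marks the
   pages dirty; fsync() forces this file's dirty pages to disk. Without
   it, writeback happens later in the background. */
int main(void)
{
    int fd = open("data.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    const char msg[] = "hello page cache\n";
    if (write(fd, msg, sizeof(msg) - 1) < 0)
        perror("write");
    if (fsync(fd) < 0)
        perror("fsync");
    close(fd);
    return 0;
}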
There is a correspondence between the page cache and the disk file, maintained by the operating system. Reads and writes of the page cache happen in kernel mode and are transparent to the user.
Optimization of File Reading and Writing
Different optimization schemes suit different usage scenarios, depending on file size, read/write frequency, and so on. Here we do not consider schemes that modify system parameters: such changes always trade one thing for another, and choosing the balance point depends heavily on the service, for example whether strong data consistency is required and whether data loss can be tolerated. The optimization ideas come down to the following two points:
1. Maximize page cache utilization
2. Reduce system api calls
The first point is easy to understand: try to make every IO operation hit the page cache, which is much faster than operating on the disk. The system APIs mentioned in the second point are mainly read and write. Since a system call switches from user mode to kernel mode, and some calls also copy memory data, reducing system calls also improves performance in some scenarios.
readahead
readahead is a non-blocking system call that triggers the operating system to prefetch the contents of a file into the page cache and returns immediately.
ssize_t readahead(int fd, off64_t offset, size_t count);
Under normal circumstances, calling read immediately after readahead does not improve reading speed; readahead is usually issued in batches, or some time before the reads happen. Consider the following scenario: we need to read 1000 files of 1M each, one after another. There are two schemes, with pseudocode as follows.
Call the read function directly
char* buf = (char*)malloc(10*1024*1024);
for (int i = 0; i < 1000; ++i)
{
    int fd = open_file();
    int size = stat_file_size();
    read(fd, buf, size);
    // do something with buf
    close(fd);
}
Call readahead first and then read
int* fds = (int*)malloc(sizeof(int)*1000);
int* fd_size = (int*)malloc(sizeof(int)*1000);
for (int i = 0; i < 1000; ++i)
{
    int fd = open_file();
    int size = stat_file_size();
    readahead(fd, 0, size);
    fds[i] = fd;
    fd_size[i] = size;
}
char* buf = (char*)malloc(10*1024*1024);
for (int i = 0; i < 1000; ++i)
{
    read(fds[i], buf, fd_size[i]);
    // do something with buf
    close(fds[i]);
}
Interested readers can write code to test this themselves. Note that before each test you must write back dirty pages and drop the page cache by executing the following command:
sync && sudo sysctl -w vm.drop_caches=3
You can check the Cached and Dirty entries in /proc/meminfo to confirm that it worked.
Testing shows that the second method is about 10%-20% faster than the first. In this scenario, read runs immediately after the batch of readahead calls, so the room for improvement is limited; in a scenario where readahead can be issued some time before the read, it greatly improves the speed of the read itself.
This scheme actually relies on the operating system's page cache: it triggers the operating system to read files into the page cache ahead of time, and the operating system already has a complete mechanism for page faults, cache hits, and cache eviction. Users can of course manage a cache for their own data, but that differs little from using the page cache directly and adds maintenance cost.
mmap
mmap is a method of mapping files into memory: it maps a file (or other object) into the process's address space, establishing a one-to-one correspondence between the file's disk address and a range of virtual addresses in the process's virtual address space. The function prototype is as follows:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Once this mapping is established, the process can read and write this region of memory through ordinary pointers, and the system automatically writes the dirty pages back to the corresponding file on disk. In other words, file operations are completed without calling read, write, or other system calls.
In addition to reducing read, write, and other system calls, mmap can also reduce memory copies. For example, a complete read call involves the operating system first reading the disk file into the page cache and then copying the data from the page cache into the buffer passed to read. With mmap, the operating system only needs to read the disk into the page cache; the user then operates directly, via pointers, on the memory mapped by mmap, eliminating the copy from kernel space to user space.
mmap is suitable for frequent reads and writes of the same region. Suppose a 64M file stores index information that needs to be modified frequently and persisted to disk. We can map the file into the process's virtual memory with mmap and then modify that memory region through pointers; the operating system automatically flushes the modified parts back to disk, and msync can also be called to flush manually. A minimal sketch of this usage follows.
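In the sketch below, index.bin is a placeholder for such an index file, assumed to already exist and be at least a few bytes long:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map index.bin (a placeholder for the index file above), modify it through
   a pointer, and flush explicitly with msync. The kernel would also write
   the dirty pages back on its own eventually. */
int main(void)
{
    int fd = open("index.bin", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }
    char* p = (char*)mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == (char*)MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(p, "MAGIC", 5);   /* modify the mapped region directly */
    if (msync(p, st.st_size, MS_SYNC) < 0)
        perror("msync");
    munmap(p, st.st_size);
    close(fd);
    return 0;
}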
Thank you for reading! This article on how to optimize the Linux file read and write mechanism ends here. I hope the content above has been helpful; if you think the article is good, share it so more people can see it!