2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains the basic flow of writing a file in Java and the principle of the page cache automatic write-back mechanism, and then verifies the write-back behavior with a few simple power-loss tests.
The basic process of writing files in Java
When off-heap memory is not used, Java first writes the bytes into the JVM's heap memory. It then calls the JVM's file-write function, which copies the bytes into the JVM's off-heap (native) memory. The JVM in turn calls the kernel's file-write function, which copies the bytes into kernel space. The kernel writes the bytes into the page cache, marks the affected pages as dirty, and writes them to disk at an appropriate time according to the page cache write-back mechanism.
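The write path described above can be sketched in Java roughly as follows. This is a minimal sketch and not the article's own test code: the class name WritePathDemo and the temporary file are assumptions made for illustration. FileOutputStream.write hands the bytes from a heap array down to the kernel, where they sit in the page cache as dirty pages; FileDescriptor.sync() asks the kernel to flush them to the device right away instead of waiting for automatic write-back.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class WritePathDemo {

    /** Writes data to path; when durable is true, also forces it to the storage device. */
    static void write(Path path, byte[] data, boolean durable) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path.toFile())) {
            // heap byte[] -> JVM native buffer -> kernel write call -> page cache (dirty pages)
            out.write(data);
            if (durable) {
                // Ask the kernel to flush this file's dirty pages now,
                // rather than waiting for the automatic write-back thresholds.
                out.getFD().sync();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("write-path-demo", ".txt"); // hypothetical file
        write(path, "hello page cache\n".getBytes(StandardCharsets.UTF_8), true);
        System.out.println(Files.size(path) + " bytes written to " + path);
        Files.delete(path);
    }
}
```

Without the sync() call, a successful write only means the data reached the page cache, which is exactly the window that the tests later in this article exploit.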
Page cache automatic writeback mechanism
The timing of page cache write-back is determined by several kernel parameters, which can be set in the system configuration file /etc/sysctl.conf with a vm. prefix (for example vm.dirty_background_bytes). They are:
dirty_background_bytes
Default value: 0 (not enabled)
Parameter meaning: when dirty pages occupy more memory than dirty_background_bytes, the kernel pdflush thread (replaced by per-device flusher threads in newer kernels) writes dirty pages back in the background, which does not block the application's subsequent I/O operations.
dirty_background_ratio
Default value: 10
Parameter meaning: when the percentage of dirty pages (relative to all available memory, that is, free memory pages + reclaimable memory pages) reaches dirty_background_ratio, the kernel pdflush thread starts writing dirty pages back in the background, which does not block the application's subsequent I/O operations. Increasing this value lets more memory be used for buffering, which can improve the system's read and write performance. Lower it when data needs to be written out continuously and at a steady rate.
Note: dirty_background_bytes and dirty_background_ratio are counterparts, and only one of them can be in effect. When one of the two parameter files is written, the dirty page limit is recalculated immediately and the other parameter is reset to zero.
dirty_bytes
Default value: 0 (not enabled)
Parameter meaning: when the amount of memory occupied by dirty pages reaches dirty_bytes, the kernel flushes the dirty pages to disk and blocks subsequent I/O operations.
Note: dirty_bytes and dirty_ratio are counterparts, and only one of them can be in effect. When one of the two parameter files is written, the dirty page limit is recalculated immediately and the other parameter is reset to zero.
dirty_ratio
Default value: 20
Parameter meaning: when the percentage of dirty pages (relative to all available memory, that is, free memory pages + reclaimable memory pages) reaches dirty_ratio, the kernel flushes the dirty pages to disk and blocks subsequent I/O operations.
Comparison between dirty_background_ratio and dirty_ratio
dirty_ratio triggers forced write-back: when the proportion of dirty pages reaches this ratio, the process that is generating dirty pages is made to write them back itself, and its writes block. dirty_background_ratio is the softer limit: the write-back is carried out by the pdflush kernel thread in the background and does not affect the running process. So with the default values listed above, when dirty pages reach 10% they are first written back in the background by pdflush. If the proportion still climbs to 20%, the situation is serious: a forced write-back is triggered while balancing dirty pages, so that the system returns to its preset, reasonable state.
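The relationship between the two limits can be illustrated with a small model. This is only a sketch under simplifying assumptions: the class DirtyThresholdModel, its method names, and the 8 GiB of available memory are all hypothetical, and the real kernel logic involves more state than a single comparison. It does reflect the rule described above that a non-zero *_bytes setting takes the place of the corresponding *_ratio.

```java
final class DirtyThresholdModel {

    enum Action { NONE, BACKGROUND_WRITEBACK, BLOCKING_WRITEBACK }

    /** Effective limit in bytes: a non-zero *_bytes setting wins, otherwise the ratio applies. */
    static long limitBytes(long bytesSetting, int ratioPercent, long availableBytes) {
        return bytesSetting > 0 ? bytesSetting : availableBytes * ratioPercent / 100;
    }

    /** Simplified decision: compare the amount of dirty memory against both limits. */
    static Action classify(long dirtyBytes, long backgroundLimit, long hardLimit) {
        if (dirtyBytes >= hardLimit) {
            return Action.BLOCKING_WRITEBACK;   // dirty_ratio / dirty_bytes reached
        }
        if (dirtyBytes >= backgroundLimit) {
            return Action.BACKGROUND_WRITEBACK; // dirty_background_* reached
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        long available = 8L * 1024 * 1024 * 1024;        // assumed: 8 GiB free + reclaimable
        long background = limitBytes(0, 10, available);  // default dirty_background_ratio
        long hard = limitBytes(0, 20, available);        // default dirty_ratio
        long dirty = 1024L * 1024 * 1024;                // assumed: 1 GiB of dirty pages
        System.out.println("background limit = " + background + " bytes");
        System.out.println("hard limit       = " + hard + " bytes");
        System.out.println("action           = " + classify(dirty, background, hard));
    }
}
```

With these assumed numbers, 1 GiB of dirty data is above the 10% limit but below the 20% limit, so the model reports background write-back only.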
dirty_expire_centisecs
Default value: 3000
Parameter meaning: how long dirty data may stay in memory, expressed in hundredths of a second (100 = 1 second, so the default of 3000 is 30 seconds). Only dirty data older than this is written to disk by the kernel thread pdflush the next time it wakes up.
dirty_writeback_centisecs
Default value: 500
Parameter meaning: how often pdflush wakes up for periodic write-back, also in hundredths of a second (100 = 1 second). The default of 500 wakes pdflush every 5 seconds. Setting this value to 0 disables periodic write-back entirely.
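To make the units concrete, the following sketch converts the two centisecond settings to seconds and estimates how long dirty data may wait before periodic write-back picks it up. The class WritebackTiming and its methods are hypothetical helpers, and the "expiry plus one wake-up interval" bound is a rough approximation that ignores the size-based thresholds and the speed of the device.

```java
final class WritebackTiming {

    /** Converts a *_centisecs setting (hundredths of a second) to seconds. */
    static double centisecsToSeconds(long centisecs) {
        return centisecs / 100.0;
    }

    /**
     * Rough upper bound for periodic write-back: data can be just short of
     * expiry when one wake-up happens, and then waits one more interval.
     */
    static double worstCaseDelaySeconds(long expireCentisecs, long writebackCentisecs) {
        return centisecsToSeconds(expireCentisecs + writebackCentisecs);
    }

    public static void main(String[] args) {
        long expire = 3000;    // default dirty_expire_centisecs
        long writeback = 500;  // default dirty_writeback_centisecs
        System.out.println("expiry:     " + centisecsToSeconds(expire) + " s");    // 30.0 s
        System.out.println("interval:   " + centisecsToSeconds(writeback) + " s"); // 5.0 s
        System.out.println("worst case: " + worstCaseDelaySeconds(expire, writeback) + " s"); // 35.0 s
    }
}
```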
drop_caches
Writing a numeric value to the /proc/sys/vm/drop_caches file causes the kernel to free the memory occupied by the page cache and by the dentries and inodes caches.
Free only the page cache:
echo 1 > /proc/sys/vm/drop_caches
Free only the dentries and inodes caches:
echo 2 > /proc/sys/vm/drop_caches
Free the page cache, dentries, and inodes caches:
echo 3 > /proc/sys/vm/drop_caches
This operation is non-destructive: dirty objects (such as dirty pages) are not freed. To free as much as possible, run the sync command first so that dirty pages are written back and become clean.
Note: this release has to be triggered manually.
Test
Use Java BIO (blocking I/O streams) to write data to a file continuously, and use pcstat to observe how much of the file is in the page cache. Initial state of the file:
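[root@node01 ~]# ll test.txt && pcstat test.txt
-rw-r--r--. 1 root root 0 Aug 1 19:58 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 0            | 0     | 0      | NaN     |
+----------+--------------+-------+--------+---------+
dirty_background_bytes test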
The background dirty page threshold is set to 10 MB. To keep time-based write-back out of the picture, the write-back wake-up interval is set to 5000 s and the dirty page survival time to 5 min. While the amount of dirty data stays below 10 MB, it is not flushed to disk during the test. Once it exceeds 10 MB, the dirty_background_bytes condition is met and the dirty pages are flushed to disk, even though the time limits have not been reached.
# modify the system configuration
[root@node01 ~]# vi /etc/sysctl.conf
...
# for testing, set to 10485760 (10 MB)
vm.dirty_background_bytes = 10485760
# for testing, set to 104857600 (100 MB)
vm.dirty_bytes = 104857600
# for testing, set to 500000 (5000 s)
vm.dirty_writeback_centisecs = 500000
# for testing, set to 30000 (5 min)
vm.dirty_expire_centisecs = 30000
# load the system configuration
[root@node01 ~]# sysctl -p
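The article does not show the program that writes the test data. A minimal sketch of such a Java BIO writer might look like the following; the class name BioWriter, the line content, and the command-line arguments are assumptions for illustration only. The important property is that it never requests a sync, so everything it writes stays in the page cache until the kernel decides to write it back.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class BioWriter {

    /** Appends fixed-size lines to path until at least targetBytes were written; returns the bytes written. */
    static long writeUntil(String path, long targetBytes) throws IOException {
        byte[] line = "0123456789\n".getBytes(StandardCharsets.US_ASCII); // 11 bytes per line
        long written = 0;
        try (OutputStream out = new FileOutputStream(path, true)) { // true = append
            while (written < targetBytes) {
                out.write(line); // lands in the page cache; no sync is requested
                written += line.length;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.txt";          // assumed default file name
        long target = args.length > 1 ? Long.parseLong(args[1]) : 1024; // assumed default size
        System.out.println(writeUntil(path, target) + " bytes appended to " + path);
    }
}
```

Closing the stream does not force the data to disk either, which is why the file contents can still disappear when the power is cut.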
While the page cache size is still below 10 MB, stop writing, cut the power, and restart the device.
Page cache status and file size before shutdown.
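[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 1.7K Aug 1 20:20 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 1660         | 1     | 1      | 100.000 |
+----------+--------------+-------+--------+---------+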
Cut the power and restart. (Note: the power must actually be cut; a reboot through the normal shutdown process does not work for this test, because it flushes dirty pages to disk.)
Page cache status and file size after restart.
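[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 0 Aug 1 20:20 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 0            | 0     | 0      | NaN     |
+----------+--------------+-------+--------+---------+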
All previously written data is lost: the dirty data never reached the background flush threshold (dirty_background_bytes), so it was still only in the page cache when the power was cut.
When the page cache size exceeds 10 MB, stop writing, cut the power, and restart the device.
Page cache status and file size before shutdown.
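[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 37M Aug 1 20:26 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 37985420     | 9274  | 9274   | 100.000 |
+----------+--------------+-------+--------+---------+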
Cut the power and restart. (Note: the power must actually be cut; a reboot through the normal shutdown process does not work for this test, because it flushes dirty pages to disk.)
Page cache status and file size after restart.
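[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 30M Aug 1 20:26 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 31035392     | 7577  | 0      | 000.000 |
+----------+--------------+-------+--------+---------+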
Part of the previously written data is lost. The kernel performs a flush each time the background flush threshold (dirty_background_bytes) is reached, and whatever has not yet reached the threshold is lost. The file was 37 MB before the power cut and 30 MB after it. With the flush threshold set to 10 MB, 37 mod 10 leaves about 7 MB that was never flushed to disk and was lost after the restart.
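The arithmetic can be checked against the exact byte counts from the pcstat output. In the sketch below, the class LostDataEstimate and its methods are hypothetical, and the "flush at every full multiple of the threshold" rule is an idealization of the explanation above, not the kernel's actual algorithm.

```java
final class LostDataEstimate {

    /** Bytes that disappeared between the two observations. */
    static long observedLoss(long sizeBefore, long sizeAfter) {
        return sizeBefore - sizeAfter;
    }

    /** Idealized model: data is flushed only at every full multiple of the threshold. */
    static long modelLoss(long sizeBefore, long thresholdBytes) {
        return sizeBefore % thresholdBytes;
    }

    public static void main(String[] args) {
        long before = 37_985_420L;    // size before the power cut (from pcstat)
        long after = 31_035_392L;     // size after the restart (from pcstat)
        long threshold = 10_485_760L; // vm.dirty_background_bytes used in the test
        System.out.println("observed loss: " + observedLoss(before, after) + " bytes"); // 6950028
        System.out.println("model loss:    " + modelLoss(before, threshold) + " bytes"); // 6528140
    }
}
```

Both figures are a little under 7 MB. The observed loss is somewhat larger than the idealized one, which is plausible because write-back runs asynchronously and does not stop exactly at a multiple of the threshold.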
dirty_expire_centisecs test
The dirty page thresholds are set to 100 MB, the detection interval to 1 s, and the dirty page survival time to 15 s. The purpose of this setup: while the page cache stays below 100 MB, data that has been dirty for more than 15 s is flushed to disk almost completely, because detection is so frequent. Why almost? Because an automatic flush mechanism that relies on periodic detection always leaves a window between two checks, and data written during that window can still be lost. Each time dirty pages are flushed to disk, they are clean until they are modified again.
# modify the system configuration
[root@node01 ~]# vi /etc/sysctl.conf
...
# for testing, set to 104857600 (100 MB)
vm.dirty_background_bytes = 104857600
# for testing, set to 104857600 (100 MB)
vm.dirty_bytes = 104857600
# for testing, set to 1500 (15 s)
vm.dirty_writeback_centisecs = 1500
# for testing, set to 100 (1 s)
vm.dirty_expire_centisecs = 100
# load the system configuration
[root@node01 ~]# sysctl -p
While the page cache size is still below 100 MB, stop writing and, before 15 s have passed, cut the power and restart the device.
Page cache status and file size before shutdown.
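-rw-r--r--. 1 root root 150 Aug 1 20:44 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 150          | 1     | 1      | 100.000 |
+----------+--------------+-------+--------+---------+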
Cut the power and restart. (Note: the power must actually be cut; a reboot through the normal shutdown process does not work for this test, because it flushes dirty pages to disk.)
Page cache status and file size after restart.
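[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 0 Aug 1 20:44 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 0            | 0     | 0      | NaN     |
+----------+--------------+-------+--------+---------+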
All previously written data is lost: it had not yet exceeded the dirty page survival time, so it was still only in the page cache when the power was cut.
When the data has been in the page cache for more than 15 seconds, cut the power and restart the device.
Page cache status and file size before shutdown.
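[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 1.7K Aug 1 20:48 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 1680         | 1     | 1      | 100.000 |
+----------+--------------+-------+--------+---------+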
Cut the power and restart. (Note: the power must actually be cut; a reboot through the normal shutdown process does not work for this test, because it flushes dirty pages to disk.)
Page cache status and file size after restart.
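[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 1.7K Aug 1 20:48 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 1680         | 1     | 1      | 100.000 |
+----------+--------------+-------+--------+---------+
All data that had been in the page cache for more than 15 seconds was detected and retained.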
dirty_writeback_centisecs test
The dirty page thresholds are set to 100 MB, the detection interval to 15 s, and the dirty page survival time to 1 s. The purpose of this setup: while the page cache stays below 100 MB, data that has already exceeded the 1 s survival time is still not flushed to disk, because detection does not run often enough.
# modify the system configuration
[root@node01 ~]# vi /etc/sysctl.conf
...
# for testing, set to 104857600 (100 MB)
vm.dirty_background_bytes = 104857600
# for testing, set to 104857600 (100 MB)
vm.dirty_bytes = 104857600
# for testing, set to 100 (1 s)
vm.dirty_writeback_centisecs = 100
# for testing, set to 1500 (15 s)
vm.dirty_expire_centisecs = 1500
# load the system configuration
[root@node01 ~]# sysctl -p
While the page cache size is still below 100 MB, stop writing, wait no more than 15 s, then cut the power and restart the device.
Page cache status and file size before shutdown.
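[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 550 Aug 1 21:02 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 550          | 1     | 1      | 100.000 |
+----------+--------------+-------+--------+---------+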
Cut the power and restart. (Note: the power must actually be cut; a reboot through the normal shutdown process does not work for this test, because it flushes dirty pages to disk.)
Page cache status and file size after restart.
[root@node01 ~]# ll -h test.txt && pcstat test.txt
-rw-r--r--. 1 root root 0 Aug 1 21:04 test.txt
+----------+--------------+-------+--------+---------+
| Name     | Size (bytes) | Pages | Cached | Percent |
|----------+--------------+-------+--------+---------|
| test.txt | 0            | 0     | 0      | NaN     |
+----------+--------------+-------+--------+---------+
All previously written data is lost: although it had exceeded the dirty page survival time, the next detection had not yet run, so the data was still only in the page cache when the power was cut.
That covers the basic process of writing files in Java and the principle of the page cache automatic write-back mechanism. The tests above are easy to reproduce, so it is worth trying them out in practice.