This article examines the tension between PostgreSQL's full page writes optimization and checkpoint behavior, for readers who are not yet clear on how the two interact.
Having previously argued that MySQL's doublewrite buffer (DW) should not be turned off, it would be unfair to single out PostgreSQL's full page writes for sacrifice, so we inevitably arrive at the fact that full page writes carry performance costs of their own.
First of all, full page writes, like MySQL's DW, exist to solve the problem of data pages being lost or torn at the instant the database crashes. Let's look at the relevant explanation.
PostgreSQL writes the entire data page into the xlog (WAL) the first time that page is modified after a checkpoint. If the host loses power or the OS crashes, redo detects the partially written page through its checksum, overwrites the corrupted page with the complete copy saved in the WAL, and then continues redo to recover the whole database.
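To get a feel for how much of the WAL consists of these full page images, one option (assuming PostgreSQL 14 or later, where the pg_stat_wal view is available; on older releases pg_waldump --stats gives a similar breakdown) is a quick look at the server-wide WAL statistics. A minimal sketch:

```sql
-- Sketch: what share of WAL activity is full page images (FPI)?
-- Assumes PostgreSQL 14+ for the pg_stat_wal view.
SHOW full_page_writes;                 -- normally 'on'

SELECT wal_records,                    -- WAL records written since stats reset
       wal_fpi,                        -- full page images written
       wal_bytes,                      -- total WAL bytes
       round(wal_fpi::numeric / nullif(wal_records, 0), 3) AS fpi_per_record
FROM   pg_stat_wal;
```

A burst of full page images right after each checkpoint is exactly the pattern described above.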
After reading that, two thoughts immediately come to mind:
1. The denser the checkpoints, the more full page writes are generated.
2. The size of those first-touch page images affects write performance.
So why not simply turn full page writes off?
Of course not. full_page_writes is not something to switch on and off casually, especially while pg_basebackup (or any tool that wraps it) is running; this is tied to how the backup mechanism works.
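For reference, the parameter can be inspected, and it is reloadable rather than restart-only; the sketch below is illustrative, and on ordinary storage the value should stay on (the server forces full page writes during a base backup in any case):

```sql
-- Check the current setting; keep it 'on' unless the storage stack
-- guarantees atomic 8 kB page writes.
SHOW full_page_writes;

-- It is a reload-level (sighup) parameter, so no restart is needed:
ALTER SYSTEM SET full_page_writes = on;
SELECT pg_reload_conf();
```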
The design itself is quite clever: only the first modification of a page after a checkpoint causes the whole page to be written to the xlog (WAL).
The objection, or the negative view, is that full_page_writes must record whole data pages in the WAL, so it performs extra write work: not only the row changes but the page images as well, which increases I/O and disk consumption and can enlarge primary/standby replication lag.
Many optimization approaches follow from this. One is to make PostgreSQL's full page write mechanism behave much like MySQL's DW, but it requires applying a patch to PostgreSQL and will not be expanded on here.
Another is to reduce the frequency of checkpoints, which also reduces the performance impact of full page writes. It has even been suggested to compile PostgreSQL with a smaller data page size (which I think goes a bit far). None of these is the ultimate answer, of course. Some articles have compared the effect of sequence versus UUID primary keys on full page writes: inserting the same number of rows, the resulting WAL volume can differ by roughly a factor of 20.
The key point is that with a sequence as the primary key, consecutive inserts land on the same btree leaf page, so the whole page is written to the WAL only on the first modification after a checkpoint. UUIDs are a completely different situation: the values are not sequential at all, so nearly every insert may touch a brand-new leaf page, producing a large amount of WAL that hurts I/O performance.
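A rough way to reproduce that comparison in psql (a sketch: the table names t_seq and t_uuid are made up, gen_random_uuid() assumes PostgreSQL 13+, and CHECKPOINT needs superuser rights) is to measure the WAL generated by identical inserts against a sequential key and a UUID key:

```sql
-- Sketch: compare WAL volume for sequential vs. random (UUID) primary keys.
CREATE TABLE t_seq  (id bigserial PRIMARY KEY, payload text);
CREATE TABLE t_uuid (id uuid DEFAULT gen_random_uuid() PRIMARY KEY, payload text);

CHECKPOINT;                                               -- start each run just after a checkpoint
SELECT pg_current_wal_lsn() AS start_lsn \gset
INSERT INTO t_seq (payload) SELECT 'x' FROM generate_series(1, 100000);
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn') AS wal_bytes_seq;

CHECKPOINT;
SELECT pg_current_wal_lsn() AS start_lsn \gset
INSERT INTO t_uuid (payload) SELECT 'x' FROM generate_series(1, 100000);
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn') AS wal_bytes_uuid;
```

On an otherwise idle instance the second figure is typically many times larger, because the random UUID values touch, and therefore full-page-log, far more index leaf pages.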
As noted above, the spacing of checkpoints can be adjusted to reduce the impact of full page writes. So how do we adjust that spacing?
This is what every PG discussion of checkpoints comes down to: on a heavily loaded system, can we raise max_wal_size? (If you wonder where that number comes from, think back to how PG is installed and configured.) How high? 8 to 10 GB is no problem. Increasing the distance between checkpoints reduces WAL volume. Why is that? Remember that the purpose of writing the transaction log first is to ensure the system can still function after a crash: at startup, applying the changes recorded in the WAL to the data files repairs them and restores the system. For safety, PostgreSQL cannot simply record the change made to a block; if a block is modified for the first time after a checkpoint, the entire page must be written to the WAL.
Widening the distance between two checkpoints therefore not only improves speed by reducing checkpoint work, it also reduces the amount of transaction log written. So when tuning full page writes we should consider making checkpoints less frequent; in effect this is the same idea as increasing innodb_log_file_size in MySQL.
The two relevant parameters are also explained in the PostgreSQL 11 Administration Cookbook, page 427.
Two parameters control how often a checkpoint is expected to occur. The first, checkpoint_timeout, triggers the next checkpoint a fixed time after the previous one, which directly affects how much WAL is written in between. In practice, two parameters govern the volume of WAL writes (a sketch of adjusting both follows the list below):
max_wal_size
checkpoint_timeout
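A minimal sketch of widening the checkpoint interval along those lines (the exact values are workload-dependent assumptions, and ALTER SYSTEM requires superuser rights):

```sql
-- Sketch: space checkpoints further apart to reduce full page write volume.
ALTER SYSTEM SET max_wal_size       = '10GB';    -- upper bound on WAL between checkpoints
ALTER SYSTEM SET checkpoint_timeout = '30min';   -- time-based trigger for the next checkpoint
SELECT pg_reload_conf();                         -- both are reloadable, no restart needed

-- Afterwards, check whether checkpoints are timed or forced by WAL volume
-- (in PostgreSQL 17+ these counters moved to pg_stat_checkpointer).
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;
```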
However, when adjusting these parameters you must weigh the crash recovery time: the further apart the checkpoints, the more WAL has to be replayed after a crash, and you also need to make sure there is enough disk space for the WAL.
Besides those two parameters, are there other ways to reduce the cost of full page writes? WAL is also buffered, and commits are synchronous by default: every COMMIT forces the log to be flushed to disk. wal_buffers adjusts the size of that buffer, and setting synchronous_commit to off makes commits asynchronous, reducing disk interaction; the trade-off is that a crash can lose recently committed transactions.
How, then, can wal_buffers and synchronous_commit be used in a more targeted way? Depending on the business, workloads that can tolerate losing up to about a second of committed data can set synchronous_commit = off at the session level.
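A hedged sketch of that session-level approach (the audit_log table is hypothetical, and note that wal_buffers, unlike the other settings discussed here, only takes effect after a server restart):

```sql
-- Sketch: relax durability only for sessions whose data can afford to lose
-- the last moments of committed work after a crash.
SET synchronous_commit = off;        -- session-level; COMMIT returns before the WAL flush
BEGIN;
INSERT INTO audit_log (msg) VALUES ('low-value event');   -- hypothetical table
COMMIT;
RESET synchronous_commit;            -- restore the default for this session

-- wal_buffers is sized at server start and needs a restart to change:
ALTER SYSTEM SET wal_buffers = '16MB';
```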