What function flushes dirty pages during a PostgreSQL checkpoint


2025-07-08 Update. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 05/31 report

This article explains which function flushes dirty pages to disk during a PostgreSQL checkpoint. It covers the checkpoint request flags, the source code of SyncOneBuffer and FlushBuffer, and a gdb trace through both functions.

I. Data structures

Macro definition

The checkpoint request flags are OR-able bit values defined as macros.

    /*
     * OR-able request flag bits for checkpoints.  The "cause" bits are used only
     * for logging purposes.  Note: the flags must be defined so that it's
     * sensible to OR together request flags arising from different requestors.
     */

    /* These directly affect the behavior of CreateCheckPoint and subsidiaries */
    #define CHECKPOINT_IS_SHUTDOWN      0x0001  /* Checkpoint is for shutdown */
    #define CHECKPOINT_END_OF_RECOVERY  0x0002  /* Like shutdown checkpoint, but
                                                 * issued at end of WAL recovery */
    #define CHECKPOINT_IMMEDIATE        0x0004  /* Do it without delays */
    #define CHECKPOINT_FORCE            0x0008  /* Force even if no activity */
    #define CHECKPOINT_FLUSH_ALL        0x0010  /* Flush all pages, including those
                                                 * belonging to unlogged tables */
    /* These are important to RequestCheckpoint */
    #define CHECKPOINT_WAIT             0x0020  /* Wait for completion */
    #define CHECKPOINT_REQUESTED        0x0040  /* Checkpoint request has been made */
    /* These indicate the cause of a checkpoint request */
    #define CHECKPOINT_CAUSE_XLOG       0x0080  /* XLOG consumption */
    #define CHECKPOINT_CAUSE_TIME       0x0100  /* Elapsed time */

II. Source code interpretation

SyncOneBuffer processes a single buffer during syncing. Its main logic is as follows:

1. Get the buffer descriptor.

2. Lock the buffer header.

3. Decide what to do based on the buffer state and the input parameters.

4. Pin the dirty page, take a shared content lock, and call FlushBuffer to write the page out.

5. Release the lock, unpin the buffer, and do the remaining cleanup.

    /*
     * SyncOneBuffer -- process a single buffer during syncing.
     *
     * If skip_recently_used is true, we don't write currently-pinned buffers, nor
     * buffers marked recently used, as these are not replacement candidates.
     *
     * Returns a bitmask containing the following flag bits:
     *  BUF_WRITTEN: we wrote the buffer.
     *  BUF_REUSABLE: buffer is available for replacement, ie, it has
     *      pin count 0 and usage count 0.
     *
     * (BUF_WRITTEN could be set in error if FlushBuffers finds the buffer clean
     * after locking it, but we don't care all that much.)
     *
     * Note: caller must have done ResourceOwnerEnlargeBuffers.
     */
    static int
    SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
    {
        BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
        int         result = 0;
        uint32      buf_state;
        BufferTag   tag;

        ReservePrivateRefCountEntry();

        /*
         * Check whether buffer needs writing.
         *
         * We can make this check without taking the buffer content lock so long
         * as we mark pages dirty in access methods *before* logging changes with
         * XLogInsert(): if someone marks the buffer dirty just after our check we
         * don't worry because our checkpoint.redo points before log record for
         * upcoming changes and so we are not required to write such dirty buffer.
         */
        buf_state = LockBufHdr(bufHdr);

        if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
            BUF_STATE_GET_USAGECOUNT(buf_state) == 0)
        {
            result |= BUF_REUSABLE;
        }
        else if (skip_recently_used)
        {
            /* Caller told us not to write recently-used buffers */
            UnlockBufHdr(bufHdr, buf_state);
            return result;
        }

        if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
        {
            /* It's clean, so nothing to do (buffer is invalid or not dirty) */
            UnlockBufHdr(bufHdr, buf_state);
            return result;
        }

        /*
         * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
         * buffer is clean by the time we've locked it.)
         */
        PinBuffer_Locked(bufHdr);
        LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);

        /*
         * Call FlushBuffer.  If the caller has an smgr reference for the
         * buffer's relation, it is passed as the second parameter; here NULL.
         */
        FlushBuffer(bufHdr, NULL);

        LWLockRelease(BufferDescriptorGetContentLock(bufHdr));

        tag = bufHdr->tag;

        UnpinBuffer(bufHdr, true);

        ScheduleBufferTagForWriteback(wb_context, &tag);

        return result | BUF_WRITTEN;
    }

FlushBuffer

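FlushBuffer physically writes out one shared buffer. The write itself is performed by smgrwrite (storage manager write); before that, FlushBuffer flushes WAL up to the page's LSN so that the log reaches disk ahead of the data page.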

    /*
     * FlushBuffer
     *      Physically write out a shared buffer.
     *
     * NOTE: this actually just passes the buffer contents to the kernel; the
     * real write to disk won't happen until the kernel feels like it.  This
     * is okay from our point of view since we can redo the changes from WAL.
     * However, we will need to force the changes to disk via fsync before
     * we can checkpoint WAL.
     *
     * The caller must hold a pin on the buffer and have share-locked the
     * buffer contents.  (Note: a share-lock does not prevent updates of
     * hint bits in the buffer, so the page could change while the write
     * is in progress, but we assume that that will not invalidate the data
     * written.)
     *
     * If the caller has an smgr reference for the buffer's relation, pass it
     * as the second parameter.  If not, pass NULL.
     */
    static void
    FlushBuffer(BufferDesc *buf, SMgrRelation reln)
    {
        XLogRecPtr  recptr;
        ErrorContextCallback errcallback;
        instr_time  io_start,
                    io_time;
        Block       bufBlock;
        char       *bufToWrite;
        uint32      buf_state;

        /*
         * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns
         * false, then someone else flushed the buffer before we could, so we need
         * not do anything.
         */
        if (!StartBufferIO(buf, false))
            return;

        /* Setup error traceback support for ereport() */
        errcallback.callback = shared_buffer_write_error_callback;
        errcallback.arg = (void *) buf;
        errcallback.previous = error_context_stack;
        error_context_stack = &errcallback;

        /* Find smgr relation for buffer */
        if (reln == NULL)
            reln = smgropen(buf->tag.rnode, InvalidBackendId);

        TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
                                            buf->tag.blockNum,
                                            reln->smgr_rnode.node.spcNode,
                                            reln->smgr_rnode.node.dbNode,
                                            reln->smgr_rnode.node.relNode);

        buf_state = LockBufHdr(buf);

        /*
         * Run PageGetLSN while holding header lock, since we don't have the
         * buffer locked exclusively in all cases.
         */
        recptr = BufferGetLSN(buf);

        /* To check if block content changes while flushing. - vadim 01/17/97 */
        buf_state &= ~BM_JUST_DIRTIED;
        UnlockBufHdr(buf, buf_state);

        /*
         * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
         * rule that log updates must hit disk before any of the data-file changes
         * they describe do.
         *
         * However, this rule does not apply to unlogged relations, which will be
         * lost after a crash anyway.  Most unlogged relation pages do not bear
         * LSNs since we never emit WAL records for them, and therefore flushing
         * up through the buffer LSN would be useless, but harmless.  However,
         * GiST indexes use LSNs internally to track page-splits, and therefore
         * unlogged GiST pages bear "fake" LSNs generated by
         * GetFakeLSNForUnloggedRel.  It is unlikely but possible that the fake
         * LSN counter could advance past the WAL insertion point; and if it did
         * happen, attempting to flush WAL through that location would fail, with
         * disastrous system-wide consequences.  To make sure that can't happen,
         * skip the flush if the buffer isn't permanent.
         */
        if (buf_state & BM_PERMANENT)
            XLogFlush(recptr);

        /*
         * Now it's safe to write buffer to disk.  Note that no one else should
         * have been able to write it while we were busy with log flushing because
         * we have the io_in_progress lock.
         */
        bufBlock = BufHdrGetBlock(buf);

        /*
         * Update page checksum if desired.  Since we have only shared lock on the
         * buffer, other processes might be updating hint bits in it, so we must
         * copy the page to private storage if we do checksumming.
         */
        bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);

        if (track_io_timing)
            INSTR_TIME_SET_CURRENT(io_start);

        /*
         * bufToWrite is either the shared buffer or a copy, as appropriate.
         */
        smgrwrite(reln,
                  buf->tag.forkNum,
                  buf->tag.blockNum,
                  bufToWrite,
                  false);

        if (track_io_timing)
        {
            INSTR_TIME_SET_CURRENT(io_time);
            INSTR_TIME_SUBTRACT(io_time, io_start);
            pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
            INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);
        }

        pgBufferUsage.shared_blks_written++;

        /*
         * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
         * end the io_in_progress state.
         */
        TerminateBufferIO(buf, true, 0);

        TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
                                           buf->tag.blockNum,
                                           reln->smgr_rnode.node.spcNode,
                                           reln->smgr_rnode.node.dbNode,
                                           reln->smgr_rnode.node.relNode);

        /* Pop the error context stack */
        error_context_stack = errcallback.previous;
    }

III. Tracing and analysis

Test script

    testdb=# update t_wal_ckpt set c2 = 'C4pm' || substr(c2, 4);
    UPDATE 1
    testdb=# checkpoint;

Tracking and analysis

    (gdb) handle SIGINT print nostop pass
    SIGINT is used by the debugger.
    Are you sure you want to change it? (y or n) y
    Signal        Stop  Print  Pass to program  Description
    SIGINT        No    Yes    Yes              Interrupt
    (gdb) b SyncOneBuffer
    Breakpoint 1 at 0x8a7167: file bufmgr.c, line 2357.
    (gdb) c
    Continuing.

    Program received signal SIGINT, Interrupt.

    Breakpoint 1, SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7fff27f5ae00) at bufmgr.c:2357
    2357        BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
    (gdb) n
    2358        int         result = 0;
    (gdb) p *bufHdr
    $1 = {tag = {rnode = {spcNode = 1663, dbNode = 16384, relNode = 221290}, forkNum = MAIN_FORKNUM, blockNum = 0}, buf_id = 0, state = {value = 3548905472}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 53, state = {value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
    (gdb) n
    2362        ReservePrivateRefCountEntry();
    (gdb)
    2373        buf_state = LockBufHdr(bufHdr);
    (gdb)
    2375        if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
    (gdb)
    2376            BUF_STATE_GET_USAGECOUNT(buf_state) == 0)
    (gdb)
    2375        if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
    (gdb)
    2380        else if (skip_recently_used)
    (gdb)
    2387        if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
    (gdb)
    2398        PinBuffer_Locked(bufHdr);
    (gdb) p buf_state
    $2 = 3553099776
    (gdb) n
    2399        LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
    (gdb)
    2401        FlushBuffer(bufHdr, NULL);
    (gdb) step
    FlushBuffer (buf=0x7fedc4a68300, reln=0x0) at bufmgr.c:2687
    2687        if (!StartBufferIO(buf, false))
    (gdb) n
    2691        errcallback.callback = shared_buffer_write_error_callback;
    (gdb)
    2692        errcallback.arg = (void *) buf;
    (gdb)
    2693        errcallback.previous = error_context_stack;
    (gdb)
    2694        error_context_stack = &errcallback;
    (gdb)
    2697        if (reln == NULL)
    (gdb)
    2698            reln = smgropen(buf->tag.rnode, InvalidBackendId);
    (gdb)
    2700        TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
    (gdb)
    2706        buf_state = LockBufHdr(buf);
    (gdb)
    2712        recptr = BufferGetLSN(buf);
    (gdb)
    2715        buf_state &= ~BM_JUST_DIRTIED;
    (gdb) p recptr
    $3 = 16953421760
    (gdb) n
    2716        UnlockBufHdr(buf, buf_state);
    (gdb)
    2735        if (buf_state & BM_PERMANENT)
    (gdb)
    2736            XLogFlush(recptr);
    (gdb)
    2743        bufBlock = BufHdrGetBlock(buf);
    (gdb)
    2750        bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
    (gdb) p bufBlock
    $4 = (Block) 0x7fedc4e68300
    (gdb) n
    2752        if (track_io_timing)
    (gdb)
    2758        smgrwrite(reln,
    (gdb)
    2764        if (track_io_timing)
    (gdb)
    2772        pgBufferUsage.shared_blks_written++;
    (gdb)
    2778        TerminateBufferIO(buf, true, 0);
    (gdb)
    2780        TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
    (gdb)
    2787        error_context_stack = errcallback.previous;
    (gdb)
    2788    }
    (gdb)
    SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7fff27f5ae00) at bufmgr.c:2403
    2403        LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
    (gdb)
    2405        tag = bufHdr->tag;
    (gdb)
    2407        UnpinBuffer(bufHdr, true);
    (gdb)
    2409        ScheduleBufferTagForWriteback(wb_context, &tag);
    (gdb)
    2411        return result | BUF_WRITTEN;
    (gdb)
    2412    }
    (gdb)

That concludes this look at how dirty pages are flushed during a PostgreSQL checkpoint: SyncOneBuffer checks, pins, and locks each dirty buffer, and FlushBuffer writes it out through smgrwrite after flushing WAL up to the page's LSN. The details should be verified in practice against your own PostgreSQL version.
