In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
This section provides a brief introduction to the storage-related function RecordAndGetPageWithFreeSpace- > fsm_set_and_search during PostgreSQL insertion, which sets the appropriate value for a given FSM page and returns the appropriate slot.
I. data structure
FSMAddress
The internal FSM process works as a logical address scheme, and each level of the tree can be thought of as an independent address file.
/ * The internal FSM routines work on a logical addressing scheme. Each * level of the tree can be thought of as a separately addressable file. * the internal FSM process works on a logical address scheme. * each level of the tree can be thought of as a separate address file. * / typedef struct {/ / hierarchical int level; / * level * / / the page number within this hierarchy is int logpageno; / * page number within the level * /} FSMAddress;/* Address of the root page. * / / Root page address static const FSMAddress FSM_ROOT_ADDRESS = {FSM_ROOT_LEVEL, 0}
FSMPage
FSM page data structure. For details, please refer to src/backend/storage/freespace/README.
/ * Structure of a FSM page. See src/backend/storage/freespace/README for * details. * FSM page data structure. For details, please refer to src/backend/storage/freespace/README. * / typedef struct {/ * fsm_search_avail () tries to spread the load of multiple backends by * returning different pages to different backends in a round-robin * fashion. Fp_next_slot points to the next slot to be returned (assuming * there's enough space on it for the request). It's defined as an int, * because it's updated without an exclusive lock. Uint16 would be more * appropriate, but int is more likely to be atomically * fetchable/storable. The fsm_search_avail () function attempts to spread the load over background processes by returning different pages to different background processes in a loop. * this field is defined as an integer because it does not require an exclusive lock. * unit16 may be more appropriate, but integers seem to be more suitable for atomic extraction and storage. * / int fp_next_slot; / * fp_nodes contains the binary tree, stored in array. The first * NonLeafNodesPerPage elements are upper nodes, and the following * LeafNodesPerPage elements are leaf nodes. Unused nodes are zero. * fp_nodes stores the binary tree as an array. * the first NonLeafNodesPerPage element is the node of the previous layer, and the next LeafNodesPerPage element is the leaf node. * the unused node is 0. * / uint8 fp_ Nodes [flex _ ARRAY_MEMBER];} FSMPageData;typedef FSMPageData * FSMPage
FSMLocalMap
For small tables, there is no need to create a FSM to store spatial information, using local memory mapping information.
/ * Either already tried, or beyond the end of the relation * / / has been tried or has been at the end of the table # define FSM_LOCAL_NOT_AVAIL 0x00 * Available to try * / / can be used to try # define FSM_LOCAL_AVAIL 0x01max * * For small relations, we don't create FSM to save space, instead we use * local in-memory map of pages to try. To locate free space, we simply try * pages directly without knowing ahead of time how much free space they have. * for small tables, there is no need to create a FSM to store space information, using local memory mapping information. In order to locate free space, we don't need to know how much free space they have, but simply try page. * Note that this map is used to the find the block with required free space * for any given relation. We clear this map when we have found a block with * enough free space, when we extend the relation, or on transaction abort. * See src/backend/storage/freespace/README for further details. * Note that this map is used to search the request free space for a given table. * when a block/ with enough free space is found to extend relation/ in a transaction rollback, the map is cleared. * for more information, please see src/backend/storage/freespace/README. * / typedef struct {number of BlockNumber nblocks;// blocks uint8 map [heap _ FSM_CREATION_THRESHOLD]; / Array} FSMLocalMap;static FSMLocalMap fsm_local_map = {0, {FSM_LOCAL_NOT_AVAIL}}; # define FSM_LOCAL_MAP_EXISTS (fsm_local_map.nblocks > 0) II. Source code interpretation
Fsm_set_and_search sets the corresponding value of the given FSM page and returns the appropriate slot.
The main processing logic is as follows:
1. Initialize related variables
two。 Set the FSM available tags for the corresponding page
3. If the search_cat/minvalue (request space) is not 0, call fsm_search_avail to search for slot
4. Unlock, return
The search tree algorithm is to gradually expand the "search triangle" (temporarily called search triangle), that is, all nodes covered by the current node ensure that we search to the right from the starting point. In the first step, only the target slot is checked. When we move up from the left child node to the parent node, we also add the right child tree of the parent node to the search triangle. When we move up from the right child node to the parent node, we also delete the current search triangle (we already know that there is no suitable page at this time) and retrieve the next larger triangle to the right. Therefore, instead of searching from the starting point to the left, we double the size of the search triangle at each step, making sure that only log2 (N) steps are needed to search N pages.
For example, consider the following tree:
7 7 6 5 7 6 54 5 5 7 2 6 5 2 T
Suppose the target node is the one indicated by the letter T, and we are searching for the node with a number of 6 or more. The search starts with T. In the first iteration, move to the right, then to the parent node, to the rightmost node 5. In the second iteration, move to the right, rewind, and then backtrack, reach node 7 at the third level, when 7 meets our search requirements, so we follow the path 7 to the bottom. In fact, this is the first qualified page to the right of the starting point (considering rollback).
/ * Set value in given FSM page and slot. * set the corresponding value of the given FSM page and return the appropriate slot. * * If minValue > 0, the updated page is also searched for a page with at * least minValue of free space. If one is found, its slot number is * returned,-1 otherwise. * if minValue is greater than 0, the updated page will also search for page with only a minimum free space. * if found, return slot number, otherwise return-1. * / search_slot = fsm_set_and_search (rel, addr, slot, old_cat, search_cat); static intfsm_set_and_search (Relation rel, FSMAddress addr, uint16 slot, uint8 newValue, uint8 minValue) {Buffer buf; Page page; int newslot =-1; / / get buffer buf = fsm_readbuf (rel, addr, true) of FSM / / Lock LockBuffer (buf, BUFFER_LOCK_EXCLUSIVE); / / get page page = BufferGetPage (buf) of FSM; if (fsm_set_avail (page, slot, newValue)) MarkBufferDirtyHint (buf, false) If (minValue! = 0) {/ * Search while we still hold the lock * / / search for slot newslot = fsm_search_avail (buf, minValue, addr.level = = FSM_BOTTOM_LEVEL, true);} UnlockReleaseBuffer (buf); return newslot;} / * * Searches for a slot with category at least minvalue. * Returns slot number, or-1 if none found. * search for slot with at least a minimum directory. * * The caller must hold at least a shared lock on the page, and this * function can unlock and lock the page again in exclusive mode if it * needs to be updated. Exclusive_lock_held should be set to true if the * caller is already holding an exclusive lock, to avoid extra work. * the caller must hold the page shared lock. If the page needs to be updated, the function may unlock or lock the page again in exclusive mode. * if the caller already has an exclusive lock, exclusive_lock_held should be set to T to avoid extra work. * * If advancenext is false, fp_next_slot is set to point to the returned * slot, and if it's true, to the slot after the returned slot. * if advancenext is FForce _ slot _ extents slot is set to point to the returned slot; * if T, point to the slot after the returned slot. * / newslot = fsm_search_avail (buf, minValue,\ addr.level = = FSM_BOTTOM_LEVEL,\ true); intfsm_search_avail (Buffer buf, uint8 minvalue, bool advancenext, bool exclusive_lock_held) {Page page = BufferGetPage (buf); / / get page FSMPage fsmpage = (FSMPage) PageGetContents (page) / / get FSMPage int nodeno; int target; uint16 slot;restart: / * * Check the root first, and exit quickly if there's no leaf with enough * free space * first check the root node, and quickly exit if the leaf node does not have enough free space. * / if (fsmpage- > fp_nodes [0])
< minvalue) return -1; /* * Start search using fp_next_slot. It's just a hint, so check that it's * sane. (This also handles wrapping around when the prior call returned * the last slot on the page.) * 使用fp_next_slot开始搜索. * 这个标记只是一个提示而已,检查它是否正常. * (这同时处理了在上一次调用返回的页面最后一个slot的回卷问题) */ //#define SlotsPerFSMPage LeafNodesPerPage //#define LeafNodesPerPage (NodesPerPage - NonLeafNodesPerPage) //#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \ offsetof(FSMPageData, fp_nodes)) //#define NonLeafNodesPerPage (BLCKSZ / 2 - 1) = 4095 target = fsmpage->Fp_next_slot; if (target
< 0 || target >= LeafNodesPerPage) target = 0; target + = NonLeafNodesPerPage; / *-* Start the search from the target slot. At every step, move one * node to the right, then climb up to the parent. Stop when we reach * a node with enough free space (as we must, since the root has enough * space). * search from the target slot. At each step, move the node to the right, and then trace the parent node up. * stop when you reach a node with enough free space (because the root node has enough space). * * The idea is to gradually expand our "search triangle", that is, all * nodes covered by the current node, and to be sure we search to the * right from the start point. At the first step, only the target slot * is examined. When we move up from a left child to its parent, we are * adding the right-hand subtree of that parent to the search triangle. * When we move right then up from a right child, we are dropping the * current search triangle (which we know doesn't contain any suitable * page) and instead looking at the next-larger-size triangle to its * right. So we never look left from our original start point, and at * each step the size of the search triangle doubles, ensuring it takes * only log2 (N) work to search N pages. * the idea of the algorithm is to gradually expand the "search triangle search triangle", that is, * all nodes covered by the current node ensure that we search to the right from the starting point. * in the first step, only the target slot will be checked. * when we move up from the left child node to the parent node, we also add the right child tree of the parent node to the search triangle. * when we move up from the right child node to the parent node, we also delete the current search triangle (we already know that there is no suitable page at this time), and * retrieve the next larger triangle to the right. * therefore, we don't search from the starting point to the left, and the size of the search triangle doubles at each step, ensuring that only log2 (N) steps are needed to search N pages. * * The "move right" operation will wrap around if it hits the right edge * of the tree, so the behavior is still good if we start near the right. * Note also that the move-and-climb behavior ensures that we can't end * up on one of the missing nodes at the right of the leaf level. * the "move right" operation rolls back when it touches the right edge of the tree, so there is no problem starting a search near the right side. * Note that the move-and-climb operation ensures that we cannot end on the node on the right side of the leaf layer. * * For example, consider this tree: * for example, consider the following tree: * * 7 * 7 6 * 5 7 6 5 * 4 5 5 7 2 6 5 5 2 * T * * Assume that the target node is the node indicated by the letter T, * and we're searching for a node with value of 6 or higher. The search * begins at T. At the first iteration, we move to the right, then to the * parent, arriving at the rightmost 5.At the second iteration, we move * to the right, wrapping around, then climb up, arriving at the 7 on the * third level. 7 satisfies our search, so we descend down to the bottom, * following the path of sevens. This is in fact the first suitable page * to the right of (allowing for wraparound) our start point. * assume that the target node is a node indicated by the letter T, and we are searching for nodes with a number of 6 or more. The search starts with T. * in the first iteration, move to the right, then to the parent node, to the rightmost node 5. In the second iteration, move to the right, rewind, * then backtrack, reach node 7 at the third level when 7 meets our search requirements, * so we follow the path 7 to the bottom. * in fact, this is the first qualified page to the right of the starting point (considering rollback). *-* / nodeno = target; while (nodeno > 0) {if (fsmpage- > fp_ [Nodeno] > = minvalue) break; / * * Move to the right, wrapping around on same level if necessary, then * climb up. * move to the right, rewind at the same level if necessary, and then backtrack. * / nodeno = parentof (rightneighbor (nodeno));} / * * We're now at a node with enough free space, somewhere in the middle of * the tree. Descend to the bottom, following a path with enough free * space, preferring to move left if there's a choice. * now we have reached the node with enough free space, somewhere in the middle of the tree. * descend to the bottom along the path with enough free space and, if selected, move to the left. * / while (nodeno
< NonLeafNodesPerPage) { //左树节点 int childnodeno = leftchild(nodeno); if (childnodeno < NodesPerPage && fsmpage->Fp_ Nodes [childnodeno] > = minvalue) {/ / if there is an opportunity, move nodeno = childnodeno; continue;} / / to the right tree node childnodeno++; / * point to right child * / if (childnodeno)
< NodesPerPage && fsmpage->Fp_ [childnodeno] > = minvalue) {/ / equivalent to a layer of nodeno = childnodeno;} else {/ * * Oops. The parent node promised that either left or right child * has enough space, but neither actually did. This can happen in * case of a "torn page", IOW if we crashed earlier while writing * the page to disk, and only part of the page made it to disk. * the parent node promises that there is enough space for either the left or right tree, but this is not the case. * this will happen in the case of "torn page", when the system crashes when writing page to disk, * only part of the page is written to disk. * * Fix the corruption and restart. * repair the damage and restart. * / RelFileNode rnode; ForkNumber forknum; BlockNumber blknum; / / get tag BufferGetTag (buf, & rnode, & forknum, & blknum); elog (DEBUG1, "fixing corrupt FSM block% u, relation% u/%u/%u", blknum, rnode.spcNode, rnode.dbNode, rnode.relNode) / * make sure we hold an exclusive lock * / / make sure to hold the exclusive lock if (! exclusive_lock_held) {LockBuffer (buf, BUFFER_LOCK_UNLOCK); LockBuffer (buf, BUFFER_LOCK_EXCLUSIVE); exclusive_lock_held = true } / / rebuild FSM fsm_rebuild_page (page); MarkBufferDirtyHint (buf, false); / / re-search goto restart;}} / * We're now at the bottom level, at a node with enough space. * / / has now reached the bottom, on a node with enough space. Slot = nodeno-NonLeafNodesPerPage; / * * Update the next-target pointer. Note that we do this even if we're only * holding a shared lock, on the grounds that it's better to use a shared * lock and get a garbled next pointer every now and then, than take the * concurrency hit of an exclusive lock. * update the next target block pointer. * Note: even if we only have a shared lock, we can do this. * for this reason, it is best to use a shared lock and get a disruptive next pointer from time to time. * this is better than holding an exclusive lock. * * Wrap-around is handled at the beginning of this function. * at the beginning of the function, the rollback has been processed. * / fsmpage- > fp_next_slot = slot + (advancenext? 1: 0); return slot;} III. Tracking analysis
Test script
15:54:13 (xdb@ [local]: 5432) testdb=# insert into T1 values
Start gdb and set breakpoint
(gdb) b fsm_set_and_searchBreakpoint 1 at 0x888850: file freespace.c, line 676. (gdb) cContinuing.Breakpoint 1, fsm_set_and_search (rel=0x7fd0d2f10788, addr=..., slot=5, newValue=0'\ 000mm, minValue=1'\ 001') at freespace.c:676676 int newslot =-1; (gdb)
Input parameters
(gdb) p * rel$1 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50820}, rd_smgr = 0x1233b00, rd_refcnt = 1, rd_backend =-1, rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 1'\ 001, rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7fd0d2f109a0, rd_att = 0x7fd0d2f10ab8, rd_id = 50820, rd_lockInfo = {relId = 50820, dbId = 16402}} Rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0, rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x7fd0d2f0f820, rd_oidindex = 0, rd_pkindex = 0, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = rd_pkattr, 0x0 = 0x0 Rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0, rd_indextuple = 0x0, rd_amhandler = 0, rd_indexcxt = 0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_amcache = rd_amcache, 0x0 = 0x0, 0x0 = rd_indcollation Rd_toastoid = 0, pgstat_info = 0x12275f0} (gdb) (gdb) p addr$2 = {level = 0, logpageno = 0}
Get the buffere and lock
(gdb) n678 buf = fsm_readbuf (rel, addr, true); (gdb) 679 LockBuffer (buf, BUFFER_LOCK_EXCLUSIVE); (gdb) p buf$3 = 105 (gdb)
Get page
(gdb) p page$4 = (Page) 0x7fd0bf41dc00 "" (gdb) p * page$5 = 0'\ 000'
Call the fsm_search_avail function to enter fsm_search_avail
(gdb) n684 MarkBufferDirtyHint (buf, false); (gdb) 686 if (minValue! = 0) (gdb) 690 addr.level = = FSM_BOTTOM_LEVEL, (gdb) step689 newslot = fsm_search_avail (buf, minValue, (gdb) fsm_search_avail (buf=105, minvalue=1'\ 001), advancenext=true, exclusive_lock_held=true) at fsmpage.c:161161 Page page = BufferGetPage (buf); (gdb)
Gets the FSMPage,fp_nodes array. 0 means it is not used.
(gdb) n162 FSMPage fsmpage = (FSMPage) PageGetContents (page); (gdb) p page$6 = (Page) 0x7fd0bf41dc00 "" (gdb) n173 if (fsmpage- > fp_nodes [0])
< minvalue)(gdb) p *fsmpage$7 = {fp_next_slot = 6, fp_nodes = 0x7fd0bf41dc1c "{{"}(gdb) p *fsmpage->Fp_nodes$8 = 123'{'(gdb) p fsmpage- > fp_nodes [0] $9 = 123'{'(gdb) p fsmpage- > fp_nodes [1] $10 = 123'{'(gdb) p fsmpage- > fp_nodes [2] $11 = 0'\ 000' (gdb) p fsmpage- > fp_nodes [3] $12 = 123'{(gdb) p fsmpage- > fp_nodes [4] $13 = 0'\ 000' (gdb) p fsmpage- > fp_nodes [5] $14 = 0'\ 000' ( Gdb) p fsmpage- > fp_nodes [6] $15 = 0'\ 000'
Use fp_next_slot to start the search.
Start the search from the target slot. At each step, move the node to the right, and then trace the parent node up.
Stop when you reach a node with enough free space (because the root node has enough space).
(gdb) n181 target = fsmpage- > fp_next_slot; (gdb) 182 if (target)
< 0 || target >= LeafNodesPerPage) (gdb) p target$16 = 6 (gdb) p LeafNodesPerPageNo symbol "_ builtin_offsetof" in current context. (gdb) n184 target + = NonLeafNodesPerPage; (gdb) 227nodeno = target; (gdb) p target$17 = 4101 (gdb)
Loop until the node that meets the condition is found.
The method is to move to the right, scroll back at the same level if necessary, and then backtrack.
(gdb) p fsmpage- > fp_nodes [nodeno] $19 = 0'\ 000' (gdb) p minvalue$20 = 1'\ 001' (gdb) n237 nodeno = parentof (rightneighbor (nodeno)); (gdb) n228 while (nodeno > 0) (gdb) p nodeno$21 = 2050 (gdb) n230 if (fsmpage- > fp_ nodesnodeno > = minvalue) (gdb) fsmpage- > fp_ nodesnodeno Undefined command: "fsmpage- > fp_nodes". Try "help". (gdb) p fsmpage- > fp_nodes [nodeno] $22 = 0'\ 000' (gdb) n237 nodeno = parentof (rightneighbor (nodeno)); (gdb) 228 while (nodeno > 0) (gdb) 230if (fsmpage- > fp_ [nodeno] > = minvalue) (gdb) p nodeno$23 = 1025 (gdb) n231 break; (gdb) p fsmpage- > fp_nodes [nodeno] $24 = 1'\ 001' (gdb)
Now we have reached the node with enough free space, somewhere in the middle of the tree.
Follow the path where there is enough free space to the bottom.
If you have a choice, move to the left. In this case, it moves to the left.
(gdb) n245 while (nodeno
< NonLeafNodesPerPage)(gdb) p nodeno$25 = 1025(gdb) n247 int childnodeno = leftchild(nodeno);(gdb) 249 if (childnodeno < NodesPerPage &&(gdb) p childnodeno$26 = 2051(gdb) n250 fsmpage->Fp_ fp_ [childnodeno] > = minvalue) (gdb) p fsmpage- > fp_nodes [childnodeno] $27 = 1'\ 001' (gdb) n249 if (childnodeno)
< NodesPerPage &&(gdb) 252 nodeno = childnodeno;(gdb) 253 continue;(gdb) 找到了相应的叶子节点 (gdb) 245 while (nodeno < NonLeafNodesPerPage)(gdb) n247 int childnodeno = leftchild(nodeno);(gdb) 249 if (childnodeno < NodesPerPage &&(gdb) 250 fsmpage->Fp_ Nodes [childnodeno] > = minvalue) (gdb) 249 if (childnodeno)
< NodesPerPage &&(gdb) 255 childnodeno++; /* point to right child */(gdb) 256 if (childnodeno < NodesPerPage &&(gdb) p childnodeno$28 = 4104(gdb) n257 fsmpage->Fp_ Nodes [childnodeno] > = minvalue) (gdb) 256 if (childnodeno
< NodesPerPage &&(gdb) 259 nodeno = childnodeno;(gdb) p childnodeno$29 = 4104(gdb) n245 while (nodeno < NonLeafNodesPerPage)(gdb) 现在已到底层,在有足够空间的节点上. 同时,更新下一个目标块指针. (gdb) 293 slot = nodeno - NonLeafNodesPerPage;(gdb) n303 fsmpage->Fp_next_slot = slot + (advancenext? 1: 0); (gdb) p slot$30 = 9 (gdb) p nodeno$31 = 4104 (gdb) n305 return slot; (gdb) 306} (gdb)
Go back to fsm_set_and_search, return to slot
(gdb) fsm_set_and_search (rel=0x7fd0d2f10788, addr=..., slot=5, newValue=0'\ 000songs, minValue=1'\ 001') at freespace.c:694694 UnlockReleaseBuffer (buf); (gdb) 696 return newslot; (gdb) (gdb) p newslot$32 = 9 (gdb)
DONE!
IV. Reference materials
PG Source Code
Database Cluster, Databases, and Tables
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.