Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to understand the partition table of PostgreSQL

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly explains how to understand the partition table of PostgreSQL. Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn how to understand the partition table of PostgreSQL.

In PG, partitioned tables are implemented through inheritance, so there is a problem here, that is, when inserting data, how does PG determine which target partition the data should be inserted into? In PG, the main purpose of preparing for routing tuples to be inserted through the function ExecPrepareTupleRouting is to determine the partition in which the tuples are located.

I. data structure

ModifyTable

ModifyTable Node

Apply the rows generated by the subplan to the result table by inserting, updating, or deleting.

/ *-* ModifyTable node-* Apply rows produced by subplan (s) to result table (s), * by inserting, updating, or deleting. * apply the rows generated by the subplan to the result table by inserting, updating, or deleting. * * If the originally named target table is a partitioned table, both * nominalRelation and rootRelation contain the RT index of the partition * root, which is not otherwise mentioned in the plan. Otherwise rootRelation * is zero. However, nominalRelation will always be set, as it's the rel that * EXPLAIN should claim is the INSERT/UPDATE/DELETE target. * if the originally named target table is a partitioned table, both nominalRelation and rootRelation contain the RT index of the partitioned root, which is not mentioned separately in the plan. * otherwise, the root relationship is zero. However, a nominal relationship is always set, nominalRelation because the rel that EXPLAIN should declare is an INSERT/UPDATE/DELETE target relationship. * * Note that rowMarks and epqParam are presumed to be valid for all the * subplan (s); they can't contain any info that varies across subplans. * Note that rowMarks and epqParam are assumed to be valid for all subplans; * they cannot contain any information that changes in the subplan. *-* / typedef struct ModifyTable {Plan plan; CmdType operation; / * operation type; do you need to set tag?do we set the command tag/es_processed for INSERT, UPDATE, or DELETE * / bool canSetTag; / *? * / Index nominalRelation; / * for the parent RT index of EXPLAIN Parent RT index for use of EXPLAIN * / Index rootRelation; / * Root Root RT index (if the target is a partitioned table); Root RT index, if target is partitioned * / bool partColsUpdated; / * updated the partitioning keyword in the hierarchy; some part key in hierarchy updated * / List * resultRelations; / * RT index integer linked list; integer list of RT indexes * / int resultRelIndex / * Index of the first resultRel in the planned linked list; index of first resultRel in plan's list * / int rootResultRelIndex; / * partitioned table root index; index of the partitioned table root * / List * plans; / * generate the planned linked list of the source data; plan (s) producing source data * / List * withCheckOptionLists; / * the WCO linked list that each target table has Per-target-table WCO lists * / List * returningLists; / * RETURNING linked list for each target table; per-target-table RETURNING tlists * / List * fdwPrivLists; / * FDW private data linked list for each target table; per-target-table FDW private data lists * / Bitmapset * fdwDirectModifyPlans; / * FDW DM plan index bitmap; indices of FDW DM plans * / List * rowMarks; / * rowMarks linked list PlanRowMarks (non-locking only) * / int epqParam; / * EvalPlanQual parses the parameter ID;ID of Param for EvalPlanQual re-eval * / OnConflictAction onConflictAction; / * ON CONFLICT action * / List * arbiterIndexes; / * conflict arbiter index table; List of ON CONFLICT arbiter index OIDs * / List * onConflictSet; / * SET for INSERT ON CONFLICT DO UPDATE * / Node * onConflictWhere / * WHERE for ON CONFLICT UPDATE * / Index exclRelRTI; / * RTI of the EXCLUDED pseudo relation * / List * exclRelTlist; / * projection list with pseudo relations excluded; tlist of the EXCLUDED pseudo relation * /} ModifyTable

ResultRelInfo

ResultRelInfo structure

Whenever we update an existing relationship, we must update the index on the relationship and perhaps trigger the trigger. ResultRelInfo holds all the information needed about the result relationship, including the index.

/ * * ResultRelInfo * ResultRelInfo structure * * Whenever we update an existing relation, we have to update indexes on the * relation, and perhaps also fire triggers. ResultRelInfo holds all the * information needed about a result relation, including indexes. * whenever we update an existing relationship, we must update the index on the relationship and perhaps trigger the trigger. * ResultRelInfo keeps all the information needed about the result relationship, including the index. * * Normally, a ResultRelInfo refers to a table that is in the query's * range table; then ri_RangeTableIndex is the RT index and ri_RelationDesc * is just a copy of the relevant es_relations [] entry. But sometimes, * in ResultRelInfos used only for triggers, ri_RangeTableIndex is zero * and ri_RelationDesc is a separately-opened relcache pointer that needs * to be separately closed. See ExecGetTriggerResultRel. * generally, ResultRelInfo refers to the table in the query scope table; * ri_RangeTableIndex is the RT index, and ri_RelationDesc is just a copy of the related es_relations [] entries. * but sometimes, in ResultRelInfos used only for triggers, ri_RangeTableIndex is zero (NULL), and ri_RelationDesc is a relcache pointer that needs to be closed and opened separately. * for more information, please see ExecGetTriggerResultRel structure. * / typedef struct ResultRelInfo {NodeTag type; / * result relation's range table index, or 0 if not in range table * / / RTE index Index ri_RangeTableIndex; / * relation descriptor for result relation * / / result / descriptor of the target relation Relation ri_RelationDesc; / * # of indices existing on result relation * / / number of indexes in the target relationship int ri_NumIndices / * array of relation descriptors for indices * / / the relational descriptor array of the index (the index is regarded as a relation) RelationPtr ri_IndexRelationDescs; / * array of key/attr info for indices * / / the key / attribute array of the index IndexInfo * * ri_IndexRelationInfo; / * triggers to be fired, if any * / / triggered index TriggerDesc * ri_TrigDesc / * cached lookup info for trigger functions * / / trigger function (cache) FmgrInfo * ri_TrigFunctions; / * array of trigger WHEN expr states * / / trigger array of WHEN expression status ExprState * * ri_TrigWhenExprs; / * optional runtime measurements for triggers * / / optional trigger runtime metric Instrumentation * ri_TrigInstrument / * FDW callback functions, if foreign table * / / FDW callback function struct FdwRoutine * ri_FdwRoutine; / * available to save private state of FDW * / / can be used to store the private status of FDW void * ri_FdwState; / * true when modifying foreign table directly * / T bool ri_usesFdwDirectModify when updating FDW directly / * list of WithCheckOption's to be checked * / / WithCheckOption linked list List * ri_WithCheckOptions; / * list of WithCheckOption expr states * / / WithCheckOption expression linked list List * ri_WithCheckOptionExprs; / * array of constraint-checking expr states * / / constraint check expression status array ExprState * * ri_ConstraintExprs / * for removing junk attributes from tuples * / / used to remove junk attributes from the tuple JunkFilter * ri_junkFilter; / * list of RETURNING expressions * / / RETURNING expression linked list List * ri_returningList; / * for computing a RETURNING list * / / used to calculate RETURNING linked list ProjectionInfo * ri_projectReturning / * list of arbiter indexes to use to check conflicts * / / list used to check conflicting arbitrator indexes List * ri_onConflictArbiterIndexes; / * ON CONFLICT evaluation state * / / ON CONFLICT parsing status OnConflictSetState * ri_onConflict; / * partition check expression * / / Partition check expression linked list List * ri_PartitionCheck / * partition check expression state * / / Partition check expression status ExprState * ri_PartitionCheckExpr; / * relation descriptor for root partitioned table * / / Partition root root table descriptor Relation ri_PartitionRoot; / * Additional information specific to partition tuple routing * / additional partition tuple routing information struct PartitionRoutingInfo * ri_PartitionInfo;} ResultRelInfo

PartitionRoutingInfo

PartitionRoutingInfo structure

Partition routing information, which is used to route tuples to the result relationship information of the table partition.

/ * * PartitionRoutingInfo * PartitionRoutingInfo-Partition routing information * * Additional result relation information specific to routing tuples to a * table partition. * result relationship information used to route tuples to table partitions. * / typedef struct PartitionRoutingInfo {/ * * Map for converting tuples in root partitioned table format into * partition format, or NULL if no conversion is required. * mapping, which is used to convert tuples in root partition table format to partition format, or to NULL if no conversion is required. * / TupleConversionMap * pi_RootToPartitionMap; / * * Map for converting tuples in partition format into the root partitioned * table format, or NULL if no conversion is required. * mapping, which is used to convert tuples in partition format to root partition table format, or to NULL if no conversion is required. * / TupleConversionMap * pi_PartitionToRootMap; / * * Slot to store tuples in partition format, or NULL when no translation * is required between root and partition. * store the slot of the tuple in partition format. NULL when no conversion is required between the root partition and the partition. * / TupleTableSlot * pi_PartitionTupleSlot;} PartitionRoutingInfo

TupleConversionMap

TupleConversionMap structure, which is used to store tuple transformation mapping information.

Typedef struct TupleConversionMap {TupleDesc indesc; / * descriptor of source line type; descriptor of tupdesc for source rowtype * / TupleDesc outdesc; / * result line type; index information of tupdesc for result rowtype * / AttrNumber * attrMap; / * input field. 0 represents NULL;indexes of input fields, or 0 for null * / Datum * invalues; / * destructs the workspace of source data Whether workspace for deconstructing source * / bool * inisnull; / / is the workspace of the construction result of NULL tag array Datum * outvalues; / *; workspace for constructing result * / bool * outisnull; / / null tag} TupleConversionMap; II. Source code interpretation

The ExecPrepareTupleRouting function determines the partition to which the tuple to be inserted into the slot belongs, and modifies relevant information such as mtstate and estate to prepare for subsequent actual insertions.

/ * * ExecPrepareTupleRouting-prepare for routing one tuple * ExecPrepareTupleRouting-prepare to route a tuple * * Determine the partition in which the tuple in slot is to be inserted, * and modify mtstate and estate to prepare for it. * make sure you want to insert the partition of tuple in slot, and modify mtstate and estate to prepare for insertion. * * Caller must revert the estate changes after executing the insertion! * In mtstate, transition capture changes may also need to be reverted. * the caller must restore the modified property values in the estate after the insert! * in mtstate, the transformation captures changes may also need to be restored. * * Returns a slot holding the tuple of the partition rowtype. * returns the slot that contains the partition rowtype tuple. * / static TupleTableSlot * ExecPrepareTupleRouting (ModifyTableState * mtstate, EState * estate, PartitionTupleRouting * proute, ResultRelInfo * targetRelInfo, TupleTableSlot * slot) {ModifyTable * node;//ModifyTable node int partidx;// partition index ResultRelInfo * partrel;//ResultRelInfo structure pointer (array) HeapTuple tuple / / tuple / * * Determine the target partition. If ExecFindPartition does not find a * partition after all, it doesn't return here; otherwise, the returned * value is to be used as an index into the arrays for the ResultRelInfo * and TupleConversionMap for the partition. * determine the target partition. * if ExecFindPartition finally does not find the partition, it will not return here; * otherwise, the return value will be used as the index of the partition's ResultRelInfo and TupleConversionMap arrays. * / partidx = ExecFindPartition (targetRelInfo, proute- > partition_dispatch_info, slot, estate); Assert (partidx > = 0 & & partidx

< proute->

< proute->

Num_partitions); (gdb) p partidx$1 = 2

Gets the ResultRelInfo; corresponding to the selected partition. If not, initialize the

(gdb) n1722 partrel = proute- > partitions [partidx]; (gdb) 1723 if (partrel = = NULL) (gdb) p * partrelCannot access memory at address 0x0 (gdb) n1724 partrel = ExecInitPartitionInfo (mtstate, targetRelInfo)

Initialized partrel

(gdb) p * partrel$2 = {type = T_ResultRelInfo, ri_RangeTableIndex = 1, ri_RelationDesc = 0x1e7c940, ri_NumIndices = 0, ri_IndexRelationDescs = 0x0, ri_IndexRelationInfo = 0x0, ri_TrigDesc = 0x0, ri_TrigFunctions = 0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_WithCheckOptions = 0x0, ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_junkFilter = 0x0, ri_returningList = 0x0, 0x0 = ri_projectReturning, ri_projectReturning = ri_projectReturning Ri_onConflict = 0x0, ri_PartitionCheck = 0x1e4f538, ri_PartitionCheckExpr = 0x0, ri_PartitionRoot = 0x1e7c2f8, ri_PartitionReadyForRouting = true}

Target partition descriptor-- > t_hash_partition_3

(gdb) p * partrel- > ri_RelationDesc$3 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 16995}, rd_smgr = 0x1e34510, rd_refcnt = 1, rd_backend =-1, rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 0000, rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x1e7c1e0, rd_att = 0x1e7cb58, rd_id = 16995, rd_lockInfo = {lockRelId = {relId = 16995 DbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0, rd_partdesc = 0x0, rd_partcheck = 0x1e7aa30, rd_indexlist = 0x0, rd_oidindex = 0, rd_pkindex = 0, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = rd_pkattr Rd_idattr = 0x0, rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0, rd_indextuple = 0x0, rd_amhandler = 0, rd_indexcxt = 0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = rd_exclstrats, 0x0 = 0x0, 0x0 = rd_amcache Rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x1de40b0}-testdb=# select oid,relname from pg_class where oid=16995 Oid | relname-+-16995 | t_hash_partition_3 (1 row)-

The partition is routable

(gdb) p partrel- > ri_PartitionReadyForRouting$4 = true

Set the estate variable (to make it look like it is inserted into the partition) / materialize tuple

(gdb) n1751 estate- > es_result_relation_info = partrel; (gdb) 1754 tuple = ExecMaterializeSlot (slot) (gdb) 1760 if (mtstate- > mt_transition_capture! = NULL) (gdb) p tuple$5 = (HeapTuple) 0x1e4f4e0 (gdb) p * tuple$6 = {t_len = 40, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_tableOid = 0, t_data = 0x1e4f4f8} (gdb) p * tuple- > t_data$7 = {t_choice = {t_heap = {t_xmin = 65535, t_xmax = 4294967295 T_field3 = {t_cid = 2249, t_xvac = 2249}}, t_datum = {datum_len_ = 2249, datum_typmod =-1, datum_typeid = 2249}}, t_ctid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_infomask2 = 3, t_infomask = 2, t_hoff = 24'\ 0303, t_bits = 0x1e4f50f ""}

Mtstate- > mt_transition_capture is NULL, so you don't need to process related information.

(gdb) p mtstate- > mt_transition_capture $8 = (struct TransitionCaptureState *) 0x01783 if (mtstate- > mt_oc_transition_capture! = NULL) (gdb)

Convert tuples if necessary

1792 ConvertPartitionTupleSlot (proute- > parent_child_tupconv_maps [partidx], (gdb) 1798 Assert (mtstate! = NULL); (gdb) 1799 node = (ModifyTable *) mtstate- > ps.plan (gdb) p * mtstate$9 = {ps = {type = T_ModifyTableState, plan = 0x1e59838, state = 0x1e4daf8, ExecProcNode = 0x711056, ExecProcNodeReal = 0x711056, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleSlot = 0x1e4ede8, ps_ExprContext = 0x0, ps_ProjInfo = ps_ProjInfo, 0x0 = 0x0}, 0x0 = 0x0, scandesc = scandesc, 0x0 =, = Mt_nplans = 1, mt_whichplan = 0, resultRelInfo = 0x1e4dd48, rootResultRelInfo = 0x0, mt_arowmarks = 0x1e4e098, mt_epqstate = {estate = 0x0, planstate = 0x0, origslot = 0x1e4e4e0, plan = 0x1e59588, arowMarks = 0x0, epqParam = 0}, fireBSTriggers = false, mt_existing = 0x0, mt_excludedtlist = 0x0, mt_conflproj = 0x0, mt_partition_tuple_routing = 0x1e4eb48, mt_transition_capture = 0x0, mt_oc_transition_capture = 0x0, mt_per_subplan_tupconv_maps = 0x0}

Return slot to complete the call

(gdb) n1800 if (node- > onConflictAction = = ONCONFLICT_UPDATE) (gdb) 1810 return slot; (gdb) 1811} so far, I believe you have a deeper understanding of "how to understand PostgreSQL's partition table". You might as well do it in practice! Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report