Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

PostgreSQL source code interpretation (72)-query statement # 57 (make_one_rel function # 22 muri.

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

This section generally introduces the implementation of genetic algorithm (geqo function). When the number of participating connections is greater than or equal to 12 (default value), PG uses genetic algorithm to generate connection access path to build the final connection relationship.

Brief introduction of genetic algorithm

Genetic algorithm is a search algorithm based on bioscience. Some bioscience related knowledge is used in this algorithm. Here are some terms used in PG genetic algorithm:

1. Chromosomes (Chromosome): chromosomes can also be called individual genotypes (individuals). A chromosome can be regarded as a solution (a legal connection access path).

2. Pool: a certain number of individuals (chromosomes) form a population (pool/population), and the number of individuals in a population is called population size (population size).

3. Gene: a gene is an element in the chromosome, which is used to express the characteristics of the individual. For example, if there is a string (that is, chromosome) Sylv1011, then the four elements of the string (chromosome) Song1011 are called genes, respectively. In PG, genes are involved in the connection.

4. Fitness: the degree of adaptation of each individual to the environment is called fitness (fitness). In order to reflect the adaptability of chromosomes, a function which can measure each chromosome in the problem is introduced, which is called fitness function. This function is usually used to calculate the probability of an individual being used in a group. In PG, fitness is the total cost of connecting access paths.

Data structure / * * Private state for a GEQO run-accessible via root- > join_search_private * / typedef struct {List * initial_rels; / * join relational linked list; the base relations we are joining * / unsigned short random_state [3]; / * unsigned short integer array (random number); state for pg_erand48 () * /} GeqoPrivateData;/* we presume that int instead of Relid is o.k. For Gene; so don't change it! * / typedef int Gene;// gene (Integer) typedef struct Chromosome// chromosome {Gene * string;// gene Cost worth;// cost} Chromosome;typedef struct Pool// population {Chromosome * data;// chromosome array int size;// size int string_length;// length} Pool / * A "clump" of already-joined relations within gimme_tree * / typedef struct {RelOptInfo * joinrel; / * joinrel for the set of relations * / int size; / * number of input relations in clump * /} Clump; II. Source code interpretation

The geqo function implements the genetic algorithm and constructs the connection access path of multi-table (≥ 12).

/ /-geqo/* * geqo * solution of the query optimization problem * similar to a constrained Traveling Salesman Problem (TSP) * genetic algorithm: please refer to TSP's algorithm. * TSP- travel salesman problem (shortest path problem): * given the distance between a series of cities and each pair of cities, solve the shortest circuit to visit each city once and return to the starting city. * / RelOptInfo * geqo (PlannerInfo * root, int number_of_rels, List * initial_rels) {GeqoPrivateData private;// genetic algorithm private data, including joining relationships and random numbers int generation; Chromosome * momma;// chromosomes-mother array Chromosome * daddy;// chromosomes-father array Chromosome * kid;// chromosomes-children array Pool * pool / / population pointer int pool_size,// population size number_generations;// evolutionary algebra, using the maximum number of iterations (evolutionary algebra) as the stopping criterion # ifdef GEQO_DEBUG int status_interval;#endif Gene * best_tour; RelOptInfo * best_rel;// optimal solution # if defined (ERX) Edge * edge_table; / * boundary linked list List of edges * / int edge_failures = 0transferable difhands if defined (CX) | | defined (PX) | | defined (OX1) | | defined (OX2) City * city_table; / * City linked list; list of cities * / # endif#if defined (CX) int cycle_diffs = 0; int mutations = 0 * configure private information Set up private information * / root- > join_search_private = (void *) & private; private.initial_rels = initial_rels;/* initialization seed value; initialize private number generator * / geqo_set_seed (root, Geqo_seed); / * set genetic algorithm parameters; set GA parameters * / pool_size = gimme_pool_size (number_of_rels); / / population size number_generations = gimme_number_generations (pool_size) / / number of iterations # ifdef GEQO_DEBUG status_interval = 10 Bitschdif * apply for memory; allocate genetic pool memory * / pool = alloc_pool (root, pool_size, number_of_rels); / * Random initialize the population; random initialization of the pool * / random_init_pool (root, pool); / * sort the population, keep the lowest cost; sort the pool according to cheapest path as fitness * / sort_pool (root, pool) / * we have to do it only one time, since all * kids replace the worst individuals in * future (- > geqo_pool.c:spread_chromo) * / # ifdef GEQO_DEBUG elog (DEBUG1, "GEQO selected% d pool entries, best% .2f, worst% .2f", pool_size, pool- > data [0] .worth Pool- > data [pool _ size-1] .worth) # endif/* application chromosome memory (mother & father); allocate chromosome momma and daddy memory * / momma = alloc_chromo (root, pool- > string_length); daddy = alloc_chromo (root, pool- > string_length); # if defined (ERX) # ifdef GEQO_DEBUG elog (DEBUG2, "using edge recombination crossover [ERX]"); # endif/* allocate edge table memory * / / Application boundary table memory edge_table = alloc_edge_table (root, pool- > string_length) # elif defined (PMX) # ifdef GEQO_DEBUG elog (DEBUG2, "using partially matched crossover [PMX]"); # endif/* application for child chromosome memory; allocate chromosome kid memory * / kid = alloc_chromo (root, pool- > string_length); # elif defined (CX) # ifdef GEQO_DEBUG elog (DEBUG2, "using cycle crossover [CX]"); # endif/* allocate city table memory * / kid = alloc_chromo (root, pool- > string_length) City_table = alloc_city_table (root, pool- > string_length); # elif defined (PX) # ifdef GEQO_DEBUG elog (DEBUG2, "using position crossover [PX]"); # endif/* allocate city table memory * / kid = alloc_chromo (root, pool- > string_length); / / Application memory city_table = alloc_city_table (root, pool- > string_length); # elif defined (OX1) # ifdef GEQO_DEBUG elog (DEBUG2, "using order crossover [OX1]") # endif/* allocate city table memory * / kid = alloc_chromo (root, pool- > string_length); city_table = alloc_city_table (root, pool- > string_length); # elif defined (OX2) # ifdef GEQO_DEBUG elog (DEBUG2, "using order crossover [OX2]"); # endif/* allocate city table memory * / kid = alloc_chromo (root, pool- > string_length); city_table = alloc_city_table (root, pool- > string_length) # endif/* my pain main part: * / / * iterative optimization. Optimized optimization * / for (generation = 0; generation)

< number_generations; generation++)//开始迭代 { /* SELECTION: using linear bias function */ //选择:利用线性偏差(bias)函数,从中选出momma&daddy geqo_selection(root, momma, daddy, pool, Geqo_selection_bias);#if defined (ERX) /* EDGE RECOMBINATION CROSSOVER */ //交叉遗传 gimme_edge_table(root, momma->

String, daddy- > string, pool- > string_length, edge_table); kid = momma; / * are there any edge failures? * / / traversing the boundary edge_failures + = gimme_tour (root, edge_table, kid- > string, pool- > string_length); # elif defined (PMX) / * PARTIALLY MATCHED CROSSOVER * / pmx (root, momma- > string, daddy- > string, kid- > string, pool- > string_length) # elif defined (CX) / * CYCLE CROSSOVER * / cycle_diffs = cx (root, momma- > string, daddy- > string, kid- > string, pool- > string_length, city_table); / * mutate the child * / if (cycle_diffs = = 0) {mutations++; geqo_mutation (root, kid- > string, pool- > string_length) } # elif defined (PX) / * POSITION CROSSOVER * / px (root, momma- > string, daddy- > string, kid- > string, pool- > string_length, city_table); # elif defined (OX1) / * ORDER CROSSOVER * / ox1 (root, momma- > string, daddy- > string, kid- > string, pool- > string_length, city_table) # elif defined (OX2) / * ORDER CROSSOVER * / ox2 (root, momma- > string, daddy- > string, kid- > string, pool- > string_length, city_table); # endif / * EVALUATE FITNESS * / / calculated fitness kid- > worth = geqo_eval (root, kid- > string, pool- > string_length) / * push the kid into the wilderness of life according to its worth * / / put the inherited chromosomes into the wild for the next round of evolution spread_chromo (root, kid, pool); # ifdef GEQO_DEBUG if (status_interval &! (generation% status_interval)) print_gen (stdout, pool, generation) # endif} # if defined (ERX) & & defined (GEQO_DEBUG) if (edge_failures! = 0) elog (LOG, "[GEQO] failures:% d, average:% d", edge_failures, (int) number_generations / edge_failures); else elog (LOG, "[GEQO] no edge failures detected") # endif#if defined (CX) & & defined (GEQO_DEBUG) if (mutations! = 0) elog (LOG, "[GEQO] mutations:% d, generations:% d", mutations, number_generations); else elog (LOG, "[GEQO] no mutations processed"); # endif#ifdef GEQO_DEBUG print_pool (stdout, pool, 0, pool_size-1) # endif#ifdef GEQO_DEBUG elog (DEBUG1, "GEQO best is% .2f after% d generations", pool- > data [0] .worth, number_generations); # endif / * * got the cheapest query tree processed by geqo; first element of the * population indicates the best query tree * / best_tour = (Gene *) pool- > data [0] .string; best_rel = gimme_tree (root, best_tour, pool- > string_length) If (best_rel = = NULL) elog (ERROR, "geqo failed to make a valid plan"); / * DBG: show the query plan * / # ifdef NOT_USED print_plan (best_plan, root); # endif / *. Free memory stuff * / free_chromo (root, momma); free_chromo (root, daddy); # if defined (ERX) free_edge_table (root, edge_table); # elif defined (PMX) free_chromo (root, kid); # elif defined (CX) free_chromo (root, kid); free_city_table (root, city_table); # elif defined (PX) free_chromo (root, kid); free_city_table (root, city_table) # elif defined (OX1) free_chromo (root, kid); free_city_table (root, city_table); # elif defined (OX2) free_chromo (root, kid); free_city_table (root, city_table); # endif free_pool (root, pool); / *. Clear root pointer to our private storage * / root- > join_search_private = NULL; return best_rel;} / /-geqo_pool.cstatic int compare (const void * arg1, const void * arg2) / * * alloc_pool * allocates memory for GA pool * / Pool * alloc_pool (PlannerInfo * root, int pool_size, int string_length) {Pool * new_pool; Chromosome * chromo; int i; / * pool * / new_pool = (Pool *) palloc (sizeof (Pool)); new_pool- > size = (int) pool_size; new_pool- > string_length = (int) string_length / * all chromosome * / new_pool- > data = (Chromosome *) palloc (pool_size * sizeof (Chromosome)); / * all gene * / chromo = (Chromosome *) new_pool- > data; / * vector of all chromos * / for (I = 0; I

< pool_size; i++) chromo[i].string = palloc((string_length + 1) * sizeof(Gene)); return new_pool;}/* * free_pool * deallocates memory for GA pool */voidfree_pool(PlannerInfo *root, Pool *pool){ Chromosome *chromo; int i; /* all gene */ chromo = (Chromosome *) pool->

Data; / * vector of all chromos * / for (I = 0; I

< pool->

Size; iTunes +) pfree (chromo.string); / * all chromosome * / pfree (pool- > data); / * pool * / pfree (pool);} / * random_init_pool * initialize genetic pool * / voidrandom_init_pool (PlannerInfo * root, Pool * pool) {Chromosome * chromo = (Chromosome *) pool- > data; int i; int bad = 0 / * We immediately discard any invalid individuals (those that geqo_eval * returns DBL_MAX for), thereby not wasting pool space on them. * immediately discard all invalid individuals (those whose geqo_eval returns DBL_MAX), so that memory space is not wasted on them. * * If we fail to make any valid individuals after 10000 tries, give up; * this probably means something is broken, and we shouldn't just let * ourselves get stuck in an infinite loop. * if no valid individuals are produced after 10000 attempts, then giving up is the best option; * this may mean that there is a problem and therefore should not fall into a dead cycle. * / I = 0; while (I

< pool->

Size) {init_tour (root, chromo [I] .string, pool- > string_length); pool- > Data.worth = geqo_eval (root, chromo.string, pool- > string_length); if (pool- > Data.worth)

< DBL_MAX) i++; else { bad++; if (i == 0 && bad >

= 10000) elog (ERROR, "geqo failed to make a valid plan");}} # ifdef GEQO_DEBUG if (bad > 0) elog (DEBUG1, "% d invalid tours found while selecting% d pool entries", bad, pool- > size) # endif} / * * sort_pool * sorts input pool according to worth, from smallest to largest * * maybe you have to change compare () for different ordering. * / voidsort_pool (PlannerInfo * root, Pool * pool) {qsort (pool- > data, pool- > size, sizeof (Chromosome), compare);} / * compare * qsort comparison function for sort_pool * / static intcompare (const void * arg1, const void * arg2) {const Chromosome * chromo1 = (const Chromosome *) arg1 Const Chromosome * chromo2 = (const Chromosome *) arg2; if (chromo1- > worth = = chromo2- > worth) return 0; else if (chromo1- > worth > chromo2- > worth) return 1; else return-1;} / * alloc_chromo * allocates a chromosome and string space * / Chromosome * alloc_chromo (PlannerInfo * root, int string_length) {Chromosome * chromo; chromo = (Chromosome *) palloc (sizeof (Chromosome)) Chromo- > string = (Gene *) palloc ((string_length + 1) * sizeof (Gene)); return chromo;} / * free_chromo * deallocates a chromosome and string space * / voidfree_chromo (PlannerInfo * root, Chromosome * chromo) {pfree (chromo- > string); pfree (chromo) } / * spread_chromo * inserts a new chromosome into the pool, displacing worst gene in pool * assumes best- > worst = smallest- > largest * / voidspread_chromo (PlannerInfo * root, Chromosome * chromo, Pool * pool) {int top, mid, bot; int i, index; Chromosome swap_chromo, tmp_chromo / * new chromo is so bad we can't use it * / if (chromo- > worth > pool- > data [pool-> size-1] .worth) return; / * do a binary search to find the index of the new chromo * / top = 0; mid = pool- > size / 2; bot = pool- > size-1; index =-1 While (index = =-1) {/ * these 4 cases find a new location * / if (chromo- > worth data [top] .worth) index = top; else if (chromo- > worth = = pool- > data [mid] .worth) index = mid; else if (chromo- > worth = = pool- > data [bot] .worth) index = bot; else if (bot-top worth

< pool->

Data [mid] .worth) {bot = mid; mid = top + ((bot-top) / 2);} else {/ * (chromo- > worth > pool- > data [mid] .worth) * / top = mid; mid = top + ((bot-top) / 2) }} / *. While * / / * now we have index for chromo * / * move every gene from index on down one position to make room for chromo * / / * * copy new gene into pool storage; always replace worst gene in pool * / geqo_copy (root, & pool- > data [pool-> size-1], chromo, pool- > string_length); swap_chromo.string = pool- > data [pool-> size-1] .string Swap_chromo.worth = pool- > data [pool-> size-1] .worth; for (I = index; I

< pool->

Size; iTunes +) {tmp_chromo.string = pool- > data [I] .string; tmp_chromo.worth = pool- > data [I] .worth; pool- > data [I] .string = swap_chromo.string; pool- > data [I] .worth = swap_chromo.worth; swap_chromo.string = tmp_chromo.string; swap_chromo.worth = tmp_chromo.worth }} / * * init_tour * * Randomly generates a legal "traveling salesman" tour * (i.e. Where each point is visited only once.) * randomly generate TSP paths (visit each point only once) * / void init_tour (PlannerInfo * root, Gene * tour, int num_gene) {int I, j / * We must fill the tour [] array with a random permutation of the numbers * 1.. Num_gene. We can do that in one pass using the "inside-out" * variant of the Fisher-Yates shuffle algorithm. Notionally, we append * each new value to the array and then swap it with a randomly-chosen * array element (possibly including itself, else we fail to generate * permutations with the last city last). The swap step can be optimized * by combining it with the insertion. * / if (num_gene > 0) tour [0] = (Gene) 1; for (I = 1; I

< num_gene; i++) { j = geqo_randint(root, i, 0); /* i != j check avoids fetching uninitialized array element */ if (i != j) tour[i] = tour[j]; tour[j] = (Gene) (i + 1); } } //----------------------------------------------------------- geqo_eval /* * geqo_eval * * Returns cost of a query tree as an individual of the population. * 返回该此进化的适应度。 * * If no legal join order can be extracted from the proposed tour, * returns DBL_MAX. * 如无合适的连接顺序,返回DBL_MAX */ Cost geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) { MemoryContext mycontext; MemoryContext oldcxt; RelOptInfo *joinrel; Cost fitness; int savelength; struct HTAB *savehash; /* * Create a private memory context that will hold all temp storage * allocated inside gimme_tree(). * * Since geqo_eval() will be called many times, we can't afford to let all * that memory go unreclaimed until end of statement. Note we make the * temp context a child of the planner's normal context, so that it will * be freed even if we abort via ereport(ERROR). */ mycontext = AllocSetContextCreate(CurrentMemoryContext, "GEQO", ALLOCSET_DEFAULT_SIZES); oldcxt = MemoryContextSwitchTo(mycontext); /* * gimme_tree will add entries to root->

Join_rel_list, which may or may * not already contain some entries. The newly added entries will be * recycled by the MemoryContextDelete below, so we must ensure that the * list is restored to its former state before exiting. We can do this by * truncating the list to its original length. NOTE this assumes that any * added entries are appended at the end! * * We also must take care not to mess up the outer join_rel_hash, if there * is one. We can do this by just temporarily setting the link to NULL. * (If we are dealing with enough join rels, which we very likely are, a * new hash table will get built and used locally.) * * join_rel_level [] shouldn't be in use, so just Assert it isn't. * / savelength = list_length (root- > join_rel_list); savehash = root- > join_rel_hash; Assert (root- > join_rel_level = = NULL); root- > join_rel_hash = NULL / * construct the best path for the given combination of relations * / / given relationship combination, construct the best access path joinrel = gimme_tree (root, tour, num_gene) / * * compute fitness, if we found a valid join * if a valid connection is found, calculate its fitness * * XXX geqo does not currently support optimization for partial result * retrieval, nor do we take any cognizance of possible use of * parameterized paths-how to fix? * genetic algorithm currently does not support the optimization of partial result retrieval. It is not known whether it is possible to use parameterized paths-how to fix them? * / if (joinrel) {Path * best_path = joinrel- > cheapest_total_path / / get the optimal path of the generated relationship fitness = best_path- > total_cost;// fitness = the total cost of the path} else fitness = DBL_MAX;// connection is invalid and the fitness is DBL_MAX. / * Restore join_rel_list to its former state and and put back original * hashtable if any will be discarded in the next iteration. * restore join_rel_list to its original state. If there is a hash table, put the original hash table back. * / root- > join_rel_list = list_truncate (root- > join_rel_list, savelength); root- > join_rel_hash = savehash; / * release all the memory acquired within gimme_tree * / / release resource MemoryContextSwitchTo (oldcxt); MemoryContextDelete (mycontext); return fitness } / /-gimme_tree/* * gimme_tree * Form planner estimates for a join tree constructed in the specified * order. * the connection tree is constructed in a given order, and the cost is estimated by the optimizer. * * 'tour' is the proposed join order, of length' num_gene' * tour- recommended connection sequence with a length of num_gene * * Returns a new join relation whose cheapest path is the best plan for * this join order. NB: will return NULL if join order is invalid and * we can't modify it into a valid order. * return a new connection, and the path with the lowest cost is the best plan for this connection sequence. * if join order is not valid and cannot be modified to a valid order, NULL is returned. * * The original implementation of this routine always joined in the specified * order, and so could only build left-sided plans (and right-sided and * mixtures, as a byproduct of the fact that make_join_rel () is symmetric). * It could never produce a "bushy" plan. This had a couple of big problems, * of which the worst was that there are situations involving join order * restrictions where the only valid plans are bushy. * the initial implementation of this process is always connected in the specified order, so you can only build the left plan * (and the right and mixed plans, which is a by-product of the fact that make_join_rel () is symmetrical). * it will never produce a "bushy" (nasty M, where N ≥ 2 and M ≥ 2) plan. * there are several big problems, the worst of which involves connection order restrictions, of which the only valid plan is bushy. * * The present implementation takes the given tour as a guideline, but * postpones joins that are illegal or seem unsuitable according to some * heuristic rules. This allows correct bushy plans to be generated at need, * and as a nice side-effect it seems to materially improve the quality of the * generated plans. Note however that since it's just a heuristic, it can * still fail in some cases. (In particular, we might clump together * relations that actually mustn't be joined yet due to LATERAL restrictions; * since there's no provision for un-clumping, this must lead to failure.) * the current implementation is guided by a given route (tour), but according to some heuristic rules, the addition of illegal or seemingly inappropriate genes is delayed. * this allows the right bushy plan to be generated when needed, which brings additional benefits and seems to substantially improve the quality of the generation plan. * note, however, that because it is only a heuristic, it may still fail in some cases. * (in particular, we may combine relationships that cannot actually be connected due to horizontal limitations; in the absence of non-LATERAL rules, this will certainly lead to failure.) * / RelOptInfo * gimme_tree (PlannerInfo * root, Gene * tour, int num_gene) {GeqoPrivateData * private = (GeqoPrivateData *) root- > join_search_private; List * clumps; int rel_count; / * * Sometimes, a relation can't yet be joined to others due to heuristics * or actual semantic restrictions. We maintain a list of "clumps" of * successfully joined relations, with larger clumps at the front. Each * new relation from the tour is added to the first clump it can be joined * to; if there is none then it becomes a new clump of its own. When we * enlarge an existing clump we check to see if it can now be merged with * any other clumps. After the tour is all scanned, we forget about the * heuristics and try to forcibly join any remaining clumps. If we are * unable to merge all the clumps into one, fail. * sometimes relationships cannot be connected to other relationships due to heuristic or actual semantic limitations. * as a result, a "clumps" linked list of successful joins is retained, and there was a larger clumps (clustering) before. * each new relationship is added from tour to the first clump (clustering), and it can be added; if not, it will form a clump (clustering). * when expanding the existing clump (clustering), you need to check whether it can now be merged with other clumps (clustering). * after all tour gene scans, do not use heuristic rules and try to force into any remaining clumps (clustering). * if we cannot combine all the clusters into one population, we will fail. * / clumps = NIL; for (rel_count = 0; rel_count)

< num_gene; rel_count++)//遍历基因即参与连接的关系 { int cur_rel_index;//当前索引 RelOptInfo *cur_rel;//当前的关系 Clump *cur_clump;//当前的clump聚类 /* 获取下一个输入的关系.Get the next input relation */ cur_rel_index = (int) tour[rel_count]; cur_rel = (RelOptInfo *) list_nth(private->

Initial_rels, cur_rel_index-1); / * in a separate cluster clump; Make it into a single-rel clump * / cur_clump = (Clump *) palloc (sizeof (Clump)); cur_clump- > joinrel = cur_rel; cur_clump- > size = 1 / * Merge it into the clumps list, using only desirable joins * / / merge it into the clump linked list using the desired join method (force=F) clumps = merge_clump (root, clumps, cur_clump, num_gene, false) } if (list_length (clumps) > 1) / / Cluster list > 1 {/ * Force-join the remaining clumps in some legal order * / / add the remaining clusters in the traditional order: List * fclumps;// linked list ListCell * lc;// element fclumps = NIL Foreach (lc, clumps) {Clump * clump = (Clump *) lfirst (lc); / / (force=T) fclumps = merge_clump (root, fclumps, clump, num_gene, true);} clumps = fclumps } / * Did we succeed in forming a single join relation? * / if (list_length (clumps)! = 1) / / cannot form the final result relationship. Return NULL return NULL; return ((Clump *) linitial (clumps))-> joinrel / / if successful, the result Relation} / /-- merge_clump/* * Merge a "clump" into the list of existing clumps for gimme_tree is returned. * merge a clump cluster into the existing clumps cluster generated in gimme_tree * * We try to merge the clump into some existing clump, and repeat if * successful. When no more merging is possible, insert the clump * into the list, preserving the list ordering rule (namely, that * clumps of larger size appear earlier). * attempt to merge clump into an existing clumps, and repeat if successful. * when a merge is no longer possible, insert clump into the linked list, leaving the list collation (that is, a larger clump appears first). * * If force is true, merge anywhere a join is legal, even if it causes * a cartesian join to be performed. When force is false, do only * "desirable" joins. * if force is true, merge occurs where the connection is legal, even if this results in a Cartesian connection. When the force force is F, only the "appropriate" connection is made. * / static List * merge_clump (PlannerInfo * root,// optimization information List * clumps, / / clustering linked list Clump * new_clump, / / new clustering int num_gene,// gene format bool force) / / whether to force the addition of {ListCell * prev; ListCell * lc; / * Look for a clump that new_clump can join to * / / verify whether the new cluster can be added prev = NULL Foreach (lc, clumps) / / traversal linked list {Clump * old_clump = (Clump *) lfirst (lc); / / original clustering if (force | | desirable_join (root, old_clump- > joinrel, new_clump- > joinrel)) / / if you are forced to join or you can add {RelOptInfo * joinrel as required / / * Construct a RelOptInfo representing the join of these two input * relations. Note that we expect the joinrel not to exist in * root- > join_rel_list yet, and so the paths constructed for it * will only include the ones we want. * construct a RelOptInfo to represent the connection of the two input relationships. * Note that joinrel is not expected to exist in root- > join_rel_list, so the path constructed for it will only contain the path we expect. * / joinrel = make_join_rel (root, old_clump- > joinrel, new_clump- > joinrel); / / construct a new connection relationship RelOptInfo / * if the connection order is invalid, continue to search Keep searching if join order is not valid * / if (joinrel) {/ * Create paths for partitionwise joins. * / / create partitionwise connection generate_partitionwise_join_paths (root, joinrel); / * * Except for the topmost scan/join rel, consider gathering * partial paths. We'll do the same for the topmost scan/join * rel once we know the final targetlist (see * grouping_planner). * in addition to the top scan / connection relationship, try the gather partial (parallel) access path. * once we know the final targetlist (see grouping_planner), we will do the same for the top-level scan / connect relationship. * / if (old_clump- > size + new_clump- > size

< num_gene) generate_gather_paths(root, joinrel, false); /* Find and save the cheapest paths for this joinrel */ //设置成本最低的路径 set_cheapest(joinrel); /* Absorb new clump into old */ //把新的clump吸纳到旧的clump中,释放new_clump old_clump->

< number_generations; generation++)(gdb) p number_generations$52 = 250 利用线性偏差(bias)函数选择,然后交叉遗传 181 geqo_selection(root, momma, daddy, pool, Geqo_selection_bias);(gdb) n185 gimme_edge_table(root, momma->

String, daddy- > string, pool- > string_length, edge_table); (gdb) n187 kid = momma; (gdb) p * momma$1 = {string = 0x1f30460, worth = 637.36587618977478} (gdb) p * momma- > string$2 = 11 (gdb) p * daddy- > string$3 = 8 (gdb) p * daddy$4 = {string = 0x1f304e0, worth = 635.57048404744364}

Traverse the boundary table, calculate the cost of kid, put kid into the population, and further evolve

(gdb) 216kid- > worth = geqo_eval (root, kid- > string, pool- > string_length); (gdb) p * kid$5 = {string = 0x1f30460, worth = 637.36587618977478} (gdb) n219 spread_chromo (root, kid, pool); (gdb) p * kid$6 = {string = 0x1f30460, worth = 663.22435850797251} (gdb) n178 for (generation = 0; generation)

< number_generations; generation++) 下面考察进化过程中的geqo_eval函数,进入该函数,13个基因,tour为2 (gdb) 216 kid->

Worth = geqo_eval (root, kid- > string, pool- > string_length); (gdb) stepgeqo_eval (root=0x1fbf0f8, tour=0x1f30460, num_gene=13) at geqo_eval.c:7575 mycontext = AllocSetContextCreate (CurrentMemoryContext, (gdb) p * tour$7 = 2

Assign / save state

(gdb) n78 oldcxt = MemoryContextSwitchTo (mycontext); (gdb) 95 savelength = list_length (root- > join_rel_list); (gdb) 96 savehash = root- > join_rel_hash; (gdb) 97 Assert (root- > join_rel_level = = NULL); (gdb) 99 root- > join_rel_hash = NULL

Go to the geqo_eval- > gimme_tree function

(gdb) 102 joinrel = gimme_tree (root, tour, num_gene); (gdb) stepgimme_tree (root=0x1fbf0f8, tour=0x1f30460, num_gene=13) at geqo_eval.c:165165 GeqoPrivateData * private = (GeqoPrivateData *) root- > join_search_private

The linked list clumps is initialized to NULL, starts the loop, performs the join operation, and the tour array saves the order of the RTE.

(gdb) n180 clumps = NIL; (gdb) 182 for (rel_count = 0; rel_count

< num_gene; rel_count++)(gdb) n189 cur_rel_index = (int) tour[rel_count];(gdb) p tour[0]$9 = 2(gdb) p tour[1]$10 = 12 循环添加到clumps中,直至所有的表都加入到clumps中或者无法产生有效的连接 (gdb) n190 cur_rel = (RelOptInfo *) list_nth(private->

Initial_rels, (gdb) 194 cur_clump = (Clump *) palloc (sizeof (Clump)); (gdb) 195cur_clump- > joinrel = cur_rel; (gdb) 196cur_clump- > size = 1; (gdb) 199clumps = merge_clump (root, clumps, cur_clump, num_gene, false); (gdb) (gdb) 182for (rel_count = 0; rel_count)

< num_gene; rel_count++) 完成循环调用 (gdb) b geqo_eval.c:218Breakpoint 2 at 0x793bf9: file geqo_eval.c, line 218.(gdb) cContinuing.Breakpoint 2, gimme_tree (root=0x1fbf0f8, tour=0x1f30460, num_gene=13) at geqo_eval.c:219219 if (list_length(clumps) != 1) clumps链表中只有一个元素,该元素为13张表成功连接的访问路径 (gdb) p *clumps$11 = {type = T_List, length = 1, head = 0x1ea2e20, tail = 0x1ea2e20}$12 = {joinrel = 0x1ee4ef8, size = 13}(gdb) p *((Clump *)clumps->

Head- > data.ptr_value)-> joinrel$13 = {type = T_RelOptInfo, reloptkind = RELOPT_JOINREL, relids = 0x1ee34b0, rows = 100, consider_startup = false, consider_param_startup = false, consider_parallel = true, reltarget = 0x1ee5110, pathlist = 0x1ee5a78, ppilist = 0x0, partial_pathlist = 0x0, cheapest_startup_path = 0x1ee5ee0, cheapest_total_path = 0x1ee5ee0, cheapest_unique_path = 0x0, cheapest_parameterized_paths = 0x1ee5fa0, direct_lateral_relids = 0x0, lateral_relids = 0x0, 0x0 = 0, relid = 0 Rtekind = RTE_JOIN, min_attr = 0, max_attr = 0, attr_needed = 0x0, attr_widths = 0x0, lateral_vars = 0x0, lateral_referencers = 0x0, indexlist = 0x0, statlist = 0x0, pages = 0, tuples = 0, allvisfrac = 0, subroot = 0x0, subplan_params = 0x0, rel_parallel_workers =-1, serverid = 0, userid = 0, useridiscurrent = false, fdwroutine = 0x0, fdw_private = 0x0, unique_for_rels = 0x0, non_unique_for_rels = 0x0, 0x0 = 0x0 Baserestrictcost = {startup = 0, per_tuple = 0}, baserestrict_min_security = 4294967295, joininfo = 0x0, has_eclass_joins = false, consider_partitionwise_join = false, top_parent_relids = 0x0, part_scheme = 0x0, nparts = 0, boundinfo = 0x0, partition_qual = 0x0, part_rels = 0x0, partexprs = 0x0, nullable_partexprs = 0x0, partitioned_child_rels = 0x0}

The geqo_eval- > gimme_tree function returns

(gdb) n222 return ((Clump *) linitial (clumps))-> joinrel

Go back to the geqo_eval function, set the fitness, restore the scene, etc.

(gdb) 113 Path * best_path = joinrel- > cheapest_total_path; (gdb) n115 fitness = best_path- > total_cost; (gdb) 124 root- > join_rel_list = list_truncate (root- > join_rel_list, (gdb) 126 root- > join_rel_hash = savehash; (gdb) 129 MemoryContextSwitchTo (oldcxt); (gdb) 130MemoryContextDelete (mycontext); (gdb) 132 return fitness; (gdb) p fitness$14 = 680.71399161308523

Go back to the function geqo and continue iterating

Geqo (root=0x1fbf0f8, number_of_rels=13, initial_rels=0x1f0f698) at geqo_main.c:219219 spread_chromo (root, kid, pool); (gdb) 178for (generation = 0; generation)

< number_generations; generation++) 完成循环迭代 (gdb) b geqo_main.c:229Breakpoint 3 at 0x79407a: file geqo_main.c, line 229.(gdb) cContinuing.Breakpoint 3, geqo (root=0x1fbf0f8, number_of_rels=13, initial_rels=0x1f0f698) at geqo_main.c:260260 best_tour = (Gene *) pool->

Data [0] .string

The best access node path (stored in the best_tour array)

(gdb) p best_tour [0] $17 = 2 (gdb) p best_tour [1] $18 = 7 (gdb) p best_tour [12] $19 = 3 (gdb) p best_tour [13]

The best final result relationship

(gdb) p * best_rel$21 = {type = T_RelOptInfo, reloptkind = RELOPT_JOINREL, relids = 0x1f3d098, rows = 1, consider_startup = false, consider_param_startup = false, consider_parallel = true, reltarget = 0x1f3d7e0, pathlist = 0x1f3e148, ppilist = 0x0, partial_pathlist = 0x0, cheapest_startup_path = 0x1f3e550, cheapest_total_path = 0x1f3e550, cheapest_unique_path = 0x0, cheapest_parameterized_paths = 0x1f3e670, direct_lateral_relids = 0x0, lateral_relids = 0x0, relid = 0, reltablespace = 0, reltablespace = reltablespace Min_attr = 0, max_attr = 0, attr_needed = 0x0, attr_widths = 0x0, lateral_vars = 0x0, lateral_referencers = 0x0, indexlist = 0x0, statlist = 0x0, pages = 0, tuples = 0, allvisfrac = 0, subroot = 0x0, subplan_params = 0x0, rel_parallel_workers =-1, serverid = 0, userid = 0, useridiscurrent = false, fdwroutine = 0x0, fdw_private = 0x0, unique_for_rels = 0x0, non_unique_for_rels = 0x0, baserestrictinfo = 0x0, 0x0 = {baserestrictcost = 0 Per_tuple = 0}, baserestrict_min_security = 4294967295, joininfo = 0x0, has_eclass_joins = false, consider_partitionwise_join = false, top_parent_relids = 0x0, part_scheme = 0x0, nparts = 0, boundinfo = 0x0, partition_qual = 0x0, part_rels = 0x0, partexprs = 0x0, nullable_partexprs = 0x0, partitioned_child_rels = 0x0}

Clean up the scene and return

(gdb) n274 free_chromo (root, daddy); (gdb) 277 free_edge_table (root, edge_table); (gdb) 294 free_pool (root, pool); (gdb) 297 root- > join_search_private = NULL; (gdb) 299 return best_rel; (gdb) 300}

DONE!

IV. Reference materials

Allpaths.c

Cost.h

Costsize.c

PG Document:Query Planning

Ten minutes to understand the genetic algorithm

The distance between you and the genetic algorithm may only be this article.

Intelligent Optimization method and its Application-genetic algorithm

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report