
PostgreSQL Source Code Interpretation - Query #94 (Syntax Analysis: gram.y) #3

2025-01-20 Update From: SLTechnology News&Howtos


This section continues the walkthrough of gram.y, PostgreSQL's grammar definition file, and covers its third part: Productions.

A Bison input file is laid out as follows:

    %{
    Declarations
    %}
    Definitions
    %%
    Productions
    %%
    User subroutines

I. Productions

The Productions section holds the grammar productions (rules) written by the user. Productions take the following form:

    S -> X \n
    X -> X + X | X - X | T_NUMBER

A rule such as S -> X \n is called a production. The left-hand symbol of the first production is the start symbol, in this case S.

Bison then adds one extra production, S' -> S, and uses S' as the start symbol, so that the start symbol itself never appears on the right-hand side of a production.

In Bison, ":" plays the role of "->". Alternative productions of the same nonterminal are separated by "|", and the whole group is terminated by ";". The curly braces at the end of a production enclose a piece of C code called the Action, which is executed when that production is applied (reduced). When the right-hand side of a production is ε (the empty string), it is marked with the comment /* empty */.

Nonterminals do not need to be declared in advance: Bison treats every symbol that appears on the left-hand side of some production as a nonterminal. Among terminals, a single-character token (whose token value equals the character's ASCII code) also needs no declaration and can be written in single quotes inside a production. All other tokens must be declared in the Definitions section, for example %token ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER. Bison automatically assigns a number to each such token and writes it to gram.h. Opening that file shows the following code:

    [root@localhost src]# vim ./include/parser/gram.h
    ...
    /* Token type.  */
    #ifndef YYTOKENTYPE
    # define YYTOKENTYPE
      enum yytokentype
      {
        IDENT = 258,
        FCONST = 259,
        SCONST = 260,
        BCONST = 261,
        XCONST = 262,
        Op = 263,
        ICONST = 264,
        PARAM = 265,
    ....

Numbering starts at 258, above the range used by single-character tokens, and follows the order in which the tokens are declared in gram.y:

    ...
    %token IDENT FCONST SCONST BCONST XCONST Op
    %token ICONST PARAM
    %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
    %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
    %token ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER
        AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
        ASSERTION ASSIGNMENT ASYMMETRIC AT ATTACH ATTRIBUTE AUTHORIZATION
    ...
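As a small illustration of the numbering scheme, the C sketch below separates single-character tokens from declared ones. The three enum values are copied from the gram.h excerpt; the enum name and both helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>

/* First entries of the token enum, with the values shown in gram.h. */
enum toy_tokentype { IDENT = 258, FCONST = 259, SCONST = 260 };

/* Hypothetical helper: 1 if the code is a single-character token,
 * i.e. the token value is simply the character's own code. */
static int toy_is_char_token(int token)
{
    return token > 0 && token < 256;
}

/* Hypothetical helper: 1 if the code lies in the range that holds
 * the tokens declared with %token in this excerpt. */
static int toy_is_declared_token(int token)
{
    return token >= IDENT;
}
```

Because ';' is just the character code 59, it can never collide with IDENT (258) or any later declared token, which is why a production can use ';' without a %token declaration.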

scan.l can use these token definitions directly, because it includes parser/gramparse.h, which in turn includes parser/gram.h:

    #include "parser/gramparse.h" --> #include "parser/gram.h"

Bison converts the productions and the symbol precedences into an LALR(1) action table and writes it to gram.c. In that file, Bison generates the function int base_yyparse(core_yyscan_t yyscanner) from PG's grammar file. This function parses the token stream produced by lexical analysis, following the usual LR shift-reduce procedure. Whenever it needs the next symbol, it calls yylex(). Whenever it performs a reduce, it first runs the C code of the production being reduced, and only after that code has finished are the corresponding states popped off the stack.
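The shift-reduce loop described above can be sketched by hand in C. This is a toy, not Bison's generated code: every name here (toy_lex, toy_parse, the state numbers) is hypothetical, and the grammar is simplified to X -> X + T_NUMBER | X - T_NUMBER | T_NUMBER so that no precedence declarations are needed.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes for this sketch (not PostgreSQL's). */
enum { T_END = 0, T_NUMBER = 258 };

#define TOY_DEPTH 8          /* this grammar never needs more than 4 entries */

static const char *input;    /* text still to be scanned */
static int lex_value;        /* semantic value of the last T_NUMBER */

/* Toy lexer: returns T_NUMBER, a single-character token, or T_END. */
static int toy_lex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return T_END;
    if (isdigit((unsigned char) *input)) {
        lex_value = 0;
        while (isdigit((unsigned char) *input))
            lex_value = lex_value * 10 + (*input++ - '0');
        return T_NUMBER;
    }
    return (unsigned char) *input++;   /* e.g. '+' or '-' as its own code */
}

/* Returns 0 and stores the value in *result, or 1 on a syntax error. */
static int toy_parse(const char *text, int *result)
{
    int states[TOY_DEPTH];   /* state stack */
    int values[TOY_DEPTH];   /* semantic value stack, parallel to states */
    int top = 0;
    int tok, val, next;

    input = text;
    states[0] = 0;
    values[0] = 0;
    tok = toy_lex();

    for (;;) {
        assert(top + 1 < TOY_DEPTH);
        switch (states[top]) {
        case 0:              /* start        */
        case 3:              /* after X '+'  */
        case 4:              /* after X '-'  */
            if (tok != T_NUMBER)
                return 1;
            val = lex_value;
            next = (states[top] == 0) ? 1 : (states[top] == 3) ? 5 : 6;
            top++; states[top] = next; values[top] = val;   /* shift */
            tok = toy_lex();                                /* read next symbol */
            break;
        case 1:              /* reduce X -> T_NUMBER: $$ = $1 */
            val = values[top];
            top -= 1;                                       /* pop after the action */
            top++; states[top] = 2; values[top] = val;      /* goto on X */
            break;
        case 2:              /* X seen: shift an operator or accept */
            if (tok == T_END) {
                *result = values[top];
                return 0;
            }
            if (tok != '+' && tok != '-')
                return 1;
            next = (tok == '+') ? 3 : 4;
            top++; states[top] = next; values[top] = 0;
            tok = toy_lex();
            break;
        case 5:              /* reduce X -> X '+' T_NUMBER: $$ = $1 + $3 */
            val = values[top - 2] + values[top];
            top -= 3;
            top++; states[top] = 2; values[top] = val;
            break;
        case 6:              /* reduce X -> X '-' T_NUMBER: $$ = $1 - $3 */
            val = values[top - 2] - values[top];
            top -= 3;
            top++; states[top] = 2; values[top] = val;
            break;
        default:
            return 1;
        }
    }
}
```

Calling toy_parse("1 + 2 - 4", &r) reduces left to right and leaves -1 in r. In each reduce case the action reads its operands from the value stack before the entries are popped, which mirrors the order described for base_yyparse.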

Here is part of the code for yyparse in gram.c:

    /*----------.
    | yyparse.  |
    `----------*/

    int
    yyparse (core_yyscan_t yyscanner)
    {
    /* The lookahead symbol.  */
    int yychar;

    /* The semantic value of the lookahead symbol.  */
    /* Default value used for initialization, for pacifying older GCCs
       or non-GCC compilers.  */
    YY_INITIAL_VALUE (static YYSTYPE yyval_default;)
    YYSTYPE yylval YY_INITIAL_VALUE (= yyval_default);

    /* Location data for the lookahead symbol.  */
    static YYLTYPE yyloc_default
    # if defined YYLTYPE_IS_TRIVIAL && YYLTYPE_IS_TRIVIAL
      = { 1, 1, 1, 1 }
    # endif
    ;
    YYLTYPE yylloc = yyloc_default;

        /* Number of syntax errors so far.  */
        int yynerrs;

        int yystate;
        /* Number of tokens to shift before error messages enabled.  */
        int yyerrstatus;

        /* The stacks and their tools:
           'yyss': related to states.
           'yyvs': related to semantic values.
           'yyls': related to locations.

           Refer to the stacks through separate pointers, to allow yyoverflow
           to reallocate them elsewhere.  */

        /* The state stack.  */
        yytype_int16 yyssa[YYINITDEPTH];
        yytype_int16 *yyss;
        yytype_int16 *yyssp;

        /* The semantic value stack.  */
        YYSTYPE yyvsa[YYINITDEPTH];
        YYSTYPE *yyvs;
        YYSTYPE *yyvsp;

        /* The location stack.  */
        YYLTYPE yylsa[YYINITDEPTH];
        YYLTYPE *yyls;
        YYLTYPE *yylsp;

        /* The locations where the error started and ended.  */
        YYLTYPE yyerror_range[3];

        YYSIZE_T yystacksize;

      int yyn;
      int yyresult;
      /* Lookahead token as an internal (translated) token number.  */
      int yytoken = 0;
      /* The variables used to return semantic value and location from the
         action routines.  */
      YYSTYPE yyval;
      YYLTYPE yyloc;

    #if YYERROR_VERBOSE
      /* Buffer for error messages, and its allocated size.  */
      char yymsgbuf[128];
      char *yymsg = yymsgbuf;
      YYSIZE_T yymsg_alloc = sizeof yymsgbuf;
    #endif

    #define YYPOPSTACK(N)   (yyvsp -= (N), yyssp -= (N), yylsp -= (N))

      /* The number of symbols on the RHS of the reduced rule.
         Keep to zero when no symbol should be popped.  */
      int yylen = 0;

      yyssp = yyss = yyssa;
      yyvsp = yyvs = yyvsa;
      yylsp = yyls = yylsa;
      yystacksize = YYINITDEPTH;
    ...

II. Source code

Here are some of the production definitions from gram.y:

    /*
     *	The target production for the whole parse.
     */
    stmtblock:	stmtmulti
                {
                    pg_yyget_extra(yyscanner)->parsetree = $1;
                }
            ;

    /*
     * At top level, we wrap each stmt with a RawStmt node carrying start location
     * and length of the stmt's text.  Notice that the start loc/len are driven
     * entirely from semicolon locations (@2).  It would seem natural to use
     * @1 or @3 to get the true start location of a stmt, but that doesn't work
     * for statements that can start with empty nonterminals (opt_with_clause is
     * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
     * we'd get -1 for the location in such cases.
     * We also take care to discard empty statements entirely.
     */
    stmtmulti:	stmtmulti ';' stmt
                {
                    if ($1 != NIL)
                    {
                        /* update length of previous stmt */
                        updateRawStmtEnd(llast_node(RawStmt, $1), @2);
                    }
                    if ($3 != NULL)
                        $$ = lappend($1, makeRawStmt($3, @2 + 1));
                    else
                        $$ = $1;
                }
            | stmt
                {
                    if ($1 != NULL)
                        $$ = list_make1(makeRawStmt($1, 0));
                    else
                        $$ = NIL;
                }
            ;

    stmt :
            AlterEventTrigStmt | AlterCollationStmt | AlterDatabaseStmt
            | AlterDatabaseSetStmt | AlterDefaultPrivilegesStmt | AlterDomainStmt
            | AlterEnumStmt | AlterExtensionStmt | AlterExtensionContentsStmt
            | AlterFdwStmt | AlterForeignServerStmt | AlterForeignTableStmt
            | AlterFunctionStmt | AlterGroupStmt | AlterObjectDependsStmt
            | AlterObjectSchemaStmt | AlterOwnerStmt | AlterOperatorStmt
            | AlterPolicyStmt | AlterSeqStmt | AlterSystemStmt | AlterTableStmt
            | AlterTblSpcStmt | AlterCompositeTypeStmt | AlterPublicationStmt
            | AlterRoleSetStmt | AlterRoleStmt | AlterSubscriptionStmt
            | AlterTSConfigurationStmt | AlterTSDictionaryStmt
            | AlterUserMappingStmt | AnalyzeStmt | CallStmt | CheckPointStmt
            | ClosePortalStmt | ClusterStmt | CommentStmt | ConstraintsSetStmt
            | CopyStmt | CreateAmStmt | CreateAsStmt | CreateAssertStmt
            | CreateCastStmt | CreateConversionStmt | CreateDomainStmt
            | CreateExtensionStmt | CreateFdwStmt | CreateForeignServerStmt
            | CreateForeignTableStmt | CreateFunctionStmt | CreateGroupStmt
            | CreateMatViewStmt | CreateOpClassStmt | CreateOpFamilyStmt
            | CreatePublicationStmt | AlterOpFamilyStmt | CreatePolicyStmt
            | CreatePLangStmt | CreateSchemaStmt | CreateSeqStmt | CreateStmt
            | CreateSubscriptionStmt | CreateStatsStmt | CreateTableSpaceStmt
            | CreateTransformStmt | CreateTrigStmt | CreateEventTrigStmt
            | CreateRoleStmt | CreateUserStmt | CreateUserMappingStmt
            | CreatedbStmt | DeallocateStmt | DeclareCursorStmt | DefineStmt
            | DeleteStmt | DiscardStmt | DoStmt | DropAssertStmt | DropCastStmt
            | DropOpClassStmt | DropOpFamilyStmt | DropOwnedStmt | DropPLangStmt
            | DropStmt | DropSubscriptionStmt | DropTableSpaceStmt
            | DropTransformStmt | DropRoleStmt | DropUserMappingStmt
            | DropdbStmt | ExecuteStmt | ExplainStmt | FetchStmt | GrantStmt
            | GrantRoleStmt | ImportForeignSchemaStmt | IndexStmt | InsertStmt
            | ListenStmt | RefreshMatViewStmt | LoadStmt | LockStmt | NotifyStmt
            | PrepareStmt | ReassignOwnedStmt | ReindexStmt | RemoveAggrStmt
            | RemoveFuncStmt | RemoveOperStmt | RenameStmt | RevokeStmt
            | RevokeRoleStmt | RuleStmt | SecLabelStmt | SelectStmt
            | TransactionStmt | TruncateStmt | UnlistenStmt | UpdateStmt
            | VacuumStmt | VariableResetStmt | VariableSetStmt
            | VariableShowStmt | ViewStmt
            | /*EMPTY*/
                { $$ = NULL; }
        ;

    /*
     * CALL statement
     */
    CallStmt:	CALL func_application
                {
                    CallStmt *n = makeNode(CallStmt);
                    n->funccall = castNode(FuncCall, $2);
                    $$ = (Node *) n;
                }
            ;

A brief walkthrough follows:

1. stmtblock

stmtblock: stmtmulti

stmtblock is the start symbol. The whole input must eventually be reduced to this symbol; otherwise a syntax error is reported.

The action executed is: pg_yyget_extra(yyscanner)->parsetree = $1

In other words, once parsing has completed, the resulting parse tree is stored in parsetree.

2. stmtmulti

stmtmulti: stmtmulti ';' stmt

This is a left-recursive production. It lets PG accept multiple statements separated by semicolons, each of which is defined by stmt.
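To make the bookkeeping in the two stmtmulti actions more concrete, here is a simplified C model of it. It is a sketch under stated assumptions, not PostgreSQL code: ToyRawStmt and both functions are hypothetical, the input is split on every ';' with no regard for quoting or comments, and the rule that an already-set length is never overwritten is an assumption of this sketch rather than something shown in the listing.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical, simplified stand-in for a RawStmt node. */
typedef struct ToyRawStmt {
    int location;   /* offset of the statement text in the query string */
    int len;        /* length in bytes; 0 means "not set yet" */
} ToyRawStmt;

#define TOY_MAX_STMTS 16

/* 1 if text[start, end) holds only whitespace, i.e. an empty statement. */
static int toy_is_empty(const char *text, int start, int end)
{
    for (int i = start; i < end; i++)
        if (!isspace((unsigned char) text[i]))
            return 0;
    return 1;
}

/* Replays the stmtmulti reductions; returns the number of statements kept. */
static int toy_stmtmulti(const char *text, ToyRawStmt *out)
{
    int n = 0;            /* current list length ($1 != NIL means n > 0) */
    int start = 0;        /* where the current statement's text begins */
    int prev_semi = -1;   /* offset of the ';' before it (@2), -1 if none */

    for (int i = 0; ; i++) {
        if (text[i] != ';' && text[i] != '\0')
            continue;

        int empty = toy_is_empty(text, start, i);

        if (prev_semi < 0) {
            /* stmtmulti: stmt  ->  makeRawStmt($1, 0) */
            if (!empty) {
                out[n].location = 0;
                out[n].len = 0;
                n++;
            }
        } else {
            /* stmtmulti: stmtmulti ';' stmt */
            if (n > 0 && out[n - 1].len == 0)   /* update length of previous stmt */
                out[n - 1].len = prev_semi - out[n - 1].location;
            if (!empty) {                       /* makeRawStmt($3, @2 + 1) */
                assert(n < TOY_MAX_STMTS);
                out[n].location = prev_semi + 1;
                out[n].len = 0;
                n++;
            }
        }

        if (text[i] == '\0')
            break;
        prev_semi = i;
        start = i + 1;
    }
    return n;
}
```

For the input "select 1; ; select 2" this model keeps two statements: the first at location 0 with length 8, the second at location 11 with its length still unset. The empty statement between the two semicolons is discarded, as the comment in gram.y says.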

3. stmt

stmt: AlterEventTrigStmt | AlterCollationStmt ... | SelectStmt ...

stmt has more than a hundred alternatives in the listing above, one per statement type. Let's look at the most common one, SelectStmt.

4. SelectStmt

    SelectStmt: select_no_parens			%prec UMINUS
                | select_with_parens		%prec UMINUS
            ;
    ...
    select_no_parens:
                simple_select						{ $$ = $1; }
                | select_clause sort_clause
                    {
                        insertSelectOptions((SelectStmt *) $1, $2, NIL, NULL, yyscanner);
                        $$ = $1;
                    }
    ...
    simple_select:
                SELECT opt_all_clause opt_target_list
                into_clause from_clause where_clause
                group_clause having_clause window_clause
                    {
                        SelectStmt *n = makeNode(SelectStmt);
                        n->targetList = $3;
                        n->intoClause = $4;
                        n->fromClause = $5;
                        n->whereClause = $6;
                        n->groupClause = $7;
                        n->havingClause = $8;
                        n->windowClause = $9;
                        $$ = (Node *) n;
                    }
                | SELECT distinct_clause target_list
    ...

III. Reference materials

Flex&Bison
