How to realize Python Recursive descending Parser 07/16 Update SLTechnology News&Howtos

How to realize Python Recursive descending Parser

2025-07-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the Python recursive decline Parser how to achieve the relevant knowledge, the content is detailed and easy to understand, the operation is simple and fast, has a certain reference value, I believe that everyone after reading this Python recursive decline Parser how to achieve the article will have a harvest, let's take a look.

1. Evaluation of arithmetic operation expression

To parse this kind of text, you need another specific grammar rule. Here we introduce the grammatical rules Bakos normal form (BNF) and extended Bakos normal form (EBNF) that can represent context-free grammars (context free grammer). In fact, as small as an arithmetic expression to almost all programming languages, it is defined by context-free grammars.

For simple arithmetic expressions, it is assumed that we have converted it into an input tokens stream using word segmentation techniques, such as NUM+NUM*NUM (see the previous blog for word segmentation).

On this basis, we define BNF rules as follows:

Of course, this method is not simple and straightforward enough, we actually adopt the EBNF form:

Expr:: = term {(+ | -) term} * term:: = factor {(* | /) factor} * factor:: = (expr) | NUM

Every rule of BNF and EBNF (such as the formula of:: =) can be seen as a substitution, that is, the symbol on the left can be replaced by the symbol on the right. In the process of parsing, we try to match the input text with the rules of the language, and complete various substitutions and extensions through BNF/EBNF. Where the rules contained in {...} * in EBNF are optional, and * means zero or more duplicates (reference regular expressions).

The following figure vividly shows the relationship between the "recursive" and "descending" parts of the recursive descent parser (parser) and ENBF:

In the actual parsing process, we scan the tokens stream from left to right, and process the token during the scan, resulting in a syntax error if it gets stuck. For rules, we transform each syntax rule into a function or method. For example, the ENBF rule above is transformed into the following method:

Class ExpressionEvaluator ():... Def expr (self):... Def term (self):... Def factor (self):...

In the process of calling a rule corresponding method, if we find that the next symbol needs to be matched by another rule, we will "descend" to another rule method (such as calling factor in calling term,term in expr), that is, the "descending" part of the recursive descent.

Sometimes methods that are already in execution are called (such as calling factor in term,term in expr and expr in factor, which is the equivalent of a rattlesnake), which is the "recursive" part of recursive descent.

For repetitive parts of the syntax (for example, expr:: = term {(+ | -) term} *), we use the while loop.

Let's take a look at the specific code implementation. First of all, there is the participle part, we refer to the code of the previous article which introduces the participle blog.

Import reimport collections# defines a pattern that matches token NUM = r'(? P\ d +)'#\ d to represent matching numbers + indicates any length PLUS = r'(? P\ +)'# attention escape MINUS = r'(? P -) 'TIMES = r' (? P\ *)'# attention escape DIVIDE = r'(? P /) 'LPAREN = r' (? P\)'# attention escape RPAREN = r'(? P\))'# pay attention to escape WS = r'(? P\ s +)'# Don't forget the space. + represents any length master_pat = re.compile ('| '.join ([NUM, PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN, WS]) # TokenizerToken = collections.namedtuple (' Token', ['type',' value']) def generate_tokens (text): scanner = master_pat.scanner (text) for m in iter (scanner.match, None): tok = Token (m.lastgroup) M.group () if tok.type! = 'WS': # filter the blank character yield tok

The following is the implementation of the expression evaluator:

Class ExpressionEvaluator (): recursively descending Parser implementation of each syntax rule corresponding to a method. Use the. _ accept () method to test and accept the currently processed token, mismatch does not report an error, and use the. _ except () method to test the currently processed token And throw a syntax error "def parse (self, text):" external calling API "self.tokens = generate_tokens (text) self.tok, self.next_tok = None, None # the last token that has been matched when there is a mismatch" The next token self._next () # to match goes to the next token return self.expr () # to start recursive def _ next (self): "go to the next token" self.tok, self.next_tok = self.next_tok, next (self.tokens, None) def _ accept (self, tok_type): "" if the next token matches tok_type Then go to the next token "" if self.next_tok and self.next_tok.type = = tok_type: self._next () return True else: return False def _ except (self, tok_type): "" check whether there is a match. If it doesn't match, throw an exception "" if not self._accept (tok_type): raise SyntaxError ("Excepted" + tok_type) # and then there are syntax rules Each grammar rule corresponds to a method def expr (self): "" corresponding rule: expression:: = term {('+'|'-') term} * "exprval = self.term () # take the first while self._accept (" PLUS ") or self._accept (" DIVIDE "): # if the next item is" + "or"-" Op = self.tok.type # take the next item That is, the right value of the operator right = self.term () if op = = "PLUS": exprval + = right elif op = = "MINUS": exprval-= right return exprval def term (self): "" corresponding rule: term:: = factor {('*'|'/') factor} * " "termval = self.factor () # take the first item while self._accept (" TIMES ") or self._accept (" DIVIDE "): # if the next item is" + "or"-"op = self.tok.type # take the next item That is, the right value of the operator right = self.factor () if op = = "TIMES": termval * = right elif op = = "DIVIDE": termval / = right return termval def factor (self): "" corresponding rule: factor:: = NUM | (expr) " "if self._accept (" NUM "): # Recursive exit return int (self.tok.value) elif self._accept (" LPAREN "): exprval = self.expr () # continue recursively to find the expression value self._except (" RPAREN ") # Don't forget to check for closing parentheses If not, an exception return exprval else: raise SyntaxError ("Expected NUMBER or LPAREN") is thrown

Let's enter the following expression to test:

E = ExpressionEvaluator () print (e.parse ("2")) print (e.parse ("2x 3")) print (e.parse ("2x 3")) print (e.parse ("2 + (3x 4) * 5")

The evaluation results are as follows:

two

five

fourteen

thirty-seven

If the text we enter does not conform to the grammatical rules:

Print (e.parse ("2 + (3 + * 4)"))

Then a SyntaxError exception is thrown: Expected NUMBER or LPAREN.

In summary, it can be seen that our expression evaluation algorithm works correctly.

two。 Generate expression tree

Above we get the result of the expression, but what if we want to analyze the structure of the expression and generate a simple expression parsing tree? Then we need to make some changes to the methods of the above classes:

Class ExpressionTreeBuilder (ExpressionEvaluator): def expr (self): "" corresponding rule: expression:: = term {('+'|'-') term} * "exprval = self.term () # take the first while self._accept (" PLUS ") or self._accept (" DIVIDE "): # if the next item is" + "or"-" Op = self.tok.type # take the next item That is, the operator right value right = self.term () if op = = "PLUS": exprval = ('+', exprval, right) elif op = = "MINUS": exprval-= ('-', exprval) Right) return exprval def term (self): "" corresponding rule: term:: = factor {('*'|'/') factor} * "termval = self.factor () # take the first while self._accept (" TIMES ") or self._accept (" DIVIDE "): # if the next item is" + "or"-" Op = self.tok.type # take the next item That is, the right value of the operator right = self.factor () if op = = "TIMES": termval = ('*', termval, right) elif op = = "DIVIDE": termval = ('/', termval Right) return termval def factor (self): "" corresponding rule: factor:: = NUM | (expr) "" if self._accept ("NUM"): # Recursive exit return int (self.tok.value) # string conversion shaping elif self._accept ("LPAREN"): exprval = self.expr () # continue recursively to find the expression value self._except ("RPAREN") # Don't forget to check if there are closing parentheses If not, an exception return exprval else: raise SyntaxError ("Expected NUMBER or LPAREN") is thrown

Enter the following expression to test it:

Print (e.parse ("2-3")) print (e.parse ("2-3")) print (e.parse ("2 + (3-4) * 5") print (e.parse ('2-3-4)

Here is the result of the generation:

('+', 2,3)

('+', 2, (3, 4))

('+', 2, ('*', (3, 4), 5))

('+', (2, 3), 4)

You can see that the expression tree is generated correctly.

Our example above is very simple, but recursive descending parsers can also be used to implement fairly complex parsers. For example, Python code is parsed through a recursive descending parser. If you are interested in this, you can check the Grammar file in the Python source code to find out. However, as we will see below, writing a parser yourself can face a variety of pitfalls and challenges.

Left recursion and operator precedence trap

Any grammar rules involving left recursive forms cannot be solved by recursive descending parser. The so-called left recursion means that the leftmost symbol on the right side of the rule formula is the rule header, for example, for the following rules:

Items:: = items', 'item | item

To complete this parsing, you may define the following methods:

Def items (self): itemsval = self.items () # take the first item, but here it will be infinitely recursive! If itemsval and self._accept (','): itemsval.append (self.item ()) else: itemsval = [self.item ()]

Doing so will infinitely call self.items () on the first line, resulting in an infinite recursive error.

There are also errors in the syntax rules themselves, such as operator precedence. If we ignore the operator precedence, we simply simplify the expression as follows:

Expr:: = factor {('+'|'-'|'*'|'/') factor} * factor:: ='('expr')'| NUM

PYTHON copy full screen

This syntax is technically possible, but does not comply with the calculation order convention, resulting in a result of "3 percent 4 percent 5" with an operation result of 35 instead of 23 as expected. Therefore, independent expr and term rules are needed to ensure the correctness of the calculation results.

This is the end of the article on "how to achieve Python recursive descent Parser". Thank you for reading! I believe that everyone has a certain understanding of the knowledge of "how to achieve Python recursive descent Parser". If you want to learn more knowledge, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.