How to develop a Python interpreter with Python

2025-03-31 Update. From: SLTechnology News&Howtos. Reported by Shulou (Shulou.com), 06/02.

This article shows how to develop an interpreter in Python. It is a practical topic, so I am sharing it here, and I hope you get something out of it.

Foreword:

Computers can only understand machine code. In the end, a programming language is just text designed to make it easier for humans to write down what they want the computer to do. The real magic is done by compilers and interpreters, which bridge the gap between the two. An interpreter reads the source code line by line and executes it.

In this article, we will design an interpreter that can perform arithmetic operations.

We're not going to reinvent the wheel. This article uses PLY (Python Lex-Yacc, https://github.com/dabeaz/ply), a lexer and parser generator developed by David M. Beazley.

PLY can be installed with pip:

$ pip install ply

We'll take a quick look at the basics needed to create an interpreter. For more information, see the GitHub repository (https://github.com/dabeaz/ply).

1. Token

A token is the smallest unit of characters that carries meaning for the interpreter. Each token is a pair consisting of a name and an attribute value.

Let's start by creating the list of token names. PLY requires this list.

tokens = (
    # data types
    "NUM",
    "FLOAT",
    # arithmetic operations
    "PLUS",
    "MINUS",
    "MUL",
    "DIV",
    "POW",
    # parentheses
    "LPAREN",
    "RPAREN",
)

2. Lexical analyzer (Lexer)

The process of converting a statement into tokens is called tokenization or lexical analysis. The program that performs it is the lexical analyzer (lexer).

# regular expressions for the tokens
t_PLUS = r"\+"
t_MINUS = r"\-"
t_MUL = r"\*"
t_DIV = r"/"
t_LPAREN = r"\("
t_RPAREN = r"\)"
t_POW = r"\^"

# ignore spaces and tabs
t_ignore = " \t"

# add an action to each rule
def t_FLOAT(t):
    r"\d+\.\d+"
    t.value = float(t.value)
    return t

def t_NUM(t):
    r"\d+"
    t.value = int(t.value)
    return t

# error handling for characters that match no rule
def t_error(t):
    # t.value holds the rest of the input that has not been tokenized yet
    print(f"keyword not found: {t.value[0]}\nline {t.lineno}")
    t.lexer.skip(1)

# on a newline, advance the line counter
def t_newline(t):
    r"\n+"
    t.lexer.lineno += t.value.count("\n")

To import the lexical analyzer, we will use:

import ply.lex as lex

t_ is a special prefix that marks a rule defining a token. Each lexical rule is written as a regular expression compatible with Python's re module. A regular expression scans the input according to the rule and finds the matching strings of symbols. A grammar defined by regular expressions is called a regular grammar, and the language it defines is called a regular language.
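To see how regular-expression rules turn text into tokens, here is a minimal sketch that uses only the standard re module. It is not how PLY works internally. TOKEN_SPEC, MASTER and tokenize are names made up for this illustration, and the token names simply mirror the list from section 1.

```python
import re

# Hypothetical rule table for illustration: (token name, regular expression).
# FLOAT comes before NUM so that "1.0" is not read as NUM "1" followed by ".0".
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("NUM", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("MUL", r"\*"),
    ("DIV", r"/"),
    ("POW", r"\^"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),   # whitespace is dropped
    ("ERROR", r"."),    # any other character matches no real rule
]
# One big alternation of named groups; the group that matched names the token.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Yield (name, value) pairs for every token found in text."""
    for match in MASTER.finditer(text):
        name, value = match.lastgroup, match.group()
        if name == "SKIP":
            continue
        if name == "ERROR":
            raise ValueError(f"unexpected character {value!r}")
        if name == "FLOAT":
            value = float(value)
        elif name == "NUM":
            value = int(value)
        yield name, value


print(list(tokenize("2 + (10-8) / 1.0")))
```

Unlike the PLY rules, this sketch raises an exception on an unknown character instead of printing a message and skipping it. That is a simplification for the example, not a recommendation.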

With the rules defined, we will build a lexical analyzer:

data = 'a = 2 + (10-8) / 1.0'
lexer = lex.lex()
lexer.input(data)
while tok := lexer.token():
    print(tok)

To pass in the input string, we use lexer.input(data). lexer.token() returns the next LexToken instance and, at the end of the input, None. The characters a and = match none of the rules above, so t_error reports and skips them. For the remaining code 2 + (10-8) / 1.0, the tokens are:

NUM 2, PLUS +, LPAREN (, NUM 10, MINUS -, NUM 8, RPAREN ), DIV /, FLOAT 1.0

Each entry gives the name of the token, followed by its value.

3. Backus-Naur Form (BNF)

Most programming languages can be described as context-free languages, which are more complex than regular languages. To describe a context-free language we use a context-free grammar, a set of rules that describes every valid construct in the language. BNF is a notation for writing such grammars, and it is commonly used to describe the syntax of programming languages.

Let's look at an example:

symbol : alternative1 | alternative2 ...

In a production, the symbol on the left of the : is replaced by one of the alternatives on the right. The alternatives are separated by | (read it as: symbol is defined as alternative1 or alternative2 or ..., and so on).
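A production can be modelled as plain data, which makes the "replace the left side by one alternative" step concrete. The following is a minimal sketch under stated assumptions: GRAMMAR and derive are hypothetical names, and the grammar is a deliberately reduced toy with two alternatives, not the full grammar of the interpreter.

```python
# A toy grammar in the "symbol : alternative1 | alternative2" shape.
# Each key is a symbol; each value lists its alternatives (illustration only).
GRAMMAR = {
    "expression": [
        ["expression", "PLUS", "expression"],  # alternative 0
        ["NUM"],                               # alternative 1
    ],
}


def derive(symbols, choices):
    """Apply one replacement per entry in choices and record every step.

    Each entry is the index of the alternative used to replace the
    leftmost symbol that has a production.
    """
    steps = [" ".join(symbols)]
    for choice in choices:
        for position, symbol in enumerate(symbols):
            if symbol in GRAMMAR:
                replacement = GRAMMAR[symbol][choice]
                symbols = symbols[:position] + replacement + symbols[position + 1:]
                break
        steps.append(" ".join(symbols))
    return steps


for step in derive(["expression"], [0, 1, 1]):
    print(step)
```

Starting from expression, the three choices expand it to expression PLUS expression, then to NUM PLUS expression, and finally to NUM PLUS NUM.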

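For our arithmetic interpreter, the grammar is as follows:

expression : expression PLUS expression
           | expression MINUS expression
           | expression MUL expression
           | expression DIV expression
           | expression POW expression
           | PLUS expression
           | MINUS expression
           | LPAREN expression RPAREN
           | NUM
           | FLOAT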

The input tokens, such as NUM, FLOAT, PLUS, MINUS, MUL and DIV, are called terminals: symbols that cannot be decomposed further or produce other symbols. A symbol that is defined by a rule and built from terminals and other rules, such as expression, is called a non-terminal.

4. Parser
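The distinction can be computed mechanically: a symbol with a production of its own is a non-terminal, and everything else that appears on a right-hand side is a terminal. The sketch below assumes the grammar is stored as a dictionary. GRAMMAR and classify are hypothetical names for this illustration, and the alternatives are copied from the parser rules in section 4.

```python
# The interpreter's grammar as symbol -> list of alternatives (illustrative layout).
GRAMMAR = {
    "expression": [
        ["expression", "PLUS", "expression"],
        ["expression", "MINUS", "expression"],
        ["expression", "MUL", "expression"],
        ["expression", "DIV", "expression"],
        ["expression", "POW", "expression"],
        ["PLUS", "expression"],
        ["MINUS", "expression"],
        ["LPAREN", "expression", "RPAREN"],
        ["NUM"],
        ["FLOAT"],
    ],
}


def classify(grammar):
    """Return (terminals, nonterminals) for a grammar dictionary."""
    nonterminals = set(grammar)  # symbols that have their own production
    terminals = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if symbol not in nonterminals
    }
    return terminals, nonterminals


terminals, nonterminals = classify(GRAMMAR)
print(sorted(nonterminals))
print(sorted(terminals))
```

For this grammar the only non-terminal is expression, and the terminals are exactly the token names from the tokens tuple.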

We will use YACC (Yet Another Compiler Compiler) as the parser generator. Import module: import ply.yacc as yacc.

from operator import (add, sub, mul, truediv, pow)

# operators supported by our interpreter
ops = {
    "+": add,
    "-": sub,
    "*": mul,
    "/": truediv,
    "^": pow,
}

def p__expression(p):
    """expression : expression PLUS expression
                  | expression MINUS expression
                  | expression DIV expression
                  | expression MUL expression
                  | expression POW expression"""
    if (p[2], p[3]) == ("/", 0):
        # when dividing by 0, use "INF" (infinity) as the value
        p[0] = float("INF")
    else:
        p[0] = ops[p[2]](p[1], p[3])

def p_expression_uplus_or_expr(p):
    """expression : PLUS expression %prec UPLUS
                  | LPAREN expression RPAREN"""
    p[0] = p[2]

def p_expression_uminus(p):
    """expression : MINUS expression %prec UMINUS"""
    p[0] = -p[2]

def p_expression_num(p):
    """expression : NUM
                  | FLOAT"""
    p[0] = p[1]

# rule for syntax errors
def p_error(p):
    print(f"Syntax error in {p.value}")

In each function's docstring we write the grammar rule that the function handles. The elements of p correspond one-to-one to the grammar symbols, as follows:

expression : expression PLUS expression
p[0]         p[1]       p[2] p[3]
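The indexing can be tried out without a parser. In the sketch below a plain list stands in for the object PLY passes to an action function; that substitution is an assumption made for illustration (the real object is a ply.yacc.YaccProduction), and OPS and binary_action are hypothetical names.

```python
from operator import add, sub, mul, truediv, pow

# Same operator table as in the article, under an illustrative name.
OPS = {"+": add, "-": sub, "*": mul, "/": truediv, "^": pow}


def binary_action(p):
    """Mimic a grammar action: read p[1], p[2], p[3] and store the result in p[0]."""
    if (p[2], p[3]) == ("/", 0):
        p[0] = float("inf")  # the article's convention: dividing by zero gives infinity
    else:
        p[0] = OPS[p[2]](p[1], p[3])


# A plain list plays the role of p here (assumption for illustration only).
p = [None, 2, "+", 3]    # expression : expression PLUS expression
binary_action(p)
print(p[0])              # 5

q = [None, 1, "/", 0]    # division by zero
binary_action(q)
print(q[0])              # inf
```

Because 0.0 == 0 is true in Python, the zero check also catches a FLOAT divisor such as 0.0.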

In the rules above, %prec UPLUS and %prec UMINUS give a rule a custom precedence. %prec is short for precedence. UPLUS and UMINUS are not tokens (in this article the two names stand for unary plus and unary minus, but they are just names, and you can pick whatever you like). YACC allows a precedence to be assigned to each token, and the rules for expression rely on it.

We set it up as follows:

precedence = (
    ("left", "PLUS", "MINUS"),
    ("left", "MUL", "DIV"),
    ("left", "POW"),
    ("right", "UPLUS", "UMINUS"),
)

In the precedence declaration, the tokens are listed from lowest to highest precedence. PLUS and MINUS have the same precedence and are left-associative (operations are performed from left to right). MUL and DIV have a higher precedence than PLUS and MINUS and are also left-associative. The same holds for POW, with a higher precedence still. UPLUS and UMINUS are right-associative (operations are performed from right to left).
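To make the effect of this table tangible, here is a small hand-written evaluator based on precedence climbing, using only the standard library. This is a different technique from the one PLY uses (PLY consults the table while building an LALR parser), so treat it as a sketch of what the levels and associativity mean, not as PLY's algorithm. BINARY, UNARY_LEVEL, tokenize, parse and evaluate are hypothetical names, and the sketch does no error handling.

```python
import re
from operator import add, sub, mul, truediv, pow

# operator -> (precedence level, associativity, function); level 1 is the lowest,
# following the order of the precedence declaration in the article.
BINARY = {
    "+": (1, "left", add),
    "-": (1, "left", sub),
    "*": (2, "left", mul),
    "/": (2, "left", truediv),
    "^": (3, "left", pow),
}
UNARY_LEVEL = 4  # unary plus/minus bind tighter than every binary operator


def tokenize(text):
    """Split text into number strings, operators and parentheses."""
    return re.findall(r"\d+\.\d+|\d+|[-+*/^()]", text)


def parse(tokens, min_level=1):
    """Consume tokens and return the value of the expression they form."""
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 1)
        tokens.pop(0)  # discard the closing ")"
    elif token in ("+", "-"):
        operand = parse(tokens, UNARY_LEVEL)
        left = -operand if token == "-" else operand
    else:
        left = float(token) if "." in token else int(token)

    while tokens and tokens[0] in BINARY:
        level, assoc, func = BINARY[tokens[0]]
        if level < min_level:
            break
        tokens.pop(0)
        # Left-associative: the right operand may only contain tighter operators,
        # so an operator of the same level is applied afterwards, left to right.
        next_level = level + 1 if assoc == "left" else level
        right = parse(tokens, next_level)
        left = func(left, right)
    return left


def evaluate(text):
    return parse(tokenize(text))


print(evaluate("2 + 3 * 4"))   # 14: MUL binds tighter than PLUS
print(evaluate("10 - 4 - 3"))  # 3: MINUS is left-associative
print(evaluate("2 ^ 3 ^ 2"))   # 64: POW is declared left-associative here
print(evaluate("-2 ^ 2"))      # 4: UMINUS binds tighter than POW
```

Note the consequence of declaring POW as left-associative and UMINUS above it: 2 ^ 3 ^ 2 is read as (2 ^ 3) ^ 2 and -2 ^ 2 as (-2) ^ 2, which differs from Python's own ** operator.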

To parse the input, we will use:

parser = yacc.yacc()
result = parser.parse(data)
print(result)

The complete code is as follows:

# import modules
from logging import (basicConfig, INFO, getLogger)
from operator import (add, sub, mul, truediv, pow)

import ply.lex as lex
import ply.yacc as yacc

# operators supported by our interpreter
ops = {
    "+": add,
    "-": sub,
    "*": mul,
    "/": truediv,
    "^": pow,
}

# token set
tokens = (
    # data types
    "NUM",
    "FLOAT",
    # arithmetic operations
    "PLUS",
    "MINUS",
    "MUL",
    "DIV",
    "POW",
    # parentheses
    "LPAREN",
    "RPAREN",
)

# regular expressions for the tokens
t_PLUS = r"\+"
t_MINUS = r"\-"
t_MUL = r"\*"
t_DIV = r"/"
t_LPAREN = r"\("
t_RPAREN = r"\)"
t_POW = r"\^"

# ignore spaces and tabs
t_ignore = " \t"

# add an action to each rule
def t_FLOAT(t):
    r"\d+\.\d+"
    t.value = float(t.value)
    return t

def t_NUM(t):
    r"\d+"
    t.value = int(t.value)
    return t

# error handling for characters that match no rule
def t_error(t):
    # t.value holds the rest of the input that has not been tokenized yet
    print(f"keyword not found: {t.value[0]}\nline {t.lineno}")
    t.lexer.skip(1)

# on a newline, advance the line counter
def t_newline(t):
    r"\n+"
    t.lexer.lineno += t.value.count("\n")

# set the precedence of the symbols
precedence = (
    ("left", "PLUS", "MINUS"),
    ("left", "MUL", "DIV"),
    ("left", "POW"),
    ("right", "UPLUS", "UMINUS"),
)

# write the BNF rules
def p__expression(p):
    """expression : expression PLUS expression
                  | expression MINUS expression
                  | expression DIV expression
                  | expression MUL expression
                  | expression POW expression"""
    if (p[2], p[3]) == ("/", 0):
        # when dividing by 0, use "INF" (infinity) as the value
        p[0] = float("INF")
    else:
        p[0] = ops[p[2]](p[1], p[3])

def p_expression_uplus_or_expr(p):
    """expression : PLUS expression %prec UPLUS
                  | LPAREN expression RPAREN"""
    p[0] = p[2]

def p_expression_uminus(p):
    """expression : MINUS expression %prec UMINUS"""
    p[0] = -p[2]

def p_expression_num(p):
    """expression : NUM
                  | FLOAT"""
    p[0] = p[1]

# rule for syntax errors
def p_error(p):
    print(f"Syntax error in {p.value}")

# main program
if __name__ == "__main__":
    basicConfig(level=INFO, filename="logs.txt")
    lexer = lex.lex()
    parser = yacc.yacc()
    while True:
        try:
            result = parser.parse(input(">>>"), debug=getLogger())
            print(result)
        except AttributeError:
            print("invalid syntax")

That is how to develop an interpreter with Python. It covers several points you may come across or use in your daily work, and I hope you can learn something from this article.
