How to analyze the source code of thriftpy+ply

Updated 2025-01-18. Source: SLTechnology News&Howtos (shulou), Internet Technology.

Many newcomers are unsure how to approach the source code of thriftpy and ply. This article explains the background needed to read it: what lex and yacc do, how their input files are laid out, and how grammar rules are written.

thriftpy uses ply as its lexer and parser. ply is a pure-Python implementation of lex and yacc, and it is a convenient code base for learning compiler principles: it is small, and because it is written in Python the source is directly readable, which makes it easy to analyze.

Lex calls each word it scans a token, and tokens fall into categories. Compare natural language: every English word is a token, and tokens belong to classes. Noun is one class of token, and apple is a specific token of that class. A programming language has a very limited number of tokens, unlike English, a natural language with hundreds of thousands of words.

The lex tool generates a yylex function for you. yacc calls it to find out which type of token comes next. The token types themselves, however, are defined in the yacc file.

A lex input file is conventionally named with a .l extension. Running lex XX.l produces the output file lex.yy.c.
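To make the idea of "regular expression in, token type out" concrete, here is a minimal sketch using only Python's standard re module. It is not lex or ply itself. The token names (INTEGER, PLUS, MINUS, SKIP) and the tokenize function are assumptions chosen for illustration.

```python
import re

# Hypothetical token types; in a real lex/yacc project they would be
# declared in the .y file and exported through y.tab.h.
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"\s+"),
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Yield (token_type, lexeme) pairs, playing the role of repeated yylex() calls."""
    pos = 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":  # whitespace is discarded, not returned
            yield match.lastgroup, match.group()
```

Calling list(tokenize("12 + 3")) yields the pairs ("INTEGER", "12"), ("PLUS", "+"), ("INTEGER", "3"). The parser only needs the first element of each pair to decide which grammar rule applies.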

What is yacc?

With lex covered, what about yacc? Textbooks call the work yacc does syntactic analysis. Translating the term as grammatical analysis is arguably a little better, because it makes the meaning clearer.

When we first learn English, teachers tell us that English is essentially "words + grammar". The same view fits programming languages. lex extracts the words, so what remains is expressing the grammar. That is what yacc does (strictly speaking, it is BNF that does it).

yacc generates a yyparse function, which repeatedly calls the yylex function above to get the type of the next token.

A yacc input file is conventionally named with a .y extension. Running yacc -d XX.y produces the output files y.tab.h and y.tab.c. The former contains the token type definitions that lex needs, and must be #included in the .l file.
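The division of labour between yyparse and yylex can be sketched in plain Python. This is only an illustration of the calling pattern, not what yacc generates: the grammar (INTEGER followed by any number of PLUS or MINUS and INTEGER pairs) and the helper make_yylex are assumptions. A real yyparse returns 0 on success, whereas this sketch returns a boolean for readability.

```python
def make_yylex(tokens):
    """Return a yylex-like function that hands out one token type per call, then None."""
    iterator = iter(tokens)

    def yylex():
        return next(iterator, None)

    return yylex


def yyparse(yylex):
    """Return True if the token stream matches: INTEGER ((PLUS | MINUS) INTEGER)*"""
    if yylex() != "INTEGER":
        return False
    while True:
        operator = yylex()
        if operator is None:  # end of input: the sentence is complete
            return True
        if operator not in ("PLUS", "MINUS"):
            return False
        if yylex() != "INTEGER":
            return False
```

The parser never looks at the input text. It only asks the lexer for the next token type, which is why the token types have to be shared between the two files.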

Input file formats for lex and yacc

Definition section

%%

Rules section

%%

C code section

Both .l and .y files are divided into three sections, separated by %%. The three sections mean the following:

Definition section

This section can hold C #include, #define, and other declaration statements, but they must be enclosed in %{ and %}.

In a .l file, you can also put named regular expression definitions here, written as a name followed by a regular expression, for example: minus "-". The rules section can then refer to the regular expression as {name}.

In a .y file, you can put token definitions here, such as %token INTEGER PLUS. Each token defined this way appears in y.tab.h.
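The {name} reference mechanism of a .l definition section can be imitated in a few lines of Python. This is a sketch of the idea only, not how lex implements it. The DEFINITIONS table and the expand function are hypothetical names introduced for illustration.

```python
import re

# Hypothetical named definitions, as they might appear in a .l definition section.
DEFINITIONS = {
    "digit": r"[0-9]",
    "minus": r"-",
}


def expand(pattern, definitions=DEFINITIONS):
    """Replace each {name} reference with its definition, wrapped in a group."""

    def substitute(match):
        name = match.group(1)
        if name not in definitions:
            return match.group(0)  # unknown names are left untouched
        return f"(?:{definitions[name]})"

    # Only identifier-like names are treated as references, so a repeat
    # count such as {2,3} is never matched by this pattern.
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, pattern)
```

For example, expand("{minus}?{digit}+") produces a regular expression that matches an optionally negative integer such as -42.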

Rules section

In a .l file, the rules placed here give the action for each regular expression, which usually returns a token.

In a .y file, the rules placed here give the action to perform when a grammar description is satisfied.

In both .l and .y files, the actions are enclosed in {} and written in C, and the code can do whatever you want.

C code section

Definition of main function, yyerror function, etc.

What can lex and yacc do for us?

In one sentence: interpret and execute a custom language. A few points deserve attention:

Anything the custom language needs to do must be doable in C. In fact, anything a computer can do can be implemented in C. The point of lex and yacc is to simplify: they let users express complex operations in a relatively simple language. For example, a ready-made library for querying a database certainly exists, but using it is cumbersome, because you have to write your own code against the API and compile it. If we want a simple custom language (such as SQL) to perform the operation instead, lex and yacc are the tools for it.

What lex and yacc do is implement another language in C. For that reason they cannot implement C itself, but they can implement Java, Python, and so on. You can, of course, parse and execute C through ANTLR. In that case the C program is first executed by Java, and Java in turn runs as native code (C). After all, our operating systems are implemented in C.

What are we going to do with lex and yacc?

Define the token types. They are defined in the .y file and are used both by lex and by the BNF rules in the .y file.

Write the lexical analysis code. This code lives in the .l file (the input file for lex). Each entry has the form: regular expression --> corresponding action. When lex is used with yacc, the action usually returns a token type that was defined in advance in the yacc file.

Write the BNF. It defines the grammar rules of the language.

About BNF

BNF is a notation for context-free grammars. See http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form. An excerpt:

<symbol> ::= __expression__

<symbol> is a nonterminal

__expression__ consists of one or more sequences of symbols

More sequences are separated by the vertical bar, '|'

Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>.

The way it is defined in yacc is actually:

symbol : __expression__ {operation}

| __expression__ {operation}

The operation is the C code to execute when the rule is matched. This code can use some special variables: $$, $1, $2, and so on. $$ represents the result of the reduction, that is, the value of the whole __expression__, and $1, $2, ... represent the values of the first, second, and later symbols that appear in __expression__. For example:

expr2:

expr3 {$$ = $1;}

| expr2 PLUS expr3 {$$ = plus($1, $3);}

| expr2 MINUS expr3 {$$ = minus($1, $3);}
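The effect of these actions can be traced with a small hand-written evaluator in Python. This is a sketch under several assumptions, not the code yacc would generate. yacc builds a bottom-up parser that handles the left-recursive rule directly, while this version rewrites the recursion as a loop. The rule for expr3 is not given above, so it is assumed here to be a single INTEGER, and plus and minus are assumed to be ordinary addition and subtraction.

```python
def plus(left, right):
    return left + right


def minus(left, right):
    return left - right


def parse_expr3(tokens, pos):
    """Assumed rule: expr3 : INTEGER, with $$ set to the integer's value."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind != "INTEGER":
        raise SyntaxError(f"expected INTEGER, got {kind}")
    return int(text), pos + 1


def parse_expr2(tokens, pos=0):
    """Evaluate expr2, applying the action of whichever alternative matches."""
    result, pos = parse_expr3(tokens, pos)  # expr2 : expr3 {$$ = $1;}
    while pos < len(tokens):
        kind, _ = tokens[pos]
        if kind == "PLUS":
            right, pos = parse_expr3(tokens, pos + 1)
            result = plus(result, right)  # {$$ = plus($1, $3);}
        elif kind == "MINUS":
            right, pos = parse_expr3(tokens, pos + 1)
            result = minus(result, right)  # {$$ = minus($1, $3);}
        else:
            break
    return result, pos
```

In each iteration, result plays the role of $1 (the value of the expr2 parsed so far), right plays the role of $3, and the new result is $$. Because the rule is left-recursive, 10 - 3 - 2 is grouped as (10 - 3) - 2.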
