How to implement Code Compiler with JS

2025-03-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to implement a code compiler with JS. Many people run into difficulties with this in practice, so let's walk through how to handle it step by step. I hope you read it carefully and come away with something useful!

I. Preface

For front-end developers, a compiler can feel like a magic box: it looks ordinary on the surface, yet it often surprises us.

A compiler, as its name implies, compiles. Compiles what? Code, of course.

In fact, we use compilers all the time:

Converting JSX to JS code in React

Converting ES6+ code into ES5 code with Babel

Converting Less / Scss code into browser-supported CSS through various loaders

Converting TypeScript to JavaScript code

And so on...

There are more use cases than I can count on my fingers.

Although many tools in the community already do this work for us, it is still worth understanding some compilation principles.

II. Introduction to the compiler

2.1 How programs run

Modern programs run in one of two main modes: static compilation and dynamic interpretation. The recommended article "Angular 2 JIT vs AOT" describes both in great detail.

Static compilation

Referred to as "AOT" (Ahead-Of-Time), that is, "ahead-of-time compilation": a statically compiled program uses a designated compiler to compile all of its code into machine code before execution.

(picture from: https://segmentfault.com/a/1190000008739157)

The development process in Angular's AOT compilation mode is as follows:

Develop the Angular application in TypeScript

Run ngc to compile the application

Use the Angular Compiler to compile the templates, which generally outputs TypeScript code

Run tsc to compile the TypeScript code

Build the project with other tools such as Webpack or Gulp (code minification, merging, etc.)

Deploy the application

Dynamic interpretation

Referred to as "JIT" (Just-In-Time), that is, "just-in-time compilation": a dynamically interpreted program uses a designated interpreter to compile the program while executing it.

(picture from: https://segmentfault.com/a/1190000008739157[1])

The development process in Angular's JIT compilation mode is as follows:

Develop the Angular application in TypeScript

Run tsc to compile the TypeScript code

Build the project with other tools such as Webpack or Gulp (code minification, merging, etc.)

Deploy the application

AOT vs JIT

AOT compilation process:

(picture from: https://segmentfault.com/a/1190000008739157)

JIT compilation process:

(picture from: https://segmentfault.com/a/1190000008739157)

Feature                  AOT                    JIT
Compilation platform     Server                 Browser
Compile time             Build (build phase)    Runtime
Package size             Smaller                Larger
Execution performance    Better                 -
Startup time             Shorter                -

In addition, AOT has the following advantages:

On the client side, we no longer need to import the bulky Angular compiler, which reduces the size of our JS bundle.

Applications compiled with AOT no longer contain any HTML fragments; these are replaced by compiled TypeScript code, so the TypeScript compiler can find errors in advance. All in all, with the AOT compilation mode, our templates are type-safe.

2.2 Workflow of modern compilers

Wikipedia describes the workflow of a compiler as follows [2]:

"A modern compiler's main workflow is as follows: source code → preprocessor → compiler → assembler → object code → linker → executables. In the end, the packaged file can be read and run by the computer."

The role of the compiler is emphasized here: "take the original program as input and translate it to produce an equivalent program in the target language".

(picture: three core phases of the compiler)

At present, most modern compilers follow a similar workflow made up of three core phases:

"Parsing": parses the original code string into an "abstract syntax tree (Abstract Syntax Tree, AST)" through lexical analysis and syntax analysis.

"Transformation": transforms the abstract syntax tree.

"Code generation (Code Generation)": generates the target language code string from the transformed AST object.

III. Compiler implementation

This article walks through the source code of "The Super Tiny Compiler [3]" to learn how to implement a lightweight compiler, and finally "compiles the following original code strings (Lisp-style function calls) into executable JavaScript code".

               Lisp style (before compilation)    JavaScript style (after compilation)
2 + 2          (add 2 2)                          add(2, 2)
4 - 2          (subtract 4 2)                     subtract(4, 2)
2 + (4 - 2)    (add 2 (subtract 4 2))             add(2, subtract(4, 2))

The Super Tiny Compiler claims to be "probably the smallest compiler ever," and its author, James Kyle, is one of the active maintainers of Babel.

Let's get started.

3.1 The Super Tiny Compiler Workflow

Now let's map the core workflow of The Super Tiny Compiler onto the three core compiler phases described above:

The detailed process in the figure is as follows:

1. Execute the entry function, passing the original code string as the parameter

    // Original code string
    (add 2 (subtract 4 2))

2. Enter the "parsing phase (Parsing)": the original code string is converted into a "lexical unit array" (tokens) by the "lexical analyzer (Tokenizer)", and the "lexical unit array" is then converted into an "abstract syntax tree (Abstract Syntax Tree, AST for short)" by the "syntax analyzer (Parser)" and returned.

3. Enter the "transformation phase (Transformation)": the "AST object" generated in the previous step is passed to the "transformer (Transformer)", which uses the "traverser (Traverser)" to convert it into the "new AST object" we need.

4. Enter the "code generation phase (Code Generation)": the "new AST object" returned in the previous step is converted into "JavaScript code" by the "code generator (CodeGenerator)".

5. "Code compilation ends" and the "JavaScript code" is returned.
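To make the five steps concrete, the following sketch writes out by hand what each stage would hold for the input (add 2 (subtract 4 2)). The variable names are only illustrative; the shapes are assumed to follow the token and node types used by the code later in this article.

```javascript
// Hand-written intermediate values for the input "(add 2 (subtract 4 2))".
// Nothing is computed here; the literals only illustrate each stage's shape.

// After the tokenizer: a flat lexical unit array
const tokens = [
  { type: 'paren', value: '(' },
  { type: 'name', value: 'add' },
  { type: 'number', value: '2' },
  { type: 'paren', value: '(' },
  { type: 'name', value: 'subtract' },
  { type: 'number', value: '4' },
  { type: 'number', value: '2' },
  { type: 'paren', value: ')' },
  { type: 'paren', value: ')' },
];

// After the parser: a Lisp-style AST (calls carry `name` and `params`)
const ast = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// After the transformer: a JavaScript-style AST (calls carry `callee` and `arguments`)
const newAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'add' },
      arguments: [
        { type: 'NumberLiteral', value: '2' },
        {
          type: 'CallExpression',
          callee: { type: 'Identifier', name: 'subtract' },
          arguments: [
            { type: 'NumberLiteral', value: '4' },
            { type: 'NumberLiteral', value: '2' },
          ],
        },
      ],
    },
  }],
};

// After the code generator: the final JavaScript string
const output = 'add(2, subtract(4, 2));';
```

Note that number values stay strings all the way through, because the tokenizer builds them by concatenating characters.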

The process above may look confusing at first, but that's fine. Keep the overall flow in mind for now, and then let's read the code:

3.2 Entry method

First, an entry method compiler is defined. It takes the original code string as a parameter and returns the final JavaScript code:

    // Compiler entry method; parameter: the original code string `input`
    function compiler(input) {
      let tokens = tokenizer(input);
      let ast = parser(tokens);
      let newAst = transformer(ast);
      let output = codeGenerator(newAst);
      return output;
    }

3.3 Parsing phase

In the parsing phase, we define the "lexical analyzer method" tokenizer and the "syntax analyzer method" parser and implement each of them:

    // Lexical analyzer; parameter: the original code string `input`
    function tokenizer(input) {}

    // Syntax analyzer; parameter: the lexical unit array `tokens`
    function parser(tokens) {}

Lexical analyzer

The main task of the lexical analyzer method tokenizer is to traverse the entire original code string, convert it into a lexical unit array (tokens), and return that array.

During traversal, each character is matched and turned into a "lexical unit" that is pushed onto the "lexical unit array". For example, when a left parenthesis ( is matched, a "lexical unit object" ({type: 'paren', value: '('}) is pushed onto the "lexical unit array (tokens)".

    // Lexical analyzer; parameter: the original code string `input`
    function tokenizer(input) {
      let current = 0; // index of the character currently being parsed, used as a cursor
      let tokens = []; // initialize the lexical unit array

      // Loop over the original code string and collect lexical units
      while (current < input.length) {
        let char = input[current];

        // Match a left parenthesis; on success push {type: 'paren', value: '('}
        if (char === '(') {
          tokens.push({ type: 'paren', value: '(' });
          current++;
          continue; // increment current, finish this iteration, go to the next one
        }

        // Match a right parenthesis; on success push {type: 'paren', value: ')'}
        if (char === ')') {
          tokens.push({ type: 'paren', value: ')' });
          current++;
          continue;
        }

        // Match whitespace; on success skip it
        // \s matches spaces, tabs, form feeds, line feeds, vertical tabs, etc.
        let WHITESPACE = /\s/;
        if (WHITESPACE.test(char)) {
          current++;
          continue;
        }

        // Match digit characters with [0-9]
        // On success push {type: 'number', value: value}
        // e.g. in (add 123 456), 123 and 456 are two number lexical units
        let NUMBERS = /[0-9]/;
        if (NUMBERS.test(char)) {
          let value = '';
          // Collect consecutive digits as one number
          while (NUMBERS.test(char)) {
            value += char;
            char = input[++current];
          }
          tokens.push({ type: 'number', value });
          continue;
        }

        // Match a string wrapped in double quotes
        // On success push {type: 'string', value: value}
        // e.g. in (concat "foo" "bar"), "foo" and "bar" are two string lexical units
        if (char === '"') {
          let value = '';
          char = input[++current]; // skip the opening double quote
          // Collect every character between the two double quotes
          while (char !== '"') {
            value += char;
            char = input[++current];
          }
          char = input[++current]; // skip the closing double quote
          tokens.push({ type: 'string', value });
          continue;
        }

        // Match a function name, which may contain only letters: [a-z] with the i flag
        // On success push {type: 'name', value: value}
        // e.g. in (add 2 4), add is a name lexical unit
        let LETTERS = /[a-z]/i;
        if (LETTERS.test(char)) {
          let value = '';
          // Collect consecutive letters
          while (LETTERS.test(char)) {
            value += char;
            char = input[++current];
          }
          tokens.push({ type: 'name', value });
          continue;
        }

        // An unrecognized character: throw an error and exit
        throw new TypeError('I dont know what this character is: ' + char);
      }

      // Finally, the lexical analyzer returns the lexical unit array
      return tokens;
    }

Syntax analyzer

The main task of the "syntax analyzer method" parser is to convert the "lexical unit array" returned by the "lexical analyzer" into an intermediate form that describes the syntactic components and their relationships (the "abstract syntax tree", AST).

    // Syntax analyzer; parameter: the lexical unit array `tokens`
    function parser(tokens) {
      let current = 0; // index of the lexical unit currently being parsed, used as a cursor

      // Recursive walk (function calls may be nested) that turns
      // lexical units into Lisp AST nodes
      function walk() {
        // Get the lexical unit at the current index
        let token = tokens[current];

        // Number lexical unit
        if (token.type === 'number') {
          current++; // advance the cursor
          // Create a 'NumberLiteral' AST node, representing a numeric literal
          return {
            type: 'NumberLiteral',
            value: token.value,
          };
        }

        // String lexical unit
        if (token.type === 'string') {
          current++;
          // Create a 'StringLiteral' AST node, representing a string literal
          return {
            type: 'StringLiteral',
            value: token.value,
          };
        }

        // Function call lexical unit
        if (token.type === 'paren' && token.value === '(') {
          // Skip the left parenthesis; the next lexical unit is the function name
          token = tokens[++current];
          let node = {
            type: 'CallExpression',
            name: token.value,
            params: [],
          };

          // Advance again to reach the first argument
          token = tokens[++current];

          // Walk each lexical unit to collect the arguments until a right parenthesis appears
          while (
            (token.type !== 'paren') ||
            (token.type === 'paren' && token.value !== ')')
          ) {
            node.params.push(walk());
            token = tokens[current];
          }

          current++; // skip the right parenthesis
          return node;
        }

        // An unrecognized lexical unit: throw an error
        throw new TypeError(token.type);
      }

      // Initialize the AST root node
      let ast = {
        type: 'Program',
        body: [],
      };

      // Fill ast.body in a loop
      while (current < tokens.length) {
        ast.body.push(walk());
      }

      // Finally, return the ast
      return ast;
    }

3.4 Transformation phase

In the transformation phase, we define the "transformer" function transformer, which takes the Lisp AST object returned by the syntax analyzer as its parameter and converts it into a new AST object.

To keep the code organized, we define a "traverser" method traverser that handles the operation on each node.

    // Traverser; parameters: ast and visitor
    function traverser(ast, visitor) {
      // traverseArray: iterates over an array of AST nodes and
      // calls traverseNode on each element
      function traverseArray(array, parent) {
        array.forEach(child => {
          traverseNode(child, parent);
        });
      }

      // traverseNode: processes one AST node; takes the node and its parent
      function traverseNode(node, parent) {
        // Look up the methods registered on the visitor for this node type
        let methods = visitor[node.type];

        // Call the visitor's enter method for the current node, if any
        if (methods && methods.enter) {
          methods.enter(node, parent);
        }

        switch (node.type) {
          // Root node
          case 'Program':
            traverseArray(node.body, node);
            break;

          // Function call
          case 'CallExpression':
            traverseArray(node.params, node);
            break;

          // Numbers and strings have no children; nothing to do
          case 'NumberLiteral':
          case 'StringLiteral':
            break;

          // An unrecognized node type: throw an error and exit
          default:
            throw new TypeError(node.type);
        }

        // Call the visitor's exit method for the current node, if any
        if (methods && methods.exit) {
          methods.exit(node, parent);
        }
      }

      // First call: start traversing from the root
      traverseNode(ast, null);
    }
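The order in which the traverser calls enter and exit is easier to see with a small experiment. The sketch below repeats a compact copy of traverser so that it runs on its own, and uses a hypothetical recording visitor (the record object and log array are not part of The Super Tiny Compiler) to log the visiting order for (add 2 (subtract 4 2)).

```javascript
// Compact copy of the traverser so this example is self-contained.
function traverser(ast, visitor) {
  function traverseArray(array, parent) {
    array.forEach(child => traverseNode(child, parent));
  }
  function traverseNode(node, parent) {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);
    switch (node.type) {
      case 'Program': traverseArray(node.body, node); break;
      case 'CallExpression': traverseArray(node.params, node); break;
      case 'NumberLiteral':
      case 'StringLiteral': break;
      default: throw new TypeError(node.type);
    }
    if (methods && methods.exit) methods.exit(node, parent);
  }
  traverseNode(ast, null);
}

// Lisp-style AST for "(add 2 (subtract 4 2))"
const ast = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// Hypothetical visitor that only records what it sees.
const log = [];
const label = node => node.type + (node.name ? ' ' + node.name : '');
const record = {
  enter(node) { log.push('enter ' + label(node)); },
  exit(node) { log.push('exit ' + label(node)); },
};

traverser(ast, { Program: record, CallExpression: record, NumberLiteral: record });
console.log(log.join('\n'));
// enter Program
// enter CallExpression add
// enter NumberLiteral
// exit NumberLiteral
// enter CallExpression subtract
// enter NumberLiteral
// exit NumberLiteral
// enter NumberLiteral
// exit NumberLiteral
// exit CallExpression subtract
// exit CallExpression add
// exit Program
```

This is a depth-first walk: enter fires on the way down, before a node's children are visited, and exit fires on the way back up.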

When reading the "traverser" method traverser, it helps to read it together with the "transformer" method transformer described below:

    // Transformer; parameter: ast
    function transformer(ast) {
      // Create newAst; like the old AST, it uses a Program node as its root
      let newAst = {
        type: 'Program',
        body: [],
      };

      // Link the old and new ASTs through _context. Note that _context is a
      // reference that points from the old AST into the new AST.
      ast._context = newAst.body;

      // Process the old AST
      traverser(ast, {
        // Number: insert into the new AST as is, node type NumberLiteral
        NumberLiteral: {
          enter(node, parent) {
            parent._context.push({
              type: 'NumberLiteral',
              value: node.value,
            });
          },
        },

        // String: insert into the new AST as is, node type StringLiteral
        StringLiteral: {
          enter(node, parent) {
            parent._context.push({
              type: 'StringLiteral',
              value: node.value,
            });
          },
        },

        // Function call
        CallExpression: {
          enter(node, parent) {
            // Create a different kind of AST node
            let expression = {
              type: 'CallExpression',
              callee: {
                type: 'Identifier',
                name: node.name,
              },
              arguments: [],
            };

            // A function call has child nodes; set up the link so that
            // the children push into expression.arguments
            node._context = expression.arguments;

            // A top-level function call is a statement, so wrap it
            // in a special AST node
            if (parent.type !== 'CallExpression') {
              expression = {
                type: 'ExpressionStatement',
                expression: expression,
              };
            }

            parent._context.push(expression);
          },
        },
      });

      return newAst;
    }

Importantly, the link between the old and new AST objects is maintained through the _context reference, which makes it easy to manage and leaves the structure of the old AST (its type, name, and params fields) unchanged.
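Because _context holds a reference rather than a copy, anything pushed through an old node's _context shows up in the new AST. The following minimal sketch isolates just that mechanism; the objects are hand-built stand-ins, not output from the real transformer.

```javascript
// Hand-built stand-ins for the roots of the old and new ASTs.
const oldAst = { type: 'Program', body: [] };
const newAst = { type: 'Program', body: [] };

// Same assignment as in transformer: _context points at the new body array.
oldAst._context = newAst.body;

// A visitor pushes through the old node's _context ...
oldAst._context.push({ type: 'NumberLiteral', value: '2' });

// ... and the node appears in the new AST, because both names refer to one array.
console.log(oldAst._context === newAst.body); // true
console.log(newAst.body.length);              // 1

// The same trick links a nested call to the arguments of its new node.
const oldCall = { type: 'CallExpression', name: 'add', params: [] };
const expression = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'add' },
  arguments: [],
};
oldCall._context = expression.arguments;
oldCall._context.push({ type: 'NumberLiteral', value: '4' });
console.log(expression.arguments.length);     // 1
```

This is why the visitor methods only ever write to parent._context: they do not need to know where in the new tree the parent's counterpart lives.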

3.5 Code Generation

Next, in the final step, we define the "code generator" method codeGenerator, which recursively converts the new AST object into an executable JavaScript code string.

    // Code generator; parameter: a node of the new AST
    function codeGenerator(node) {
      switch (node.type) {
        // Program: map over the nodes in body, recursively call
        // codeGenerator, and join the results line by line
        case 'Program':
          return node.body.map(codeGenerator).join('\n');

        // Expression statement: generate the expression and end it with a semicolon
        case 'ExpressionStatement':
          return codeGenerator(node.expression) + ';';

        // Function call: add the parentheses and separate the arguments with commas
        case 'CallExpression':
          return (
            codeGenerator(node.callee) +
            '(' +
            node.arguments.map(codeGenerator).join(', ') +
            ')'
          );

        // Identifier: return its name
        case 'Identifier':
          return node.name;

        // Number: return its value
        case 'NumberLiteral':
          return node.value;

        // String: wrap it in double quotes
        case 'StringLiteral':
          return '"' + node.value + '"';

        // An unrecognized node type: throw an error and exit
        default:
          throw new TypeError(node.type);
      }
    }

3.6 Compiler testing

With the previous step, the code for our simple compiler is complete. Next, let's test how the compiler works using the example from the original requirements above:

    const add = (a, b) => a + b;
    const subtract = (a, b) => a - b;
    const source = "(add 2 (subtract 4 2))";
    const target = compiler(source); // "add(2, subtract(4, 2));"
    const result = eval(target); // OK, result is 4
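The test relies on eval seeing add and subtract in the surrounding scope. As an alternative sketch (not something the original project does), the generated string can be run through the Function constructor with the helpers passed in explicitly. The target string below is hard-coded and assumed to be the compiler's output for this input, and the approach only works when the output is a single expression statement.

```javascript
const add = (a, b) => a + b;
const subtract = (a, b) => a - b;

// Assumed compiler output for "(add 2 (subtract 4 2))".
const target = 'add(2, subtract(4, 2));';

// Build a function whose parameters are exactly the names the generated code
// uses, so the code does not depend on whatever happens to be in scope.
const run = new Function('add', 'subtract', 'return ' + target);

const result = run(add, subtract);
console.log(result); // 4
```

Like eval, this executes arbitrary code, so it is only suitable for strings you generated yourself.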

3.7 Workflow summary

To summarize the entire workflow of The Super Tiny Compiler:

1. input => tokenizer => tokens

2. tokens => parser => ast

3. ast => transformer => newAst

4. newAst => generator => output

In fact, most compilers have roughly the same workflow.

IV. Handwritten Webpack compiler

With the core workflow of The Super Tiny Compiler in mind, hand-writing a Webpack-style compiler will feel much smoother.

Some interviewers also like to ask about this. More importantly, writing one by hand gives us a better understanding of how Webpack builds a project, so this chapter gives a brief introduction.

4.1 Analysis of the Webpack build process

The build goes through the following series of steps, from starting the build to outputting the results:

1. Initialize parameters

Parse the Webpack configuration: merge the parameters passed from the shell with those in the webpack.config.js file to form the final configuration.

2. Start compilation

Initialize the compiler object with the parameters from the previous step and register all configured plug-ins. Each plug-in listens for events in the Webpack build life cycle and responds accordingly. Then call the object's run method to start compiling.

3. Determine the entry

Starting from the configured entry, parse the file to build an AST, find its dependencies, and recurse into them.

4. Compile modules

During the recursion, call all configured loaders to transform each file according to its "file type" and the "loader configuration", then find the modules that the module depends on, and repeat this step until every file the entry depends on has been processed.

5. Complete module compilation and output

After the recursion finishes, we have the result for each file, including each module and its dependencies, and generate chunks according to the entry configuration.

6. Output complete

Write all chunks to the file system.

Note: during the build life cycle, a series of plug-ins do the right thing at the right time. For example, UglifyPlugin uses UglifyJS to compress the results and "overwrite" them once the recursive loader transformation has finished.
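The interplay between the compiler object and its plug-ins (steps 1, 2, and the note above) can be sketched with a tiny event mechanism. Everything below is hypothetical: TinyCompiler, PrependCommentPlugin, and the on / fire methods are invented for illustration and are not Webpack's real API. The sketch only mimics the idea that plug-ins subscribe to life cycle events and the compiler fires them.

```javascript
// Hypothetical mini compiler, NOT Webpack's API.
class TinyCompiler {
  constructor(options) {
    this.options = options;
    this.listeners = {}; // event name -> array of callbacks
    // Step 2: register every configured plug-in
    (options.plugins || []).forEach(plugin => plugin.apply(this));
  }
  on(event, callback) {
    if (!this.listeners[event]) this.listeners[event] = [];
    this.listeners[event].push(callback);
  }
  fire(event, payload) {
    (this.listeners[event] || []).forEach(callback => callback(payload));
  }
  run() {
    this.fire('run', this.options.entry);
    // Placeholder for steps 3 to 5: pretend the entry was compiled into one chunk
    const chunks = { main: `console.log("built from ${this.options.entry}");` };
    this.fire('emit', chunks); // plug-ins may rewrite the chunks here
    this.fire('done', chunks);
    return chunks;
  }
}

// Hypothetical plug-in that rewrites every chunk when 'emit' fires.
class PrependCommentPlugin {
  apply(compiler) {
    compiler.on('emit', chunks => {
      for (const name of Object.keys(chunks)) {
        chunks[name] = '/* generated */\n' + chunks[name];
      }
    });
  }
}

// Step 1: merge the file configuration with the shell arguments (shell wins).
const fileConfig = { entry: './src/index.js', plugins: [new PrependCommentPlugin()] };
const shellArgs = { entry: './src/main.js' };
const options = Object.assign({}, fileConfig, shellArgs);

const events = [];
const compiler = new TinyCompiler(options);
compiler.on('run', entry => events.push('run ' + entry));
compiler.on('done', () => events.push('done'));

const chunks = compiler.run();
console.log(events);      // [ 'run ./src/main.js', 'done' ]
console.log(chunks.main); // starts with "/* generated */"
```

The plug-in never calls the compiler directly. It only reacts when its event fires, which is what lets plug-ins "do the right thing at the right time".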

4.2 Code implementation

A handwritten Webpack needs to implement the following three core methods:

createAssets: collects and processes the code of a file

createGraph: returns the dependency graph of all files, starting from the entry file

bundle: assembles all the code according to the dependency graph and outputs it

1. createAssets

    const fs = require("fs");
    const path = require("path"); // used by createGraph below
    const parser = require("@babel/parser");
    const traverse = require("@babel/traverse").default;
    const babel = require("@babel/core");

    let moduleId = 0; // incrementing id assigned to each module

    function createAssets(filename) {
      // Read the contents of the file by file name
      const content = fs.readFileSync(filename, "utf-8");

      // Convert the code that was read into an AST
      const ast = parser.parse(content, {
        sourceType: "module" // specify the source type
      });

      const dependencies = []; // collects the paths this file depends on

      // Use the AST helpers provided by traverse to get the
      // dependency path from each import declaration
      traverse(ast, {
        ImportDeclaration: ({ node }) => {
          dependencies.push(node.source.value);
        }
      });

      // Convert ES6 code to ES5 code through the AST
      const { code } = babel.transformFromAstSync(ast, null, {
        presets: ["@babel/preset-env"]
      });

      let id = moduleId++;
      return { id, filename, code, dependencies };
    }

2. createGraph

    function createGraph(entry) {
      const mainAsset = createAssets(entry); // get the contents of the entry file
      const queue = [mainAsset];

      for (const asset of queue) {
        const dirname = path.dirname(asset.filename);
        asset.mapping = {};

        asset.dependencies.forEach(relativePath => {
          // Resolve the dependency path against the directory of the current file
          const absolutePath = path.join(dirname, relativePath);
          const child = createAssets(absolutePath);
          asset.mapping[relativePath] = child.id;
          // Enqueue the child so that its own dependencies are processed too
          queue.push(child);
        });
      }

      return queue;
    }
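The queue-based traversal is easier to follow without the file system and Babel in the way. The sketch below is a simplified stand-in, not the article's implementation: files is a hypothetical in-memory map, all files are assumed to sit in one directory (so no path resolution is needed), and createAsset finds imports with a regular expression that only copes with simple import statements.

```javascript
// Hypothetical in-memory "file system": file name -> source text.
const files = {
  './entry.js': "import message from './message.js'; console.log(message);",
  './message.js': "import { name } from './name.js'; export default 'hello ' + name;",
  './name.js': "export const name = 'world';",
};

let moduleId = 0;

// Stand-in for createAssets: extracts import paths with a regular expression
// instead of a real parser (an assumption that only holds for simple sources).
function createAsset(filename) {
  const content = files[filename];
  const dependencies = [];
  const importPattern = /import\s+[^'"]*['"]([^'"]+)['"]/g;
  for (const match of content.matchAll(importPattern)) {
    dependencies.push(match[1]);
  }
  return { id: moduleId++, filename, dependencies };
}

// Same queue-based traversal as createGraph: the array grows while the
// for...of loop runs, so newly pushed children are visited as well.
function createGraph(entry) {
  const queue = [createAsset(entry)];
  for (const asset of queue) {
    asset.mapping = {};
    asset.dependencies.forEach(relativePath => {
      const child = createAsset(relativePath);
      asset.mapping[relativePath] = child.id;
      queue.push(child);
    });
  }
  return queue;
}

const graph = createGraph('./entry.js');
console.log(
  graph.map(a => `${a.id}: ${a.filename} -> ${JSON.stringify(a.mapping)}`).join('\n')
);
// 0: ./entry.js -> {"./message.js":1}
// 1: ./message.js -> {"./name.js":2}
// 2: ./name.js -> {}
```

Neither this sketch nor the function above deduplicates modules, so a file imported from two places is processed twice and circular imports would never terminate. A cache keyed by file name would be one way to address that.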

3. bundle

    function bundle(graph) {
      let modules = "";

      // Emit one "id: [factory function, mapping]" entry per module
      graph.forEach(item => {
        modules += `
          ${item.id}: [
            function (require, module, exports) {
              ${item.code}
            },
            ${JSON.stringify(item.mapping)}
          ],
        `;
      });

      // Wrap the modules in a small runtime that implements require
      return `
        (function (modules) {
          function require(id) {
            const [fn, mapping] = modules[id];

            // Translate a relative path into a module id before requiring it
            function localRequire(relativePath) {
              return require(mapping[relativePath]);
            }

            const module = { exports: {} };
            fn(localRequire, module, module.exports);
            return module.exports;
          }

          require(0);
        })({${modules}})
      `;
    }
