This article introduces how V8 parses JavaScript quickly using lazy (deferred) parsing, and walks through the problems V8 had to solve to make it work. I hope you read it carefully and get something out of it!
Parsing is the step that turns source code into an intermediate representation consumed by a compiler (in V8, the bytecode compiler Ignition). Parsing and compilation happen on the web page's critical startup path, but not every function shipped to the browser is needed during startup. Although developers can delay such code with async and deferred scripts, that is not always feasible. In addition, many web pages ship code that is only used by certain features, and a user may never reach those features during any individual run of the page.
Eagerly compiling unnecessary code has real resource costs:
CPU cycles spent creating the unneeded code delay the availability of code that is actually required at startup.
Code objects consume memory, at least until bytecode flushing decides the code is not currently needed and allows it to be garbage-collected.
Code compiled by the time the top-level script finishes executing ends up cached on disk, taking up disk space.
For these reasons, all major browsers implement lazy parsing. Instead of generating an abstract syntax tree (AST) for each function and then compiling it to bytecode, the parser can "pre-parse" functions it encounters rather than parsing them fully. It does this by switching to the pre-parser, a copy of the parser that does only the bare minimum needed to be able to otherwise skip over the function. The pre-parser verifies that the functions it skips are syntactically valid and produces all the information needed to compile the outer functions correctly. When a pre-parsed function is later called, it is fully parsed and compiled on demand.
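To make that concrete, here is a minimal sketch (illustrative only, not V8 internals) of what lazy parsing means from a script's point of view:

```js
// At script load, top-level code is parsed and compiled eagerly.
// `heavy` is only pre-parsed: syntax-checked and scoped, but no
// bytecode is generated for its body yet.
function heavy() {
  let total = 0;
  for (let i = 0; i < 1000; i++) total += i;
  return total;
}

heavy(); // the first call triggers the full parse + compile of `heavy`
```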
Variable allocation
The main problem that complicates pre-parsing is variable allocation.
For performance reasons, function activations are managed on the machine stack. For example, if a function g calls a function f with arguments 1 and 2:
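A minimal sketch of the call being described, with the function bodies assumed for illustration:

```js
function f(a, b) {
  const c = a + b; // local `c` lives on the stack
  return c;
}

function g() {
  return f(1, 2);
  // When `f` returns, execution resumes here: the saved return
  // instruction pointer and frame pointer restore `g`'s frame.
}

g();
```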
The receiver (that is, the this value of f, which is globalThis since it is a sloppy function call) is pushed onto the stack first, followed by the called function f. Then arguments 1 and 2 are pushed. At this point f is called. To execute the call, we first save the state of g on the stack: the "return instruction pointer" (rip; what code we need to return to) and the "frame pointer" (fp; what the stack should look like on return). Then we enter f, which allocates space for the local variable c, as well as any temporary space it may need. This ensures that any data used by the function goes away when the function activation goes out of scope: it is simply popped off the stack.
Figure: the stack layout for a call to function f with arguments a and b and the local variable c.
The problem with this setup is that functions can reference variables declared in outer functions. Inner functions can outlive the activation in which they were created:
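A sketch of the closure case being described (the names make_f, inner, and d follow the surrounding text; the bodies are assumed):

```js
function make_f(d) {            // ← declaration of `d`
  return function inner(a, b) {
    const c = a + b + d;        // ← reference to `d`, evaluated after `make_f` returns
    return c;
  };
}

const f = make_f(10);
console.log(f(1, 2));           // → 13; `d` survived in a heap-allocated context
```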
In the example above, the reference from inner to the variable d declared in make_f is evaluated after make_f has returned. To support this, virtual machines for languages with lexical closures allocate variables referenced from inner functions on the heap, in a structure called a "context".
Figure: the stack layout for a call to make_f, with the argument d copied into a heap-allocated context for later use by inner, which captures d.
This means that for each variable declared in a function, we need to know whether an inner function references it, so we can decide whether to allocate the variable on the stack or in a heap-allocated context. When we evaluate a function literal, we allocate a closure that points both to the function's code and to the current context: the object holding the values of the variables the function may need access to.
To make a long story short, we at least need to track variable references in the pre-parser.
If we only tracked references, though, we would overestimate which variables are referenced. A variable declared in an outer function can be shadowed by a redeclaration in an inner function, making a reference from that inner function resolve to the inner declaration rather than the outer one. If we unconditionally allocated the outer variable in a context, performance would suffer. For variable allocation to work correctly with pre-parsing, we therefore need to make sure pre-parsed functions properly track both variable references and declarations.
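A small illustration of the shadowing problem (a hypothetical example): tracking references alone would wrongly treat the outer d as captured:

```js
function f(d) {        // outer `d`: never actually referenced by `g`
  function g() {
    const d = 0;       // redeclaration shadows the outer `d`
    return d;          // resolves to the inner `d`, not `f`'s parameter
  }
  return g();
}

console.log(f(42));    // → 0
```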
Top-level code is an exception to this rule. The top level of a script is always heap-allocated, since variables are visible across scripts. An easy first step toward an architecture that works well is to run the pre-parser without variable tracking only for fast-parsing top-level functions, and to use the full parser for inner functions while skipping their compilation. This is more expensive than pre-parsing, since we needlessly build up an entire AST, but it gets us up and running. This is exactly what V8 did up to V8 v6.3 / Chrome 63.
Teaching the pre-parser about variables
Tracking variable declarations and references in the pre-parser is complicated because in JavaScript the meaning of a partial expression is often ambiguous at first. For example, suppose we have a function f with a parameter d, which in turn has an inner function g containing an expression that looks like it may reference d.
It may indeed end up referencing d, because the tokens we have seen so far could be part of a destructuring assignment expression.
It may also end up being an arrow function with a destructured parameter d, in which case the d in f is not referenced by g.
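Hedged sketches of the two possible continuations of such an expression, with hypothetical completions:

```js
// 1. Destructuring assignment: `g` really does reference `f`'s `d`.
function f1(d) {
  function g() {
    const a = ({ d } = { d: 42 }); // assigns 42 to the outer `d`
    return a.d + d;                // → 84
  }
  return g();
}

// 2. Arrow function with a destructured parameter: the parameter `d`
//    shadows `f2`'s `d`, so `g` does not reference it.
function f2(d) {
  function g() {
    const a = ({ d }) => d;
    return a({ d: 1 });            // → 1
  }
  return g();
}
```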
Initially, our pre-parser was implemented as a separate copy of the parser with little sharing, which caused the two parsers to diverge over time. By rewriting the parser and pre-parser to be based on a ParserBase implementing the curiously recurring template pattern (CRTP), we managed to maximize sharing while keeping the performance benefits of separate copies. This greatly simplified adding full variable tracking to the pre-parser, since most of the implementation can be shared between the parser and the pre-parser.
In fact, it was incorrect to ignore variable declarations and references even for top-level functions. The ECMAScript specification requires various kinds of variable conflicts to be detected upon first parse of the script. For example, declaring a variable twice as a lexical variable in the same scope is considered an early SyntaxError. Since our pre-parser simply skipped over variable declarations, it would incorrectly allow such code during pre-parsing. At the time we considered the performance win to justify the spec violation. Now that the pre-parser tracks variables properly, however, we have eliminated this class of variable-resolution-related spec violations without significant performance cost.
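A hedged example of such an early error (a hypothetical snippet): the duplicate lexical declaration below must be reported as soon as the script is parsed, even though the function is never called. Merely loading this script throws:

```js
function neverCalled() {
  let x = 1;
  let x = 2; // SyntaxError: Identifier 'x' has already been declared
}
```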
Skipping inner functions
As mentioned earlier, when a pre-parsed function is called for the first time, it is fully parsed and the resulting AST is compiled into bytecode.
The function points directly to the outer context, which contains the values of the variable declarations the inner function needs. To allow lazy compilation of functions (and to support the debugger), the context additionally points to a metadata object called ScopeInfo. ScopeInfo objects describe which variables are listed in a context. This means that when compiling an inner function, we can compute where each variable lives in the context chain.
However, to compute whether the lazily compiled function itself needs a context, we need to perform scope resolution again: we need to know whether functions nested inside the lazily compiled function reference the variables it declares. We can figure this out by re-parsing those functions, and this is exactly what V8 did up to V8 v6.3 / Chrome 63. It is not ideal performance-wise, though, because it makes the relation between source size and parse cost nonlinear: we would parse functions as many times as they are nested. Besides the natural nesting of dynamic programs, JavaScript packers commonly wrap code in "immediately-invoked function expressions" (IIFEs), giving most JavaScript programs multiple layers of nesting.
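A sketch of the nesting problem (illustrative packer-style output, not from the original article): helper sits several function scopes deep, so reparsing-based scope resolution would parse it once per enclosing lazily compiled function:

```js
(function bundle() {
  (function moduleA() {
    function helper(x) {
      return x * 2;
    }
    console.log(helper(21)); // → 42
  })();
})();
```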
Figure: each reparse adds at least the cost of parsing the function again.
To avoid this nonlinear performance overhead, we now perform full scope resolution even during pre-parsing. We store enough metadata that we can later simply skip over inner functions instead of having to reparse them. One way would be to store the names of the variables referenced by inner functions; that would be expensive to store and would still duplicate work, since we already performed variable resolution during pre-parsing.
Instead, we serialize where each variable is allocated as a dense array of flags per variable. When we lazily parse a function, the variables are recreated in the same order the pre-parser saw them, and we can simply apply the metadata to them. Once the function is compiled, the variable allocation metadata is no longer needed and can be garbage-collected. Since we only need this metadata for functions that actually contain inner functions, a large fraction of functions does not need it at all, which significantly reduces the memory overhead.
Figure: by tracking metadata for pre-parsed functions, we can skip inner functions entirely.
The performance impact of skipping inner functions is, just like the cost of reparsing them, nonlinear. Some sites hoist all their functions to the top-level scope; since their nesting level is always 0, the overhead is always 0. Many modern sites, however, do nest functions deeply, and on those sites we saw significant improvements when this feature launched in V8 v6.3 / Chrome 63. The main advantage is that the nesting depth of the code no longer matters: any function is at most pre-parsed once and fully parsed once [1].
Figure: main-thread and off-main-thread parse time, before and after launching the "skipping inner functions" optimization.
Possibly-invoked function expressions
As mentioned earlier, packers typically combine multiple modules into a single file by wrapping module code in a closure that they call immediately. This provides isolation for the modules, letting each run as if it were the only code in the script. These functions are essentially nested scripts; they are called immediately when the script executes. Packers most commonly ship them as parenthesized functions, that is, (function () {... }) (), known as immediately-invoked function expressions (IIFEs, pronounced "iffies").
Since these functions are needed immediately during script execution, pre-parsing them is not ideal. During top-level execution of the script we need the function compiled right away, so we fully parse and compile it. This means the faster pre-parse we did earlier in the hope of speeding up startup is guaranteed to be an unnecessary extra cost.
You might ask: why not simply compile called functions directly? While it is usually easy for a developer to notice when a function is called, this is not the case for the parser. The parser needs to decide, before it even starts parsing a function, whether to compile the function eagerly or defer compilation. Ambiguities in the syntax make it difficult to simply fast-scan to the end of the function, and the cost quickly becomes similar to the cost of regular pre-parsing.
Hence V8 recognizes two simple patterns as possibly-invoked function expressions (PIFEs, pronounced "piffies"), upon which it eagerly parses and compiles a function:
If a function is a parenthesized function expression, that is, (function () {... }), we assume it will be called. We make this assumption as soon as we see the start of this pattern, namely (function.
Since V8 v5.7 / Chrome 57 we also detect the pattern generated by UglifyJS: !function () {... }(), function () {... }(), function () {... }(). This detection kicks in as soon as we see !function, or ,function if it immediately follows a PIFE. Both patterns are shown in the sketch below.
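Hedged examples of both patterns (the bodies are placeholders):

```js
// Pattern 1: parenthesized function expression; eager parsing starts
// as soon as the parser sees `(function`.
(function () {
  console.log('module 1, compiled eagerly');
})();

// Pattern 2: the UglifyJS-style chain; detection kicks in on `!function`
// and on `,function` following a PIFE.
!function () {
  console.log('module 2');
}(), function () {
  console.log('module 3');
}();
```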
Since V8 compiles PIFEs eagerly, they can be used as profile-directed feedback [2], informing the browser which functions are needed for startup.
At a time when V8 still reparsed inner functions, some developers noticed that JS parsing had a considerable impact on startup. The optimize-js package turns functions into PIFEs based on static heuristics. When the package was created, this had a huge impact on load performance in V8. We replicated these results by running the benchmarks provided by optimize-js on V8 v6.1, looking only at the minified scripts.
Figure: eagerly parsing and compiling PIFEs results in slightly faster cold and warm startup (first and second page load, measuring total parse + compile + execute time). The benefit on V8 v7.5 is much smaller than it used to be on V8 v6.1, though, due to significant improvements to the parser.
Nevertheless, now that we no longer need to reparse inner functions, and since the parser has become much faster, the performance improvement obtained through optimize-js is greatly reduced. The default configuration of v7.5 is in fact already much faster than the optimized version running on v6.1 was. Even on v7.5 it can make sense to use PIFEs sparingly for code that is needed during startup: we avoid the pre-parse since we learn early on that the function will be needed.
The optimize-js benchmark results do not quite reflect the real world, though. The scripts are loaded synchronously, and the entire parse + compile time is counted toward load time. In a real-world setting, you would likely load scripts with <script> tags. That allows Chrome's preloader to discover a script before it is evaluated, and to download, parse, and compile it without blocking the main thread. Everything we decide to compile eagerly is then automatically compiled off the main thread and should only minimally count toward startup. Running with off-the-main-thread script compilation amplifies the impact of using PIFEs.
However, there are still costs, especially memory costs, so eagerly compiling everything is not a good idea:
Figure: eagerly compiling all JavaScript comes at a significant memory cost.
While adding parentheses around functions you need during startup is a good idea (for example, based on profiling startup), using a package like optimize-js that applies simple static heuristics is not a great idea. It assumes, for example, that a function will be called during startup if it is an argument to a function call. If such a function implements an entire module that is only needed much later, however, you end up compiling too much. Over-eager compilation is bad for performance: V8 without lazy compilation significantly regresses load time. Additionally, some of the benefits of optimize-js come from issues with UglifyJS and other minifiers, which remove parentheses from PIFEs that are not IIFEs, stripping useful hints that could have been applied to, for example, Universal Module Definition-style modules. This is likely an issue minifiers should fix to get maximum performance on browsers that eagerly compile PIFEs.
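A toy illustration of the hint (an assumption-labeled sketch, not a prescription): the parentheses below mark a PIFE, and a minifier that strips them as redundant removes the eager-compilation hint:

```js
var startup = (function () {  // parenthesized: treated as a PIFE, compiled eagerly
  return 42;
});

var lazy = function () {      // no parentheses: pre-parsed, compiled on first call
  return 42;
};

console.log(startup(), lazy());
```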
This concludes "How V8 quickly parses JavaScript: lazy parsing". Thank you for reading!