Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Source code analysis of Node.js module system

2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly introduces "Node.js module system source code analysis". In daily operation, I believe many people have doubts about Node.js module system source code analysis. Xiaobian consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the doubts of "Node.js module system source code analysis". Next, please follow the editor to study!

CommonJS specification

Node initially followed the CommonJS specification to implement its own module system, and made some customizations that are different from the specification. The CommonJS specification is a form of modules defined to solve the scope problem of JavaScript, which enables each module to execute in its own namespace.

The specification emphasizes that modules must export external variables or functions through module.exports, import the output of other modules into the current module scope through require (), and follow the following conventions:

In the module, you must expose a require variable, which is a function, the require function accepts a module identifier, and the require returns the exported API of the external module. If the required module cannot be returned, require must throw an error.

In the module, there must be a free variable called exports, which is an object, and the module can mount the properties of the module on the exports when executing. The module must use the exports object as the only way to export.

In the module, there must be a free variable module, which is also an object. The module object must have an id property, which is the top-level id of the module. The id attribute must be such that require (module.id) returns the exports object from the module that originated the module.id (that is, the module.id can be passed to another module and must return the original module when requested).

Implementation of CommonJS Specification by Node

The module.require function inside the module and the global require function are defined to load the module.

In the Node module system, each file is treated as a separate module. When a module is loaded, it is initialized to an instance of the Module object. The basic implementation and properties of the Module object are as follows:

Function Module (id = "", parent) {/ / module id, usually the absolute path of the module this.id = id; this.path = path.dirname (id); this.exports = {}; / / current module caller this.parent = parent; updateChildren (parent, this, false); this.filename = null; / / whether the module is loaded or not this.loaded = false / / the module referenced by the current module this.children = [];}

Each module exposes its own exports attribute as a usage interface.

Module exports and references

In Node, you can use the module.exports object to export a variable or function as a whole, or you can mount the variable or function to be exported to the properties of the exports object. The code is as follows:

/ / 1. Using exports: the author is used to deriving exports.name = 'xiaoxiang'; exports.add = (a, b) = > a + b; / / 2 for toollibrary functions or constants. Use module.exports: export an entire object or a single function. Module.exports = {add, minus}

The module name, relative path or absolute path can be passed in through the global require function. When the module file suffix is js / json / node, the suffix can be omitted, as shown in the following code:

/ / reference module const {add, minus} = require ('. / module'); const a = require ('/ usr/app/module'); const http = require ('http')

Note:

The exports variable is available in the file-level scope of the module and is assigned to module.exports before the module executes.

Exports.name = 'test'; console.log (module.exports.name); / / test module.export.name =' test'; console.log (exports.name); / / test

If a new value is assigned to exports, it will no longer be bound to module.exports, and vice versa:

Exports = {name: 'test'}; console.log (module.exports.name, exports.name); / / undefined, test

When the module.exports property is completely replaced by the new object, you usually need to reassign the value exports:

Module.exports = exports = {name: 'test'}; console.log (module.exports.name, exports.name) / / test, test

Analysis on the implementation of Module system

Module positioning

The following is the code implementation of the require function:

/ / require entry function Module.prototype.require = function (id) {/ /... RequireDepth++; try {return Module._load (id, this, / * isMain * / false); / / load module} finally {requireDepth--;}}

The above code receives the given module path, where the requireDepth is used to record the depth of the module load. Among them, the Module class method _ load implements the main logic of the Node loading module. Let's parse the source code implementation of the Module._load function. In order to facilitate your understanding, I added comments to the article.

Module._load = function (request, parent, isMain) {/ / step 1: parse the full path of the module const filename = Module._resolveFilename (request, parent, isMain); / / step 2: load the module and deal with it in three cases: if there is a cache, return the module's exports attribute const cachedModule = Module._ cache[ filename] directly; if (cachedModule! = = undefined) return cachedModule.exports / / case 2: load the built-in module const mod = loadNativeModule (filename, request); if (mod & & mod.canBeRequiredByUsers) return mod.exports; / / case 3: build the module load const module = new Module (filename, parent); / / load the module instance cache Module._ [filename] = module; / / step 3: load the module file module.load (filename) / / step 4: return the exported object return module.exports;}

Loading strategy

The above code has a large amount of information, so we mainly look at the following questions:

What is the caching strategy for the module?

By analyzing the above code, we can see that the _ load load function gives different loading strategies for three cases, which are:

Case 1: cache hit and return directly.

Case 2: a built-in module that returns the exposed exports property, that is, the alias of module.exports.

Case 3: use a file or third-party code to generate the module, and finally return, and cache, so that the next time the same access will use the cache instead of reloading.

2. How does Module._resolveFilename (request, parent, isMain) resolve the file name?

Let's look at the class methods defined as follows:

Module._resolveFilename = function (request, parent, isMain, options) {if (NativeModule.canBeRequiredByUsers (request)) {/ / priority load built-in module return request;} let paths / / the options,options.paths used by the node require.resolve function is used to specify the lookup path if (typeof options = = "object" & & options! = = null) {if (ArrayIsArray (options.paths)) {const isRelative = request.startsWith (". /") | | request.startsWith (".. /") | (isWindows & & request.startsWith (".\\") | request.startsWith ("..\") If (isRelative) {paths = options.paths;} else {const fakeParent = new Module ("", null); paths = []; for (let I = 0; I

< options.paths.length; i++) { const path = options.paths[i]; fakeParent.paths = Module._nodeModulePaths(path); const lookupPaths = Module._resolveLookupPaths(request, fakeParent); for (let j = 0; j < lookupPaths.length; j++) { if (!paths.includes(lookupPaths[j])) paths.push(lookupPaths[j]); } } } } else if (options.paths === undefined) { paths = Module._resolveLookupPaths(request, parent); } else { //... } } else { // 查找模块存在路径 paths = Module._resolveLookupPaths(request, parent); } // 依据给出的模块和遍历地址数组,以及是否为入口模块来查找模块路径 const filename = Module._findPath(request, paths, isMain); if (!filename) { const requireStack = []; for (let cursor = parent; cursor; cursorcursor = cursor.parent) { requireStack.push(cursor.filename || cursor.id); } // 未找到模块,抛出异常(是不是很熟悉的错误) let message = `Cannot find module '${request}'`; if (requireStack.length >

0) {messagemessage = message + "\ nRequire stack:\ n -" + requireStack.join ("\ n -");} const err = new Error (message); err.code = "MODULE_NOT_FOUND"; err.requireStack = requireStack; throw err;} / / finally returns the full path return filename; containing the file name

What stands out in the above code is the use of the _ resolveLookupPaths and _ findPath methods.

_ resolveLookupPaths: returns an array of traversal ranges used by _ findPath by accepting the module name and the module caller.

/ / the address array method of module file addressing Module._resolveLookupPaths = function (request, parent) {if (NativeModule.canBeRequiredByUsers (request)) {debug ("looking for% j in []", request); return null } / / if it is not the relative path if (request.charAt (0)! = = "." | (request.length > 1 & & request.charAt (1)! = = "." & & request.charAt (1)! = = "/" & isWindows | | request.charAt (1)! = = "\\")) { / * check the node_modules folder * modulePaths is the user directory The node_path environment variable specifies the directory, global node installation directory * / let paths = modulePaths If (parent! = null & & parent.paths & & parent.paths.length) {/ / the modulePath of the parent module is also added to the modulePath of the child module, backtracking to find paths = parent.paths.concat (paths);} return paths.length > 0? Paths: null;} / / when using repl interaction, look for. /. / node_modules and modulePaths if (! parent | |! parent.id | |! parent.filename) {const mainPaths = ["."] .concat (Module._nodeModulePaths ("."), modulePaths); return mainPaths } / / if the relative path is introduced, add the parent folder path to the lookup path const parentDir = [path.dirname (parent.filename)]; return parentDir;}

_ findPath: find the corresponding filename and return it according to the scope found by the target module and the above function.

/ / find the real path of the module Module._findPath = function (request, paths, isMain) {const absoluteRequest = path.isAbsolute (request); if (absoluteRequest) {/ / absolute path according to the given module and traversal address array, and locate directly to the specific module paths = ["];} else if (! paths | | paths.length = 0) {return false } const cacheKey = request + "\ x00" + (paths.length = 1? Paths [0]: paths.join ("\ x00")); / / cache path const entry = Module._ pathCache [cacheKey]; if (entry) return entry; let exts; let trailingSlash = request.length > 0 & & request.charCodeAt (request.length-1) = CHAR_FORWARD_SLASH; /'/'if (! test) {trailingSlash = / (?: ^ |\. $/ .test (request) } / / For each path for (let I = 0; I

< paths.length; i++) { const curPath = paths[i]; if (curPath && stat(curPath) < 1) continue; const basePath = resolveExports(curPath, request, absoluteRequest); let filename; const rc = stat(basePath); if (!trailingSlash) { if (rc === 0) { // stat 状态返回 0,则为文件 // File. if (!isMain) { if (preserveSymlinks) { // 当解析和缓存模块时,命令模块加载器保持符号连接。 filename = path.resolve(basePath); } else { // 不保持符号链接 filename = toRealPath(basePath); } } else if (preserveSymlinksMain) { filename = path.resolve(basePath); } else { filename = toRealPath(basePath); } } if (!filename) { if (exts === undefined) exts = ObjectKeys(Module._extensions); // 解析后缀名 filename = tryExtensions(basePath, exts, isMain); } } if (!filename && rc === 1) { /** * stat 状态返回 1 且文件名不存在,则认为是文件夹 * 如果文件后缀不存在,则尝试加载该目录下的 package.json 中 main 入口指定的文件 * 如果不存在,然后尝试 index[.js, .node, .json] 文件 */ if (exts === undefined) exts = ObjectKeys(Module._extensions); filename = tryPackage(basePath, exts, isMain, request); } if (filename) { // 如果存在该文件,将文件名则加入缓存 Module._pathCache[cacheKey] = filename; return filename; } } const selfFilename = trySelf(paths, exts, isMain, trailingSlash, request); if (selfFilename) { // 设置路径的缓存 Module._pathCache[cacheKey] = selfFilename; return selfFilename; } return false; }; 模块加载 标准模块处理 阅读完上面的代码,我们发现,当遇到模块是一个文件夹的时候会执行 tryPackage 函数的逻辑,下面简要分析一下具体实现。 // 尝试加载标准模块 function tryPackage(requestPath, exts, isMain, originalPath) { const pkg = readPackageMain(requestPath); if (!pkg) { // 如果没有 package.json 这直接使用 index 作为默认入口文件 return tryExtensions(path.resolve(requestPath, "index"), exts, isMain); } const filename = path.resolve(requestPath, pkg); let actual = tryFile(filename, isMain) || tryExtensions(filename, exts, isMain) || tryExtensions(path.resolve(filename, "index"), exts, isMain); //... return actual; } // 读取 package.json 中的 main 字段 function readPackageMain(requestPath) { const pkg = readPackage(requestPath); return pkg ? pkg.main : undefined; } readPackage 函数负责读取和解析 package.json 文件中的内容,具体描述如下: function readPackage(requestPath) { const jsonPath = path.resolve(requestPath, "package.json"); const existing = packageJsonCache.get(jsonPath); if (existing !== undefined) return existing; // 调用 libuv uv_fs_open 的执行逻辑,读取 package.json 文件,并且缓存 const json = internalModuleReadJSON(path.toNamespacedPath(jsonPath)); if (json === undefined) { // 接着缓存文件 packageJsonCache.set(jsonPath, false); return false; } //... try { const parsed = JSONParse(json); const filtered = { name: parsed.name, main: parsed.main, exports: parsed.exports, type: parsed.type }; packageJsonCache.set(jsonPath, filtered); return filtered; } catch (e) { //... } } 上面的两段代码完美地解释 package.json 文件的作用,模块的配置入口( package.json 中的 main 字段)以及模块的默认文件为什么是 index,具体流程如下图所示: 模块文件处理 定位到对应模块之后,该如何加载和解析呢?以下是具体代码分析: Module.prototype.load = function(filename) { // 保证模块没有加载过 assert(!this.loaded); this.filename = filename; // 找到当前文件夹的 node_modules this.paths = Module._nodeModulePaths(path.dirname(filename)); const extension = findLongestRegisteredExtension(filename); //... // 执行特定文件后缀名解析函数 如 js / json / node Module._extensions[extension](this, filename); // 表示该模块加载成功 this.loaded = true; // ... 省略 esm 模块的支持 }; 后缀处理 可以看出,针对不同的文件后缀,Node.js 的加载方式是不同的,一下针对 .js, .json, .node 简单进行分析。 .js 后缀 js 文件读取主要通过 Node 内置 API fs.readFileSync 实现。 Module._extensions[".js"] = function(module, filename) { // 读取文件内容 const content = fs.readFileSync(filename, "utf8"); // 编译执行代码 module._compile(content, filename); }; .json 后缀 JSON 文件的处理逻辑比较简单,读取文件内容后执行 JSONParse 即可拿到结果。 Module._extensions[".json"] = function(module, filename) { // 直接按照 utf-8 格式加载文件 const content = fs.readFileSync(filename, "utf8"); //... try { // 以 JSON 对象格式导出文件内容 module.exports = JSONParse(stripBOM(content)); } catch (err) { //... } }; .node 后缀 .node 文件是一种由 C / C++ 实现的原生模块,通过 process.dlopen 函数读取,而 process.dlopen 函数实际上调用了 C++ 代码中的 DLOpen 函数,而 DLOpen 中又调用了 uv_dlopen, 后者加载 .node 文件,类似 OS 加载系统类库文件。 Module._extensions[".node"] = function(module, filename) { //... return process.dlopen(module, path.toNamespacedPath(filename)); }; 从上面的三段源码,我们看出来并且可以理解,只有 JS 后缀最后会执行实例方法 _compile,我们去除一些实验特性和调试相关的逻辑来简要的分析一下这段代码。 编译执行 模块加载完成后,Node 使用 V8 引擎提供的方法构建运行沙箱,并执行函数代码,代码如下所示: Module.prototype._compile = function(content, filename) { let moduleURL; let redirects; // 向模块内部注入公共变量 __dirname / __filename / module / exports / require,并且编译函数 const compiledWrapper = wrapSafe(filename, content, this); const dirname = path.dirname(filename); const require = makeRequireFunction(this, redirects); let result; const exports = this.exports; const thisValue = exports; const module = this; if (requireDepth === 0) statCache = new Map(); //... // 执行模块中的函数 result = compiledWrapper.call( thisValue, exports, require, module, filename, dirname ); hasLoadedAnyUserCJSModule = true; if (requireDepth === 0) statCache = null; return result; }; // 注入变量的核心逻辑 function wrapSafe(filename, content, cjsModuleInstance) { if (patched) { const wrapper = Module.wrap(content); // vm 沙箱运行 ,直接返回运行结果,env ->

SetProtoMethod (script_tmpl, "runInThisContext", RunInThisContext); return vm.runInThisContext (wrapper, {filename, lineOffset: 0, displayErrors: true, / / dynamic loading importModuleDynamically: async specifier = > {const loader = asyncESM.ESMLoader; return loader.import (specifier, normalizeReferrerURL (filename));}} let compiled) Try {compiled = compileFunction (content, filename, 0,0, undefined, false, undefined, [], ["exports", "require", "module", "_ _ filename", "_ _ dirname"]) } catch (err) {/ /...} const {callbackMap} = internalBinding ("module_wrap"); callbackMap.set (compiled.cacheKey, {importModuleDynamically: async specifier = > {const loader = asyncESM.ESMLoader; return loader.import (specifier, normalizeReferrerURL (filename));}}); return compiled.function;}

In the above code, we can see that the wrapwrapSafe function is called in the _ compile function, the injection of _ _ dirname / _ filename / module / exports / require public variables is performed, and the sandbox environment in which the module code runs is built by calling C++ 's runInThisContext method (located in the src/node_contextify.cc file), and the compiledWrapper object is returned, and finally the module is run through the compiledWrapper.call method.

At this point, the study of "Node.js module system source code analysis" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report