2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains why you should not block the event loop in Node.js, and what you can do to avoid it.
Brief introduction
The event loop is the basis of event handling in Node.js. It runs the program's initialization code and the callbacks for events. Besides the event loop, Node.js also has a Worker Pool that handles time-consuming operations such as I/O.
The secret of Node.js running efficiently is asynchronous I/O, which lets a small number of threads serve a large number of client requests.
Precisely because so few threads are used, we must be very careful when writing Node.js programs.
Event loop and worker pool
There are two types of threads in Node.js. The first is the Event Loop, also called the main thread. The second is the set of n Worker threads in the Worker Pool.
If a thread of either type spends too long executing a callback (Event Loop) or a task (Worker), we consider that thread blocked.
Blocked threads cause two problems. The first is performance: a blocked thread ties up system resources, and because total resources are limited, fewer remain for other work, which lowers the overall performance of the program.
The second is security: if threads can be blocked easily, a malicious attacker can exploit this to launch a denial-of-service (DoS) attack that keeps normal requests from being served.
Node.js uses an event-driven architecture. The Event Loop handles the callbacks registered for events, as well as non-blocking asynchronous requests such as network I/O.
The Worker Pool, implemented by libuv, exposes a general task-submission API and is used for more expensive tasks. These include CPU-intensive operations and some blocking I/O operations.
Node.js itself has many modules that use the Worker Pool.
I/O-intensive operations:
DNS module: dns.lookup(), dns.lookupService().
File system module: all file system APIs use the Worker Pool, except fs.FSWatcher() and those that are explicitly synchronous.
CPU-intensive operations:
Crypto module: crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair().
Zlib module: all zlib APIs use the Worker Pool, except those that are explicitly synchronous.
These are the main modules that use the Worker Pool. In addition, a Node.js C++ add-on can submit its own tasks to the Worker Pool.
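To make the difference concrete, here is a minimal sketch that derives the same key twice: once with the asynchronous crypto.pbkdf2(), which runs on a Worker Pool thread, and once with crypto.pbkdf2Sync(), which runs on the Event Loop. The function names and the iteration count, key length, and digest are assumptions chosen for illustration, not recommended security settings.

```javascript
const crypto = require('node:crypto');

// Hypothetical parameters for illustration only; pick real values
// according to your own security requirements.
const ITERATIONS = 10000;
const KEY_LENGTH = 32;
const DIGEST = 'sha256';

// Asynchronous variant: the key derivation runs on a Worker Pool thread,
// and the callback runs on the Event Loop once the work is finished.
function deriveKey(password, salt) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, salt, ITERATIONS, KEY_LENGTH, DIGEST, (err, key) => {
      if (err) reject(err);
      else resolve(key.toString('hex'));
    });
  });
}

// Synchronous variant: same result, but the Event Loop is blocked while it runs.
function deriveKeySync(password, salt) {
  return crypto
    .pbkdf2Sync(password, salt, ITERATIONS, KEY_LENGTH, DIGEST)
    .toString('hex');
}

deriveKey('secret', 'salt').then((hex) => {
  console.log('async result matches sync result:', hex === deriveKeySync('secret', 'salt'));
});
```

In a server, the asynchronous variant is usually the better fit, because other callbacks can run while the Worker Pool does the hashing.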
Queue in event loop and worker pool
In a previous article we said that the event loop uses a queue to store event callbacks, but that description is not quite accurate.
What the event loop actually maintains is a collection of file descriptors. It asks the operating system kernel to monitor these file descriptors, using epoll (Linux), kqueue (OSX), event ports (Solaris), or IOCP (Windows).
When the operating system reports that a file descriptor is ready, the event loop translates it into the corresponding event and executes the callback bound to that event.
In contrast, the Worker Pool really does hold a queue of tasks waiting to be executed, and the tasks in the queue are picked up by individual Workers. When a task is finished, the Worker notifies the Event Loop that it has completed.
Blocking event loop
Because the number of threads in Node.js is limited, a single blocked thread can affect the whole application. When programming we must therefore think carefully about both the event loop and the worker pool, and avoid blocking either.
The event loop is mainly responsible for accepting client connections and responding to client requests. If the event loop is blocked, requests will not be answered in a timely manner.
Because the event loop mainly executes callbacks, each callback must run for only a short time.
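The effect of a long callback can be observed directly. The sketch below schedules a 0 ms timer, then runs some synchronous work and reports how late the timer fired. The helper names busyWait and measureTimerDelay are made up for this illustration.

```javascript
// Blocks the Event Loop by spinning synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread in the meantime
  }
}

// Schedules a 0 ms timer, then runs `work` synchronously, and resolves with
// the number of milliseconds that passed before the timer callback ran.
function measureTimerDelay(work) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    work();
  });
}

measureTimerDelay(() => busyWait(50)).then((delay) => {
  console.log(`timer fired after ${delay} ms`);
});
```

Although the timer was requested with a delay of 0 ms, it cannot fire until the synchronous work returns control to the event loop. Every pending request in a server waits in the same way.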
Time complexity of event loop
Time complexity is normally used to judge how fast an algorithm runs. We can use the same concept to analyze the callbacks in the event loop.
If every callback runs in constant time, then all callbacks get a fair chance to execute.
But if the running time of some callback depends on its input, we need to examine it carefully.
    app.get('/constant-time', (req, res) => {
      res.sendStatus(200);
    });
Let's first look at a constant-time case. In the example above we directly set the status of the response, which is a constant-time operation.
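    app.get('/countToN', (req, res) => {
      let n = req.query.n;
      // n iterations before giving someone else a turn
      for (let i = 0; i < n; i++) {
        console.log(`Iter ${i}`);
      }
      res.sendStatus(200);
    });
The example above has O(n) time complexity: the execution time depends on the value of n passed in the request.
    app.get('/countToN2', (req, res) => {
      let n = req.query.n;
      // n^2 iterations before giving someone else a turn
      for (let i = 0; i < n; i++) {
        for (let j = 0; j < n; j++) {
          console.log(`Iter ${i}.${j}`);
        }
      }
      res.sendStatus(200);
    });
The example above has O(n^2) time complexity.
How should we handle such cases? First, estimate the longest response time the system can tolerate, and from that set an upper limit on user-supplied parameters. If the data a user sends is too large for us to process, reject it at the input side, so that the program keeps running normally.
Node.js core modules not recommended in the Event Loop
Some methods in the Node.js core modules are synchronous, blocking APIs that are expensive to call, for example compression, encryption, synchronous I/O, and child processes.
These APIs are intended for use in environments such as the REPL. We should not call them directly in server-side programs.
Which APIs are not recommended on the server?
Encryption:
crypto.randomBytes (synchronous version)
crypto.randomFillSync
crypto.pbkdf2Sync
Compression:
zlib.inflateSync
zlib.deflateSync
File system:
Do not use the synchronous fs APIs.
Child process:
child_process.spawnSync
child_process.execSync
child_process.execFileSync
Partitioning or offloading
To avoid blocking the event loop while still giving other events a chance to run, we have two options: partitioning and offloading.
Partitioning means divide and conquer. A long task is split into several pieces and one piece is executed at a time, leaving other events some running time in between, so the event loop is no longer blocked.
For example:
    let sum = 0;
    for (let i = 0; i < n; i++)
      sum += i;
    let avg = sum / n;
    console.log('avg: ' + avg);
Suppose we want to compute the average of n numbers. The example above has O(n) time complexity.
    function asyncAvg(n, avgCB) {
      // Save ongoing sum in JS closure.
      var sum = 0;
      function help(i, cb) {
        sum += i;
        if (i == n) {
          cb(sum);
          return;
        }
        // "Asynchronous recursion".
        // Schedule next operation asynchronously.
        setImmediate(help.bind(null, i+1, cb));
      }
      // Start the helper, with CB to call avgCB.
      help(1, function(sum){
        var avg = sum/n;
        avgCB(avg);
      });
    }
    asyncAvg(n, function(avg){
      console.log('avg of 1-n: ' + avg);
    });
Here we use setImmediate to break the summation into individual steps. The helper inside asyncAvg runs many times, but each turn of the event loop does only a small amount of work, so the event loop is never blocked.
Partitioning is simple, but it is not suitable for some large computations. Partitioned work also still runs on the event loop, so it gains nothing from a multi-core system.
In that case we need to offload the task to a Worker Pool.
There are two ways to use a Worker Pool. The first is to use the Worker Pool built into Node.js, by developing a C++ addon or by using node-webworker-threads.
The second is to create and manage your own Worker Pool, which can be implemented with Child Process or Cluster.
Offloading has drawbacks too. The biggest is the cost of communicating with the Event Loop.
Limitations of the V8 engine
Node.js runs on the V8 engine. V8 is generally excellent and fast enough, but there are two exceptions: regular expressions and JSON operations.
REDOS: regular expression DoS attacks
What is the problem with regular expressions? They can suffer from pessimistic backtracking.
What is pessimistic backtracking?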
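Let's look at an example, assuming you are already familiar with regular expressions.
Suppose we match /^(x*)y$/ against the string xxxxxxy.
After matching, the first group (the value matched inside the parentheses) is xxxxxx.
If we rewrite the regular expression as /^(x*)xy$/ and match it against the same string xxxxxxy, the first group is xxxxx.
How does this happen?
First, (x*) matches as many x characters as possible, until it reaches the character y.
At this point (x*) has matched 6 x characters.
The engine then tries the xy that follows (x*) and finds that it cannot match. So (x*) gives back one of the 6 x characters it has matched, and the engine retries xy. This time it matches, and the match ends.
This process is called backtracking.
If a regular expression is badly written, pessimistic backtracking can occur.
Take the example above again, but this time match /^(x*)y$/ against the string xxxxxx.
Following the same process, the engine has to backtrack 6 times before finally failing to match.
In extreme cases the number of backtracking steps becomes very large, which makes CPU usage soar.
A DoS attack that exploits regular expressions in this way is called REDOS.
Here is an example of REDOS in Node.js:
    app.get('/redos-me', (req, res) => {
      let filePath = req.query.filePath;
      // REDOS
      if (filePath.match(/(\/.+)+$/)) {
        console.log('valid path');
      } else {
        console.log('invalid path');
      }
      res.sendStatus(200);
    });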
In the callback above, we intended to match paths such as /a/b/c. But suppose the user supplies filePath=///.../\n, that is, 100 slashes followed by a newline character.
This triggers pessimistic backtracking. Because . matches any single character except the newline \n, the engine tries the many possible ways of dividing the slashes among repetitions of the group, and only after exhausting them does it find that there is no match. This is a REDOS attack.
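The sketch below first times the vulnerable pattern on a few short attack strings to show how quickly the cost grows, then shows one possible defence: cap the input length and use a pattern whose segments cannot overlap. The names isValidPath, MAX_PATH_LENGTH, and SAFER are assumptions for illustration, and the stricter pattern accepts a narrower set of paths than the original one.

```javascript
// The vulnerable pattern from the example above: the nested quantifiers in
// (\/.+)+ let the engine split a run of slashes in exponentially many ways.
const VULNERABLE = /(\/.+)+$/;

// Keep the attack strings short: the running time grows exponentially.
for (const n of [20, 24, 28]) {
  const attack = '/'.repeat(n) + '\n';
  const start = process.hrtime.bigint();
  VULNERABLE.test(attack);
  const micros = (process.hrtime.bigint() - start) / 1000n;
  console.log(`${n} slashes: ${micros} microseconds`);
}

// Hypothetical limit and pattern, chosen only for illustration.
const MAX_PATH_LENGTH = 256;
// Each segment is "/" followed by characters that are neither "/" nor "\n",
// so there is only one way to split the input into segments.
const SAFER = /^(\/[^\/\n]+)+$/;

function isValidPath(filePath) {
  if (typeof filePath !== 'string' || filePath.length > MAX_PATH_LENGTH) {
    return false; // reject oversized or non-string input before matching
  }
  return SAFER.test(filePath);
}

console.log(isValidPath('/a/b/c'));
console.log(isValidPath('/'.repeat(100) + '\n'));
```

The length cap alone already bounds the worst case. The unambiguous pattern removes the exponential behaviour for this particular check.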
How can we avoid REDOS attacks?
On the one hand, there are ready-made modules that we can use directly, such as safe-regex, rxxr2, and node-re2.
On the other hand, you can look up the pattern you need on www.regexlib.com. The patterns there have been verified, which reduces the mistakes you might make when writing your own regular expressions.
JSON DOS attack
JSON.parse and JSON.stringify are two very common JSON operations, but the time each takes grows with the length of the input.
For example:
    var obj = { a: 1 };
    var niter = 20;
    var before, str, pos, res, took;

    for (var i = 0; i < niter; i++) {
      obj = { obj1: obj, obj2: obj }; // Doubles in size each iter
    }

    before = process.hrtime();
    str = JSON.stringify(obj);
    took = process.hrtime(before);
    console.log('JSON.stringify took ' + took);

    before = process.hrtime();
    pos = str.indexOf('nomatch');
    took = process.hrtime(before);
    console.log('Pure indexof took ' + took);

    before = process.hrtime();
    res = JSON.parse(str);
    took = process.hrtime(before);
    console.log('JSON.parse took ' + took);
In the example above we stringify and parse obj. This obj is relatively simple. If a user sends a very large JSON document, these operations will block the event loop.
The solution is to limit the length of the user's input, or to use an asynchronous JSON API such as JSONStream or Big-Friendly JSON.
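A length limit can be sketched as a thin wrapper around JSON.parse. The function name parseJsonWithLimit and the 100 KiB default are assumptions for illustration. The right limit depends on what your server can afford.

```javascript
// Hypothetical limit for illustration; tune it to your own workload.
const MAX_JSON_BYTES = 100 * 1024;

function parseJsonWithLimit(text, maxBytes = MAX_JSON_BYTES) {
  // Buffer.byteLength measures the encoded size without parsing the payload.
  const size = Buffer.byteLength(text, 'utf8');
  if (size > maxBytes) {
    throw new RangeError(`JSON payload too large: ${size} bytes (limit ${maxBytes})`);
  }
  return JSON.parse(text);
}

console.log(parseJsonWithLimit('{"a": 1}'));

try {
  parseJsonWithLimit(JSON.stringify({ big: 'x'.repeat(200 * 1024) }));
} catch (err) {
  console.log(err.message);
}
```

Checking the size before parsing keeps the cost of a rejected request small, since the oversized payload is never parsed.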
Blocking Worker Pool
The idea behind Node.js is to serve as many client connections as possible with as few threads as possible. We have also discussed moving expensive operations into the Worker Pool to take advantage of its threads.
However, the number of threads in the pool is limited too. While one Worker executes a long-running task, the pool effectively has one Worker fewer.
A malicious attacker can exploit this weakness to carry out a DoS attack.
So the best way to deal with long-running tasks in the Worker Pool is, again, partitioning, so that all tasks get an equal chance to run.
Of course, if you can clearly distinguish short tasks from long-running ones, you can also build separate Worker Pools, each serving a different type of task.
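One common way to partition Worker Pool work is to read a large file as a stream of small chunks rather than in a single call, so that each read is a separate, short task. The sketch below assumes this is acceptable for your use case. The function name countBytesInChunks, the chunk size, and the temporary demo file are made up for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Reads a file chunk by chunk and resolves with the total byte count and
// the number of chunks seen. `chunkSize` is a hypothetical tuning parameter.
function countBytesInChunks(filePath, chunkSize = 64 * 1024) {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    let chunks = 0;
    fs.createReadStream(filePath, { highWaterMark: chunkSize })
      .on('data', (chunk) => {
        bytes += chunk.length;
        chunks += 1;
      })
      .on('error', reject)
      .on('end', () => resolve({ bytes, chunks }));
  });
}

// Usage: create a small temporary file and read it back in 1 KiB chunks.
const demoFile = path.join(os.tmpdir(), `partition-demo-${process.pid}.bin`);
fs.writeFileSync(demoFile, Buffer.alloc(10000));
countBytesInChunks(demoFile, 1024).then(({ bytes, chunks }) => {
  console.log(`read ${bytes} bytes in ${chunks} chunks`);
  fs.unlinkSync(demoFile);
});
```

Between two chunks the Worker is free to pick up other tasks, which is what gives short tasks a fair chance to run.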