How to Maintain the Speed of Node.js
This article explains how to maintain the speed of Node.js. The explanations are straightforward and easy to follow; read on to learn how to diagnose and fix Node.js performance problems.
Introduction
If you have been using Node.js long enough, you have no doubt run into painful speed problems. JavaScript is an event-driven, asynchronous language, which obviously makes reasoning about performance tricky. The rapid rise of Node.js makes it necessary to find tools and techniques suited to this kind of server-side JavaScript.
When we hit performance problems, browser-side experience does not transfer to the server side. So how do we make sure a piece of Node.js code is fast and meets our requirements? Let's walk through some examples.
Tools
We need a tool to stress test our server and measure performance. Here we will use autocannon:
npm install -g autocannon   # or install via cnpm (Taobao mirror) / tnpm (Tencent mirror)
Other HTTP benchmarking tools include Apache Bench (ab) and wrk2, but AutoCannon is written in Node, which makes it convenient and familiar for front-end developers, and it installs easily on Windows, Linux and Mac OS X.
Once we have a benchmarking tool, we need a way to diagnose our program. A good tool for diagnosing performance problems is Node Clinic, which can also be installed with npm:
npm install -g clinic
This actually installs a suite of packages; we will use Clinic Doctor and Clinic Flame (a wrapper around 0x).
Translator's note: 0x is a tool that automatically profiles the CPU and generates a flame graph of a node process; clinic flame is built on top of 0x.
In fact, the clinic tool itself is a bundle of suites, and each subcommand delegates to a submodule:
- The doctor functionality is provided by Clinic.js Doctor.
- The bubbleprof functionality is provided by Clinic.js Bubbleprof.
- The flame functionality is provided by Clinic.js Flame.
Tip: Node 8.11.2 or later is required for this example.
Code example
Our example is a simple REST server with a single resource: a route /seed/v1 exposed for GET that returns a large JSON payload. The server-side code is an app directory containing a package.json (depending on restify 7.1.0), an index.js and a util.js.
// index.js
const restify = require('restify')
const { etagger, timestamp, fetchContent } = require('./util')()
const server = restify.createServer()

// bind the etagger middleware, which can add an ETag response header to resource requests
server.use(etagger().bind(server))

server.get('/seed/v1', function (req, res, next) {
  fetchContent(req.url, (err, content) => {
    if (err) return next(err)
    res.send({ data: content, ts: timestamp(), url: req.url })
    next()
  })
})

server.listen(3000, function () {
  console.log('%s listening at %s', server.name, server.url)
})

// util.js
require('events').defaultMaxListeners = Infinity
const { NotFoundError } = require('restify-errors') // restify 7 ships its errors in restify-errors
const crypto = require('crypto')

module.exports = function () {
  const content = crypto.rng(5000).toString('hex') // plain pseudo-random content

  const fetchContent = function (url, cb) {
    setImmediate(function () {
      if (url !== '/seed/v1') return cb(new NotFoundError('no such api'))
      cb(null, content)
    })
  }

  let last = Date.now()
  const TIME_ONE_MINUTE = 60000
  const timestamp = function () {
    const now = Date.now()
    if (now - last >= TIME_ONE_MINUTE) last = now
    return last
  }

  const etagger = function () {
    const cache = {}
    let afterEventAttached = false
    function attachAfterEvent (server) {
      if (attachAfterEvent === true) return
      afterEventAttached = true
      server.on('after', function (req, res) {
        if (res.statusCode === 200 && res._body != null) {
          const urlKey = crypto.createHash('sha512')
            .update(req.url)
            .digest()
            .toString('hex')
          const contentHash = crypto.createHash('sha512')
            .update(JSON.stringify(res._body))
            .digest()
            .toString('hex')
          if (cache[urlKey] !== contentHash) cache[urlKey] = contentHash
        }
      })
    }
    return function (req, res, next) {
      // translator's note: attaching the event here is not very elegant; I restructured
      // this in my own version, see https://github.com/cuiyongjian/study-restify/tree/master/app
      attachAfterEvent(this) // register an 'after' hook on server to hash the body before each response
      const urlKey = crypto.createHash('sha512')
        .update(req.url)
        .digest()
        .toString('hex')
      // translator's note: the ETag logic is slightly off; each request returns the
      // ETag most recently written to the cache
      if (urlKey in cache) res.set('Etag', cache[urlKey])
      res.set('Cache-Control', 'public, max-age=120')
      next()
    }
  }

  return { fetchContent, timestamp, etagger }
}
Don't treat this code as best practice; it has plenty of code smells, which we will measure and track down shortly.
To get the source code for this example, you can go here
Profiling analysis
To profile our code, we need two terminal windows: one to start the app, and one to stress test it.
In the first terminal, we execute:

node ./index.js

In the other terminal, we profile it like this:

autocannon -c100 localhost:3000/seed/v1
This opens 100 concurrent connections and bombards the server for 10 seconds.
The result is something like this:
Stat                        Avg        Stdev      Max
Latency (ms)                3086.81    1725.2     5554
Throughput (req/sec)        23.1       19.18      65
Transfer (bytes/sec)        237.98 kB  197.7 kB   688.13 kB
231 requests in 10s, 2.4 MB read
The results will vary with your machine. But we know that a typical "Hello World" Node.js server easily handles 30,000 requests per second on the same hardware, while this code manages only 23 requests per second with an average latency of over 3 seconds. Frustrating.
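For reference, the kind of bare-bones server behind that 30,000 req/s figure looks roughly like this (a minimal sketch; the exact baseline the author benchmarked isn't shown in the article):

// baseline.js: a minimal "Hello World" server to benchmark as a baseline
const http = require('http')

http.createServer(function (req, res) {
  res.end('Hello World')
}).listen(3000, function () {
  console.log('baseline listening on port 3000')
})

Stress test it the same way (autocannon -c100 localhost:3000) to get a feel for your machine's ceiling.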
Translator's note: I used my company's 2018 15-inch MacBook Pro (16 GB RAM, 256 GB SSD), and the test results are as follows:
Diagnosis
Locating the problem
We can diagnose the application with a single command, thanks to Clinic Doctor's --on-port option. Under the app directory, we execute:
clinic doctor --on-port='autocannon -c100 localhost:3000/seed/v1' -- node index.js
Translator's note:
Autocannon can now also be invoked with the new subarg-style command syntax:

clinic doctor --autocannon [ /seed/v1 -c 100 ] -- node index.js
When profiling finishes, clinic doctor creates an HTML file and automatically opens it in the browser.
The result looks like this:
The translator's own run looks like this:
Translator's note: the x-axis is actually your system time; the seconds after the colon indicate the current system time.
Note: the analysis that follows is based on the charts from the original article.
Following the message at the top of the UI, we see the event-loop chart. It is indeed red, and the event-loop delay keeps growing. Before we dig into what that means, let's look at the other metrics.
We can see that CPU usage hovers at or above 100% as the process works through queued requests. Node's JavaScript engine (V8) actually uses two CPU cores here, because the machine is multicore and V8 runs two threads: one executes the event loop, the other handles garbage collection. When CPU spikes to 120%, the process is collecting objects left over from handled requests. (Note: a process's CPU utilization as reported by the operating system does often exceed 100%; the process is multithreaded and the OS spreads the work across cores, so the summed CPU time exceeds one core.)
Let's look at the related memory chart. The solid lines show heap memory usage. RSS is the actual physical memory consumed by the node process; heapUsed is how much of the heap area is occupied; and Total Heap Allocated is the total heap memory reserved. heapUsed is usually the one to watch, since it covers the memory occupied by most JavaScript objects in node code. We can see that whenever the CPU chart climbs, heap usage drops a little, showing that memory is being reclaimed.
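If you want to watch those same numbers without Clinic, process.memoryUsage() is a standard Node API; a minimal sketch:

// Log the memory stats that Clinic charts, once per second.
setInterval(function () {
  const m = process.memoryUsage()
  console.log('rss=%dMB heapTotal=%dMB heapUsed=%dMB',
    Math.round(m.rss / 1048576),
    Math.round(m.heapTotal / 1048576),
    Math.round(m.heapUsed / 1048576))
}, 1000).unref() // unref() keeps this timer from holding the process open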
There is no correlation between active handles and the event-loop delay. An active handle is an object expressing I/O (such as a socket or a file handle) or a timer (such as a setInterval). We created 100 connections with autocannon (-c100), and active handles stay at 103. The extra three handles are STDOUT, STDERR and the server object itself (the server is also a listening socket handle).
If we click the recommendations panel at the bottom of the UI, we see:
Short-term relief
Analyzing performance issues in depth takes a lot of time. In a live project, you can add overload protection to the server or service. The idea of overload protection is to monitor event-loop delay (among other metrics) and respond with "503 Service Unavailable" when a threshold is exceeded. That lets a load balancer fail over to another instance, or at worst makes users retry a moment later. The overload-protection module plugs into Express, Koa and Restify directly and cheaply. The Hapi framework has a configuration option providing the same protection. (Translator's note: under the hood, overload-protection samples event-loop delay via loopbench, a module extracted from the Hapi framework; memory consumption is sampled inside overload-protection itself, which is simple enough by using the memoryUsage API directly.)
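As a sketch of the idea, wiring overload-protection into a Restify server looks roughly like this (option names follow the module's documentation at the time; verify them against the version you install):

const restify = require('restify')
// overload-protection responds 503 when the event loop lags too far behind
const protect = require('overload-protection')('restify', {
  maxEventLoopDelay: 42, // ms of event-loop delay before shedding load
  maxHeapUsedBytes: 0,   // 0 disables the heap-based check
  maxRssBytes: 0         // 0 disables the RSS-based check
})

const server = restify.createServer()
server.use(protect)
server.listen(3000)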
Understanding the problem
As Clinic Doctor indicated, if the event loop is delayed to the level we observed, it is very likely that one or more functions are blocking it. It is important to recognize this core property of Node.js: asynchronous events cannot run until the currently executing synchronous code completes. This is why the setTimeout below does not fire when expected.
For example, execute it in the browser developer tool or in the REPL of Node.js:
console.time('timeout')
setTimeout(console.timeEnd, 100, 'timeout')
let n = 1e7
while (n--) Math.random()
The printed time will never be 100ms; it will be somewhere between 150ms and 250ms. setTimeout schedules an asynchronous operation (console.timeEnd), but the currently executing code has not finished; two more lines follow that run a loop. The currently executing batch of code is often called a "tick". To complete this tick, Math.random must be called ten million times. If that takes 100ms, the total time before the timeout fires is 200ms (plus however long it takes setTimeout to actually queue the callback beforehand, usually a couple of milliseconds).
Translator's note: actually, the author's explanation here is slightly off. In this example, if the loop takes 100ms as he assumes, the setTimeout fires at about 100ms, not the sum of the two times, because by the time the 100ms loop finishes, the timeout is already due.
Also, when you test this on a real machine you are likely to get something just over 100ms, like I did, rather than the author's 150-250ms. The author saw 150ms+ because on his machine the while loop itself takes 150ms to 250ms. On a faster computer the 1e7 iterations finish in just a few milliseconds and don't block the 100ms setTimeout at all; the result is then around 103ms, where the extra 3ms is queueing and call overhead in the underlying machinery (unrelated to the problem discussed here). So when testing, try changing 1e7 to 1e8, so that the loop runs for more than 100 milliseconds.
In a server-side context, if an operation in the current tick takes a long time, requests cannot be handled and data cannot be fetched, because asynchronous code will not run until the current tick completes. This means computationally expensive code slows down all interactions with the server. It is therefore recommended to split resource-heavy work into a separate process called from the main server, which prevents rarely used but resource-intensive routes from dragging down the performance of frequently used, cheap ones.
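A minimal sketch of that split using Node core's child_process.fork (the file name and task are illustrative; this naive version also handles only one task at a time and does not multiplex concurrent requests):

// main.js: keep the event loop free by forking the heavy work out
const { fork } = require('child_process')
const worker = fork('./heavy-worker.js')

function runHeavyTask (payload, cb) {
  worker.once('message', function (result) { cb(null, result) })
  worker.send(payload)
}

// heavy-worker.js: receive a task, burn CPU, send the result back
process.on('message', function (payload) {
  let n = 1e7
  let acc = 0
  while (n--) acc += Math.random() // stand-in for real expensive work
  process.send({ input: payload, result: acc })
})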
The example server in this article has plenty of code that blocks the event loop, so the next step is to pinpoint exactly where.
Analysis
One way to locate problem code is to create and analyze a flame graph. A flame graph represents functions as blocks stacked on top of each other, not over time but in aggregate. It is called a flame graph because the blocks are shaded from orange to red: the redder a block, the "hotter" the function, meaning the more likely it is to be blocking the event loop. Capturing the data for a flame graph means sampling the CPU, that is, taking snapshots of the function currently executing and its stack. Heat is determined by the percentage of samples during the analysis in which a function sits at the top of the stack (i.e., is currently executing). A function that is rarely the one executing at the top of its stack is unlikely to be the one blocking the event loop.
Let's use clinic flame to generate a flame diagram of the sample code:
clinic flame --on-port='autocannon -c100 localhost:$PORT/seed/v1' -- node index.js
Translator's note: you can also use the new command style:
clinic flame --autocannon [ /seed/v1 -c 200 -d 10 ] -- node index.js
The results are automatically displayed in your browser:
Translator's note: the new version's output looks like this; it is more powerful, though it takes a little learning to read.
(Translator's note: the analysis below is still based on the original article's images.)
The width of a block indicates how much CPU time it consumed. Three main stacks take up most of the time, and server.on is the hottest of them. The three stacks are actually the same; they are separated because optimized and unoptimized versions of a function are treated as distinct call frames during analysis. Functions prefixed with * have been optimized by the JavaScript engine, while a ~ prefix marks unoptimized ones. If optimization status doesn't matter for our analysis, we can click the Merge button to combine them. The image then looks like this:
Right away we can see that the problem code is in util.js. The slow function is also an event handler: it is triggered through the events module in Node core, and server.on is a fallback name for the anonymous event-handler function. We can also see that this code is not in the same tick as the code that actually handles the request. If it were, the stack would contain Node core's http, net and stream modules.
If you expand the other, smaller blocks, you will find those Node core HTTP functions. For example, try the search box in the upper right and search for the keyword send (both restify and http internals have send methods). You can then find them on the right side of the flame graph (functions are sorted alphabetically).
You can see that the actual HTTP processing blocks take relatively little time.
We can click a highlighted cyan block to expand it, revealing the writeHead and write functions of http_outgoing.js (part of Node core's http library).
We can click "all stacks" to return to the main view.
The key point is that although server.on is not in the same tick as the actual request-handling code, it still hurts server performance by delaying the execution of other code.
Debugging
From the flame graph we now know that the problem function is the event handler passed to server.on in util.js. Let's take a look:
server.on('after', (req, res) => {
  if (res.statusCode !== 200) return
  if (!res._body) return
  const key = crypto.createHash('sha512')
    .update(req.url)
    .digest()
    .toString('hex')
  const etag = crypto.createHash('sha512')
    .update(JSON.stringify(res._body))
    .digest()
    .toString('hex')
  if (cache[key] !== etag) cache[key] = etag
})
As we all know, cryptographic hashing is an expensive, CPU-intensive task, and so is JSON.stringify, so why don't they show up in the flame graph? They were in fact recorded during sampling, but hidden behind the cpp filter (translator's note: cpp means C++-level code). Clicking the cpp button shows something like this:
The internal V8 instructions for serialization and hashing appear as the hottest stacks, taking the most time. The JSON.stringify method calls C++ code directly, which is why no JavaScript function shows up. For the crypto calls, functions like createHash and update are present in the data, but they are either inlined (absorbed in the merge view) or take too little time to render.
Once we start reasoning about the code in the etagger function, it quickly becomes apparent that it is badly designed. Why do we grab the server instance from the function context? Is all this hashing necessary? There is also no support for the If-None-Match header, which in realistic scenarios would shed some load, since clients issue conditional requests to check resource freshness.
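For illustration only, the middleware returned by etagger could support If-None-Match roughly like this (a hedged sketch, not the article's code; it also ignores ETag quoting): when the client's ETag still matches, answer 304 with no body.

return function (req, res, next) {
  attachAfterEvent(this)
  const key = crypto.createHash('sha512')
    .update(req.url)
    .digest()
    .toString('hex')
  if (key in cache) {
    // the client already has a fresh copy: answer 304 and skip the body
    if (req.headers['if-none-match'] === cache[key]) {
      res.writeHead(304)
      return res.end()
    }
    res.set('Etag', cache[key])
  }
  res.set('Cache-Control', 'public, max-age=120')
  next()
}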
Let's ignore all of these issues for now and verify that the code in server.on really is the culprit. We can replace the server.on body with an empty function and generate a new flame graph.
Now the etagger function looks like this:
function etagger () {
  var cache = {}
  var afterEventAttached = false
  function attachAfterEvent (server) {
    if (attachAfterEvent === true) return
    afterEventAttached = true
    server.on('after', (req, res) => {})
  }
  return function (req, res, next) {
    attachAfterEvent(this)
    const key = crypto.createHash('sha512')
      .update(req.url)
      .digest()
      .toString('hex')
    if (key in cache) res.set('Etag', cache[key])
    res.set('Cache-Control', 'public, max-age=120')
    next()
  }
}
Now the event listener passed to server.on is a no-op. Let's run clinic flame again:
clinic flame --on-port='autocannon -c100 localhost:$PORT/seed/v1' -- node index.js
It produces the following flame graph:
This looks better, and we see throughput per second increase. But why is the event-emitting code still so hot? We would expect HTTP processing to take the most CPU time at this point, given that server.on now does nothing at all.
This type of bottleneck is usually caused by a function being called far more often than expected.
The suspicious code at the top of util.js may be a clue:
require('events').defaultMaxListeners = Infinity
Let's remove this line and launch our application with the --trace-warnings flag:
node --trace-warnings index.js
Then run the stress test in the other terminal:
autocannon -c100 localhost:3000/seed/v1
You will see some output from our process:
(node:96371) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 after listeners added. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:280:19)
    at Server.addListener (events.js:297:10)
    at attachAfterEvent (/Users/davidclements/z/nearForm/keeping-node-fast/slow/util.js:22:14)
    at Server.<anonymous> (/Users/davidclements/z/nearForm/keeping-node-fast/slow/util.js:25:7)
    at call (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:164:9)
    at next (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:120:9)
    at Chain.run (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:123:5)
    at Server._runUse (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:976:19)
    at Server._runRoute (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:918:10)
    at Server._afterPre (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:888:10)
Node is telling us that too many events are being attached to the server object. This is strange, because there is a check that returns immediately if the after event is already bound to server, so after the first binding, only one no-op function should ever be attached.
Let's take a look at the attachAfterEvent function:
var afterEventAttached = false
function attachAfterEvent (server) {
  if (attachAfterEvent === true) return
  afterEventAttached = true
  server.on('after', (req, res) => {})
}
We found it: the conditional check is written incorrectly! It checks attachAfterEvent when it should check afterEventAttached. That means every request attaches a new listener to the server object, and then all previously attached listeners fire on every request. Oh, my God!
Optimization
Now that we know the problem, let's see how to make the server faster.
Low-hanging fruit (easy optimizations)
Let's restore the server.on body (instead of the empty function) and use the correct boolean in the conditional. Our etagger function now looks like this:
function etagger () {
  var cache = {}
  var afterEventAttached = false
  function attachAfterEvent (server) {
    if (afterEventAttached === true) return
    afterEventAttached = true
    server.on('after', (req, res) => {
      if (res.statusCode !== 200) return
      if (!res._body) return
      const key = crypto.createHash('sha512')
        .update(req.url)
        .digest()
        .toString('hex')
      const etag = crypto.createHash('sha512')
        .update(JSON.stringify(res._body))
        .digest()
        .toString('hex')
      if (cache[key] !== etag) cache[key] = etag
    })
  }
  return function (req, res, next) {
    attachAfterEvent(this)
    const key = crypto.createHash('sha512')
      .update(req.url)
      .digest()
      .toString('hex')
    if (key in cache) res.set('Etag', cache[key])
    res.set('Cache-Control', 'public, max-age=120')
    next()
  }
}
Now, let's profile again:
node index.js
Then stress test it with autocannon:
autocannon -c100 localhost:3000/seed/v1
We see a 200-fold improvement (100 concurrent connections for 10 seconds).
It is also important to balance development cost against potential server cost. We need to decide how far to take the optimization; otherwise it is easy to sink 80% of the time into a 20% performance gain. Can the project afford that?
In some scenarios, spending a day on low-hanging fruit for a 200-fold speedup is clearly reasonable. In others, we may want to make the project as fast as it can possibly be, whatever it takes. The choice depends on project priorities.
One way to control resource spend is to set a goal, for example a 10x improvement, or 4,000 requests per second. Anchoring the target in business requirements makes the most sense: for example, if server costs are 100% over budget, we can set a goal of a 2x improvement.
Going further
If we generate another flame graph, we see:
The event listener is still the bottleneck, still consuming about a third of the CPU time (its block is roughly one third the width of the whole row).
(Translator's note: it is worth asking before every optimization:) what additional benefit will the optimization bring, and are the changes, including the associated refactoring, worth it?
Let's look at the performance characteristics achievable after the final optimization (10 seconds against http://localhost:3000/seed/v1, 100 concurrent connections):
92k requests in 11s, 937.22 MB read
While the 1.6-fold improvement from the final optimization is significant, whether the effort, the changes and the code refactoring are worth it is debatable, especially compared with the earlier 200-fold improvement from simply fixing one bug.
For deep improvements we use the same cycle of techniques: profile, generate a flame graph, analyze, debug, optimize. The final optimized server code can be viewed here.
To push the throughput even higher, the following methods were used:
- Build the response as a string up front instead of creating an object and serializing it on every request.
- Use something else unique as the ETag instead of computing a hash.
- Don't hash the URL; use the URL itself as the cache key.

These changes are slightly more involved, a bit more disruptive to the code base, and leave the etagger middleware a little less flexible, because they push the burden of providing ETag values onto the route. But they add an extra 3,000 requests per second on the profiling machine. A sketch of these changes follows.
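A hedged sketch of what the three changes amount to (not the exact code from the final repository; it assumes content is the pre-generated string from util.js, drops the ts field for brevity, and uses restify's res.sendRaw to skip the JSON formatter):

// Pre-serialize once at startup, key the cache by raw URL,
// and use a simple version counter as the ETag instead of a hash.
const body = JSON.stringify({ data: content, url: '/seed/v1' })
let version = 1
const cache = { '/seed/v1': String(version) }

server.get('/seed/v1', function (req, res, next) {
  res.set('Etag', cache[req.url])
  res.set('Cache-Control', 'public, max-age=120')
  res.sendRaw(200, body, { 'Content-Type': 'application/json' })
  next()
})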
Let's take a look at the final optimized flame graph:
The hottest area in the figure is Node core's net module. This is the ideal situation.
Preventing performance problems
Finally, here are some suggestions for heading off performance problems before deployment.
Using performance tools as informal checkpoints during development can keep performance problems out of production. AutoCannon and Clinic (or similar tools) are recommended as part of your everyday development toolkit.
When adopting a framework, check its performance policy. If the framework doesn't state one, check whether it aligns with your infrastructure and business goals. For example, Restify has (since the release of version 7) clearly committed to improving performance. But if low cost and raw speed are your absolute priority, consider Fastify, which a Restify contributor measured as 17% faster.
Be careful when choosing popular libraries, and pay special attention to logging. As developers fix issues, they may add log output to help with future debugging. If a poorly performing logger is used, it can strangle performance over time, like the proverbial slowly boiled frog. The pino logger is the fastest newline-delimited-JSON logger available for Node.js.
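A minimal pino sketch:

const pino = require('pino')
const logger = pino() // writes newline-delimited JSON to stdout

logger.info({ route: '/seed/v1', status: 200 }, 'request served')
// => {"level":30,"time":...,"msg":"request served","route":"/seed/v1","status":200}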
Finally, always remember that the event loop is a shared resource. A Node.js server's performance is constrained by the slowest logic on the hottest path.