This article introduces how to optimize Nginx and Node.js for high-load networking. The content is detailed, easy to follow, and has practical reference value; let's take a look at how to tune Nginx and Node.js for heavy network loads.
Network tuning
If we do not first understand the underlying transport mechanisms of nginx and node.js and optimize at that level, fine-grained tuning of the two may be futile. In general, nginx connects the client to the upstream application through tcp sockets.
The system imposes many thresholds and limits on tcp via kernel parameters. The default values of these parameters are set for general-purpose use and cannot meet the high-traffic, short-lived connection patterns a web server requires.
Some of the parameters that are candidates for tcp tuning are listed here. To make them take effect, put them in /etc/sysctl.conf, or in a new configuration file such as /etc/sysctl.d/99-tuning.conf, and then run sysctl -p to have the kernel load them. We use sysctl-cookbook to do this manual work.
Note that the values listed here are safe to use, but you should study the meaning of each parameter so that you can choose a more appropriate value for your own load, hardware, and usage.
The configuration is as follows:
net.ipv4.ip_local_port_range='1024 65000'
net.ipv4.tcp_tw_reuse='1'
net.ipv4.tcp_fin_timeout='15'
net.core.netdev_max_backlog='4096'
net.core.rmem_max='16777216'
net.core.somaxconn='4096'
net.core.wmem_max='16777216'
net.ipv4.tcp_max_syn_backlog='20480'
net.ipv4.tcp_max_tw_buckets='400000'
net.ipv4.tcp_no_metrics_save='1'
net.ipv4.tcp_rmem='4096 87380 16777216'
net.ipv4.tcp_syn_retries='2'
net.ipv4.tcp_synack_retries='2'
net.ipv4.tcp_wmem='4096 65536 16777216'
vm.min_free_kbytes='65536'
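As a quick check (a minimal sketch, assuming the values above were saved to /etc/sysctl.d/99-tuning.conf as described earlier), load them into the running kernel and then verify an individual parameter:
sudo sysctl -p /etc/sysctl.d/99-tuning.conf
sysctl net.ipv4.ip_local_port_range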
Several of the important ones are highlighted below.
net.ipv4.ip_local_port_range
To serve a downstream client on behalf of the upstream application, nginx must open two tcp connections: one to the client and one to the application. When the server receives many connections, the system's available ports are quickly exhausted. You can increase the range of available ports by modifying the net.ipv4.ip_local_port_range parameter. If you find errors like "possible SYN flooding on port 80. Sending cookies" in /var/log/syslog, it means the system cannot find an available port; increasing net.ipv4.ip_local_port_range reduces this error.
net.ipv4.tcp_tw_reuse
When the server has to cycle through a large number of tcp connections, many connections end up in the time_wait state. time_wait means that the connection itself is closed but its resources have not yet been released. Setting net.ipv4.tcp_tw_reuse to 1 lets the kernel reuse such connections when it is safe to do so, which is much cheaper than establishing new ones.
net.ipv4.tcp_fin_timeout
This is the minimum time that a connection in the time_wait state must wait before it can be recycled. Making it smaller can speed up recycling.
How to check connection status
Use netstat:
netstat -tan | awk '{print $6}' | sort | uniq -c
Or use ss:
ss -s
Nginx
As the load on the web server increased, we began to run into some odd limitations in nginx. Connections were being dropped, and the kernel kept reporting syn floods. Yet load average and cpu utilization were low, and the server was clearly capable of handling more connections, which was frustrating.
After some investigation, we found a huge number of connections in the time_wait state. This is the output of ss -s on one of the servers:
ss -s
Total: 388 (kernel 541)
TCP: 47461 (estab 311, closed 47135, orphaned 4, synrecv 0, timewait 47135), ports 33938

Transport  Total  IP   IPv6
*          541    -    -
RAW        0      0    0
UDP        13     10   3
TCP        326    325  1
INET       339    335  4
FRAG       0      0    0
There are 47,135 time_wait connections! And, as ss shows, they are all closed connections. This suggests the server has consumed most of its available ports, and implies that it is allocating a new port for every connection. Tuning the network helped a little, but there were still not enough ports.
After further research, I found the documentation for the upstream keepalive directive, which reads:
Sets the maximum number of idle keepalive connections to upstream servers that are retained in the cache of each worker process.
Interesting. In theory, this setting minimizes connection waste by passing requests over cached connections. The documentation also mentions that proxy_http_version should be set to "1.1" and the "Connection" header cleared. On further research this turns out to be a good idea, because http/1.1 greatly improves tcp connection reuse compared to http/1.0, and nginx uses http/1.0 for upstream connections by default.
After modifying the configuration as the documentation recommends, our upstream block looks like this:
upstream backend_nodejs {
    server nodejs-3:5016 max_fails=0 fail_timeout=10s;
    server nodejs-4:5016 max_fails=0 fail_timeout=10s;
    server nodejs-5:5016 max_fails=0 fail_timeout=10s;
    server nodejs-6:5016 max_fails=0 fail_timeout=10s;
    keepalive 512;
}
I also modified the proxy settings in the server block as recommended, added a proxy_next_upstream directive to skip failed servers, adjusted the client keepalive_timeout, and turned off the access log. The configuration looks like this:
server {
    listen 80;
    server_name fast.gosquared.com;

    client_max_body_size 16m;
    keepalive_timeout 10;

    location / {
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
        proxy_set_header Connection "";
        proxy_http_version 1.1;
        proxy_pass http://backend_nodejs;
    }

    access_log off;
    error_log /dev/null crit;
}
With the new configuration, the number of sockets consumed by the servers dropped by 90%. Requests can now be transmitted over far fewer connections. The new output is as follows:
ss -s
Total: 558 (kernel 604)
TCP: 4675 (estab 485, closed 4183, orphaned 0, synrecv 0, timewait 4183/0), ports 2768

Transport  Total  IP   IPv6
*          604    -    -
RAW        0      0    0
UDP        13     10   3
TCP        492    491  1
INET       505    501  4
Node.js
Thanks to its event-driven design, node.js can handle large numbers of connections and requests asynchronously out of the box. Although there are other tuning techniques, this article focuses on the process side of node.js.
Node is single-threaded and does not automatically use multiple cores. In other words, the application does not automatically exploit the full capability of the server.
Implement clustering of node processes
We can modify the application to fork multiple processes that all accept connections on the same port, so that the load is spread across multiple cores. Node has a cluster module that provides all the tools needed to achieve this, but wiring it into an application takes some manual work. If you are using express, eBay has a module called cluster2 available.
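A minimal sketch of such a clustered entry point (not the exact code from this setup) might look like the following. The port 5016 matches the upstream block shown earlier, and the worker count anticipates the advice in the next section about leaving one core free:
const cluster = require('cluster');
const http = require('http');
const os = require('os');
// Leave one core for the kernel scheduler (see the next section).
const numWorkers = Math.max(1, os.cpus().length - 1);
if (cluster.isMaster) {
  for (let i = 0; i < numWorkers; i++) {
    cluster.fork();
  }
  // Replace a worker if it dies so capacity stays constant.
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' exited, forking a new one');
    cluster.fork();
  });
} else {
  // Every worker listens on the same port; the master distributes the connections.
  http.createServer((req, res) => {
    res.end('hello\n');
  }).listen(5016);
}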
Prevent context switching
When running multiple processes, you should ensure that each cpu core is busy with only one process at a time. Generally speaking, if the cpu has n cores, we should spawn n-1 application processes. This ensures that each process gets a reasonable time slice while one core is left for the kernel scheduler to run other tasks. We should also make sure that essentially nothing other than node.js runs on the server, to prevent contention for the cpu.
We once made the mistake of deploying two busy node.js applications on the same server, each of which spawned its own set of worker processes. The two applications ended up competing with each other for the cpu, and the system load shot up. Even though our servers are all 8-core machines, the performance overhead caused by context switching was clearly noticeable. Context switching is when the cpu suspends the current task in order to run another one; on each switch, the kernel must save all the state of the current process and then load and execute another. To solve this, we reduced the number of processes each application started so that they shared the cpu fairly, and the system load dropped:
Notice in the figure above how the system load (blue line) falls below the number of cpu cores (red line). We saw the same thing on the other servers. Since the overall workload stayed the same, the performance improvement can only be attributed to the reduction in context switching.
Some additional lessons learned, in no particular order:
1. When there is a performance problem, if the computation can be done at the application layer, move it out of the database layer. Sorting and grouping are typical examples. It is almost always easier to improve performance at the application layer than at the database layer; compared with MySQL, something like SQLite is also easier to keep under control. (A small sketch of moving sorting and grouping into the application layer appears after this list.)
2. Regarding parallel computing: avoid it if you can. If it is unavoidable, remember that with great power comes great responsibility. If possible, avoid manipulating threads directly and work at a higher level of abstraction. For example, on iOS, GCD, dispatch, and queue operations are your best friends. The human brain is not built to reason about endless transient states; this was a painful lesson for me.
3. Simplify state as much as possible and keep it as local as possible. The application comes first.
4. Short, composable methods are your good friends.
5. Code comments are dangerous because they easily become outdated or misleading, but that is no reason not to write them. Don't comment on trivial things, but where necessary, a strategic long comment is called for in certain special places. Your memory will betray you, maybe tomorrow morning, maybe after a cup of coffee.
6. If you think a use case scenario "probably won't be a problem", it may well be the place where you fail miserably a month after release. Be a skeptic: test and verify.
7. When in doubt, communicate with all relevant people on the team.
8. Do the right thing; you usually know what that means.
9. Your users are not stupid; they just don't have the patience to understand your shortcuts.
10. If no developer is assigned to maintain your system long-term, be vigilant. 80% of the blood, sweat, and tears are shed after the software is released; by then you will be a world-weary but wiser "expert".
11. The to-do list is your best friend.
12. Take the initiative to make your work more fun; sometimes that takes effort.
13. Silent failures still wake me from nightmares. Monitoring, logs, alerting. Be aware of false alarms and the desensitization they inevitably cause; keep your system sensitive to failures and alert on them promptly.
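To illustrate tip 1, here is a minimal, hypothetical Node.js sketch in which grouping and sorting happen at the application layer after a plain query has returned rows, instead of in a GROUP BY/ORDER BY at the database layer; the row shape and field names are invented for illustration:
// Hypothetical rows as they might come back from a simple SELECT.
const rows = [
  { country: 'US', visits: 120 },
  { country: 'DE', visits: 45 },
  { country: 'US', visits: 80 },
  { country: 'DE', visits: 10 },
];
// Group by country at the application layer...
const totals = new Map();
for (const row of rows) {
  totals.set(row.country, (totals.get(row.country) || 0) + row.visits);
}
// ...and sort the aggregated result in memory instead of relying on ORDER BY.
const sorted = [...totals.entries()].sort((a, b) => b[1] - a[1]);
console.log(sorted); // [ [ 'US', 200 ], [ 'DE', 55 ] ]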
This concludes the article on optimizing Nginx and Node.js for high-load networks. Thank you for reading!