2025-01-22 Update From: SLTechnology News&Howtos (Shulou.com)
At work, an ngx_lua service on a test machine that receives diverted production traffic suddenly started returning some HTTP 500 responses. Judging from the stack trace in the error log, the cause was that a Lua table added in a recently released version did not exist, yet was being indexed by the code. If a version rollback were to blame, why would the code that defines the table have been rolled back while the code that uses it was not?
After investigation, it turned out that nginx had just completed a hot update and the old master process was still alive. Because the machine was about to be rebooted, the diverted traffic was cut off first (although some requests were still arriving), and the system then triggered nginx -s stop, which led to this problem.
scene reproduction
Below I reproduce this process with a stock nginx on a Fedora 26 virtual machine. The nginx version is 1.13.4, the latest at the time of writing.
First, start nginx.
Listing the processes shows that both the master and its worker are running.
Then we send SIGUSR2 to the master. When nginx receives this signal, it triggers a hot update.
Now the new master and the worker it forked are running as well. Next we send SIGWINCH to the old master. On receiving it, the old master sends SIGQUIT to its workers, so the old master's worker processes exit.
Only the old master, the new master, and the new master's worker are left running, which matches the situation online at the time.
Then we run nginx -s stop.
The new master and its worker have exited, while the old master is still running and has spawned workers again. That is exactly what happened online.
This behavior follows from nginx's own design. When the old master is about to fork the new master, it renames nginx.pid to nginx.pid.oldbin, and the new master then creates a fresh nginx.pid that records its own pid. Since nginx -s looks up the master through nginx.pid, the stop command reached only the new master. nginx assumes that once the hot update is complete, the old master's job is nearly done and it may exit at any moment, so control operations should be handled by the new master. Note also that trying to hot-update again by sending SIGUSR2 to the new master while the old master is still alive has no effect: the new master simply ignores the signal and carries on.
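The pid-file bookkeeping described above can be pictured with a small simulation. This is only a sketch of the renaming logic, not nginx code: the function names `simulate_upgrade` and `signaller_target` and the pids 1001 and 2002 are made up for illustration, and no real processes are involved.

```python
import tempfile
from pathlib import Path


def simulate_upgrade(pid_file: Path, new_master_pid: int) -> None:
    """Mimic the pid-file renaming of a hot update (no processes involved)."""
    # The old master moves its pid file out of the way...
    pid_file.rename(pid_file.with_name(pid_file.name + ".oldbin"))
    # ...and the new master records its own pid under the original name.
    pid_file.write_text(f"{new_master_pid}\n")


def signaller_target(pid_file: Path) -> int:
    """Return the pid that a signaller reading nginx.pid would signal."""
    return int(pid_file.read_text().strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / "nginx.pid"
        pid_file.write_text("1001\n")           # hypothetical old master pid
        simulate_upgrade(pid_file, 2002)         # hypothetical new master pid
        # Anything that goes through nginx.pid now reaches only the new master.
        print("signalled pid:", signaller_target(pid_file))
        print("old master pid:", (Path(tmp) / "nginx.pid.oldbin").read_text().strip())
```

After the simulated upgrade, the old master's pid is only reachable through nginx.pid.oldbin, which is why a stop issued through nginx.pid leaves it running.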
problem analysis
Unfortunately, the Lua file that defines the table mentioned above is loaded into memory and compiled into bytecode by LuaJIT as early as the init_by_lua hook, which runs in the master process. The old master therefore cannot have this table, because the Lua code it loaded at that point was the old version.
The Lua code that indexes the table, on the other hand, is not used in init_by_lua. It is loaded in the worker processes. By the time the old master spawned its new workers, the code in the project directory was the latest version, so those workers loaded the latest code. Whenever one of them handled a relevant request, a Lua runtime error occurred, which showed up externally as an HTTP 500.
Having learned this lesson, we need to shut down our nginx service more carefully, so a sound startup and shutdown script is necessary. Some scripts circulating on the Internet do not handle this situation. We should refer to the script provided officially by NGINX.
This code is quoted from NGINX's official /etc/init.d/nginx.
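To make the requirement concrete, here is a minimal sketch of a shutdown routine that takes a leftover old master into account. It is my own illustration in Python, not the official init script: the function names, the `kill` parameter (injectable so the logic can be tried without touching real processes), and the choice to quit the old master first are all assumptions.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def read_pid(path: Path) -> Optional[int]:
    """Return the pid stored in a pid file, or None if the file is missing."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def graceful_stop(
    pid_file, kill: Callable[[int, int], None] = os.kill
) -> List[Tuple[int, int]]:
    """Send SIGQUIT to every master recorded on disk, old master first."""
    pid_file = Path(pid_file)
    old_file = pid_file.with_name(pid_file.name + ".oldbin")
    sent = []
    # Quit the old master first so it cannot keep spawning workers
    # after the current master is gone.
    for path in (old_file, pid_file):
        pid = read_pid(path)
        if pid is None:
            continue
        try:
            kill(pid, signal.SIGQUIT)
        except ProcessLookupError:
            continue  # stale pid file: the process is already gone
        sent.append((pid, signal.SIGQUIT))
    return sent
```

Calling `graceful_stop("/path/to/nginx.pid")` with the real pid-file location would signal whichever masters are recorded there; the path here is a placeholder.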
nginx signal set
Next, we go through the nginx signal set as a whole. This does not cover source-code details; interested readers can study the relevant source themselves.
There are two ways to send a signal to the master process: through nginx -s signal, or manually with the kill command.
The first way spawns a new process, which reads the master's pid from the nginx.pid file, sends the corresponding signal to the master, and then exits. This process is called the signaller.
The second way requires us to know how nginx -s operations map to real signals. The table below shows the mapping (hot update has no nginx -s equivalent and is listed for completeness):
operation     signal
reload        SIGHUP
reopen        SIGUSR1
stop          SIGTERM
quit          SIGQUIT
hot update    SIGUSR2, then SIGWINCH, then SIGQUIT
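The same mapping can be written down as data. The sketch below imitates what the signaller does, under the assumption that the pid file contains just the master's pid; `NGINX_S_TO_SIGNAL` and `signal_master` are names invented for this example, and `kill` is injectable so the function can be exercised without a running nginx.

```python
import os
import signal
from pathlib import Path
from typing import Callable, Tuple

# nginx -s operation -> signal delivered to the master (from the table above).
NGINX_S_TO_SIGNAL = {
    "reload": signal.SIGHUP,
    "reopen": signal.SIGUSR1,
    "stop": signal.SIGTERM,
    "quit": signal.SIGQUIT,
}


def signal_master(
    operation: str, pid_file, kill: Callable[[int, int], None] = os.kill
) -> Tuple[int, int]:
    """Look up the master pid in pid_file and send the mapped signal."""
    sig = NGINX_S_TO_SIGNAL[operation]  # KeyError for unknown operations
    pid = int(Path(pid_file).read_text().strip())
    kill(pid, sig)
    return pid, sig
```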
stop vs quit
stop sends SIGTERM, which means a forced exit, while quit sends SIGQUIT, which means a graceful exit. The concrete difference is that after receiving the SIGQUIT message (a message rather than a signal, because the master does not signal its workers directly, as explained later), the worker process closes its listening sockets, closes the currently idle connections (the ones that can be reclaimed), then processes all pending timer events ahead of schedule, and finally exits. In almost all cases, quit should be used instead of stop.
reload
After the master process receives SIGHUP, it re-parses the configuration file, allocates shared memory, and performs a series of other tasks. It then spawns a batch of new worker processes and finally sends the message corresponding to SIGQUIT to the old workers, so the restart completes seamlessly.
reopen
When the master process receives SIGUSR1, it reopens all the files it already has open (for example, logs) and passes SIGUSR1 on to each worker process, which does the same on receipt. reopen can be used for log rotation. The scheme provided officially by NGINX renames access.log to access.log.0, sends SIGUSR1 to the master, sleeps for one second, and only then processes access.log.0 (for example, compresses it).
The sleep 1 is required because there is a time window between the master sending SIGUSR1 to the workers and the workers actually reopening access.log. During that window the workers are still writing to access.log.0. Sleeping for one second protects the integrity of access.log.0; compressing it immediately, without the sleep, is likely to lose log entries.
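The rotation steps can be sketched in Python as follows. This is a rough equivalent of the rename, signal, sleep, compress sequence, not the official snippet: the function name, the `.0.gz` naming, and the injectable `kill` and `settle_seconds` parameters are assumptions made for illustration.

```python
import gzip
import os
import shutil
import signal
import time
from pathlib import Path
from typing import Callable


def rotate_access_log(
    log_path,
    master_pid: int,
    kill: Callable[[int, int], None] = os.kill,
    settle_seconds: float = 1.0,
) -> Path:
    """Rename the log, ask nginx to reopen it, wait, then compress the old file."""
    log_path = Path(log_path)
    rotated = log_path.with_name(log_path.name + ".0")
    # Workers keep writing to the renamed file through their open descriptor.
    log_path.rename(rotated)
    # Ask the master (and, through it, the workers) to reopen log files.
    kill(master_pid, signal.SIGUSR1)
    # Give busy workers time to actually switch to the new access.log.
    time.sleep(settle_seconds)
    compressed = rotated.with_name(rotated.name + ".gz")
    with rotated.open("rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)
    rotated.unlink()
    return compressed
```

On a heavily loaded server a longer `settle_seconds` may be the safer choice, for the reasons given in the section on how workers handle messages from the master.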
hot update
Sometimes we need to hot-update the binary. nginx supports this by design, but it cannot be done through the nginx -s command line; we have to send the signals manually.
First we send SIGUSR2 to the current master process. The master renames nginx.pid to nginx.pid.oldbin and then forks a new process. The new process replaces its process image with the new nginx ELF file by calling execve and becomes the new master process. Once the new master is up, it parses the configuration file and performs the other startup work, then forks new worker processes, which start serving.
Next we send SIGWINCH to the old master, which in turn sends SIGQUIT to its worker processes so that they exit. Both SIGWINCH and SIGQUIT, when sent to a master process, cause its workers to exit, but SIGWINCH does not make the master itself exit.
Finally, once we consider the old master's job done, we send it SIGQUIT so that it exits.
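The three steps can be strung together roughly like this. It is a sketch under assumptions: the function names, the polling approach for detecting the new master, and the timeout values are mine, and `kill` is injectable so the sequence can be tested without nginx. The final SIGQUIT is kept in a separate function because it should only be sent after checking that the new binary behaves.

```python
import os
import signal
import time
from pathlib import Path
from typing import Callable, Optional, Tuple


def _try_read_pid(path: Path) -> Optional[int]:
    """Read a pid file, tolerating a missing or half-written file."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def hot_update(
    pid_file,
    kill: Callable[[int, int], None] = os.kill,
    timeout: float = 10.0,
    poll: float = 0.1,
) -> Tuple[int, int]:
    """Start a new master from the new binary and retire the old workers."""
    pid_file = Path(pid_file)
    old_file = pid_file.with_name(pid_file.name + ".oldbin")
    old_master = int(pid_file.read_text().strip())

    # Step 1: ask the current master to fork and exec the new binary.
    kill(old_master, signal.SIGUSR2)

    # Wait until the rename happened and the new master wrote its own pid.
    deadline = time.monotonic() + timeout
    while True:
        new_master = _try_read_pid(pid_file)
        if old_file.exists() and new_master not in (None, old_master):
            break
        if time.monotonic() > deadline:
            raise TimeoutError("new master did not appear in time")
        time.sleep(poll)

    # Step 2: make the old master shut down its workers but stay alive.
    kill(old_master, signal.SIGWINCH)
    return old_master, new_master


def retire_old_master(
    old_master: int, kill: Callable[[int, int], None] = os.kill
) -> None:
    """Step 3: once the new version is confirmed healthy, quit the old master."""
    kill(old_master, signal.SIGQUIT)
```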
How the worker process handles signal messages from the master
In fact, the master process does not talk to its workers with the kill function. It uses the nginx channel, which is built on a socket pair. The master writes information (such as signal information) to one end, and the worker reads it from the other end. The worker adds the channel event to its event mechanism (epoll, kqueue, and so on) right after it starts, so it is notified by the event mechanism whenever the master sends data.
nginx is designed this way for a reason. As an excellent reverse proxy server, nginx pursues extreme performance, whereas a signal handler interrupts the worker process and suspends all event processing for a short window, which costs some performance.
Many people might assume that a worker acts immediately when the master sends it a message. But the worker is busy: it is constantly handling network and timer events, and when the channel event handler is called, nginx only sets some flags. The corresponding actions are performed after the current round of event processing is complete. So there is a time window in between, and it can grow when the business logic is complex and traffic is heavy. This is why the log rotation scheme provided officially by NGINX needs the sleep 1.
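The "set a flag now, act after the round" pattern can be shown with a toy event loop. This is a heavily simplified sketch, not nginx's implementation: a real channel message is a C structure rather than the text command used here, and the function name and log strings are invented for the example.

```python
import selectors
import socket
from typing import List


def run_worker_round_demo() -> List[str]:
    """Run one event-loop round in which a 'quit' message arrives on the channel."""
    master_end, worker_end = socket.socketpair()
    selector = selectors.DefaultSelector()
    flags = {"quit": False}
    log: List[str] = []

    def on_channel(sock: socket.socket) -> None:
        # The handler only records what was requested; it does not act on it.
        if sock.recv(64) == b"quit":
            flags["quit"] = True
        log.append("handler: flag set")

    # The worker registers its end of the channel with the event mechanism.
    selector.register(worker_end, selectors.EVENT_READ, on_channel)

    # The master writes a command instead of sending a signal.
    master_end.sendall(b"quit")

    # One round of event processing: dispatch every ready event.
    for key, _events in selector.select(timeout=1.0):
        key.data(key.fileobj)
    log.append("round: remaining events processed")

    # Only after the round does the worker look at the flags and act.
    if flags["quit"]:
        log.append("worker: acting on quit flag")

    selector.close()
    master_end.close()
    worker_end.close()
    return log


if __name__ == "__main__":
    for line in run_worker_round_demo():
        print(line)
```

The gap between "flag set" and "acting on quit flag" is the time window discussed above; in a real worker, that gap contains all the other network and timer events of the round.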
Of course, we can also bypass the master process and send signals directly to a worker process. The signals a worker can handle are:
signal     effect
SIGINT     forced exit
SIGTERM    forced exit
SIGQUIT    graceful exit
SIGUSR1    reopen files
summary
Signal operations are among the most common and most important tasks in day-to-day nginx operation and maintenance. A mistake here can cause business failures and losses, so it is worth getting the nginx signal set straight in order to handle these tasks well.
In addition, based on this incident and on what we now know about the nginx signal set, we consider the following points especially important:
Use nginx -s stop sparingly; use nginx -s quit whenever possible
After a hot update, once you are sure the business is running normally, let the old master process exit as soon as possible
Wait for a while after a critical signal operation completes, to avoid the effects of the time window
Do not send signals directly to worker processes