This article walks through how we tracked down and fixed the 502 responses we ran into when using the nginx ingress controller.
After enabling keep-alive, 502 responses increase
Versions of nginx-ingress-controller before 0.20 have a bug: when keep-alive is enabled, the configuration template nginx.tmpl does not clear the "Connection: close" header; instead it adds that header to every HTTP request forwarded to the upstream:
[Image: nginx.tmpl snippet before the fix, which sets the Connection header on every upstream request]
It should instead be:
[Image: nginx.tmpl snippet after the fix, which clears the Connection header when keep-alive is enabled]
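The effect of the fix can be sketched with plain nginx directives (an illustration of the behaviour only, not the exact contents of nginx.tmpl):

    # Pre-0.20 behaviour (illustrative): this header forces the upstream
    # side to close the connection after every request.
    proxy_set_header Connection "close";

    # Behaviour after the fix (illustrative): clear the header and use
    # HTTP/1.1 so the connection to the Pod can be reused.
    proxy_http_version 1.1;
    proxy_set_header Connection "";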
Because of this bug, even with keep-alive configured in the ConfigMap, before 0.20 the HTTP communication between nginx and the Pod still used short-lived connections: a new connection was established for every request. Connection utilization was low, and a large number of connections sat in the TIME_WAIT state, wasting the limited range of local ports.
We fixed the bug as soon as we found it, and that is when the new problem appeared.
The nginx request logs showed that after keep-alive took effect, the number of 502 responses increased. Most business systems were insensitive to this, but a few systems alarmed frequently because of the 502s, and those systems had never seen 502 responses before keep-alive took effect.
The investigation begins.
The proportion of 502 responses was tiny and the timing seemed random, so at first there was no clue. By going through the nginx logs and analyzing captured packets, we found that when a 502 occurs, nginx receives an RST immediately after forwarding the request to the upstream: the connection has been closed by the Pod.
The requests nginx forwards to the Pod specify keep-alive, and the connection had been up for a long time, carrying many requests back and forth. Why would the Pod suddenly close it?
After ruling out various guesses, suspicion fell on the business system running in the Pod: a Tomcat service, and Tomcat itself is request-proxying software.
In fact, a request from the client passes through two proxies: first nginx forwards it to Tomcat in the Pod, and then Tomcat hands it to the code that actually processes the request.
Looking through Tomcat's configuration manual, we found that Tomcat has settings similar to nginx's, which specify the idle timeout for persistent connections (keepAliveTimeout) and the maximum number of requests per persistent connection (maxKeepAliveRequests).
[Image: Tomcat connector documentation for keepAliveTimeout]
[Image: Tomcat connector documentation for maxKeepAliveRequests]
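For reference, these two attributes live on the Connector element in Tomcat's server.xml. A minimal sketch (port, protocol, and values are illustrative, matching the figures discussed below; note keepAliveTimeout is in milliseconds):

    <!-- Illustrative Connector: close an idle connection after 60 s,
         and close a connection after it has served 100 requests. -->
    <Connector port="8080" protocol="HTTP/1.1"
               keepAliveTimeout="60000"
               maxKeepAliveRequests="100" />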
Coincidentally, the default values of these two settings match nginx's corresponding defaults: the connection is closed after 60 seconds of idleness, or after it has carried 100 requests. So the question becomes: who closes the connection first, nginx or Tomcat?
Judging from the captured packets, sometimes Tomcat closed the connection first while nginx, unaware, forwarded another request on it; nginx then received an RST in return and responded with 502. The 502 status code means the gateway received an unexpected (invalid) response from the upstream.
We then analyzed another Python service with the same problem. It does not listen on the port and handle requests directly; requests go through a Gunicorn proxy. Gunicorn's default configuration is even more extreme: its connection idle timeout is only 2 seconds!
While analyzing the Python service's packets, we kept wondering why its connections were closed so quickly, and for a while we even doubted the earlier hypothesis, until we found that its default timeout is 2 seconds: the captures showed the Pod side actively closing the connection after 2 seconds of idleness.
[Image: packet capture showing the Pod side closing the connection after 2 seconds of idleness]
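One way to widen that window is to raise Gunicorn's keep-alive timeout. A minimal gunicorn.conf.py sketch (values and worker class are illustrative, not the service's actual configuration; the keepalive setting only matters for worker classes that support keep-alive, such as threaded or async workers):

    # gunicorn.conf.py -- illustrative values only
    bind = "0.0.0.0:8000"
    workers = 4
    worker_class = "gthread"   # a worker class that honours keep-alive
    threads = 4
    keepalive = 75             # seconds an idle keep-alive connection is held (default is 2)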
The fix is simple: to prevent the Pod from closing the connection first, make the corresponding nginx settings slightly smaller than those of the proxy software inside the Pod. For example, if Tomcat's maxKeepAliveRequests is 100, configure 99 on the nginx side, so that nginx is always the one that actively closes the connection.
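Expressed as plain nginx directives, the idea looks roughly like this (an illustrative sketch, not the ingress controller's generated configuration; keepalive_requests and keepalive_timeout in an upstream block require nginx 1.15.3 or later, and with the ingress controller these values are normally set through its ConfigMap rather than edited by hand):

    # Keep nginx's limits strictly below those of the proxy inside the Pod,
    # so nginx is always the side that closes the connection.
    upstream backend {
        server 10.0.0.12:8080;     # placeholder Pod address
        keepalive 32;              # idle connections cached per worker
        keepalive_requests 99;     # one less than Tomcat's maxKeepAliveRequests (100)
        keepalive_timeout 58s;     # shorter than Tomcat's 60 s keepAliveTimeout
    }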
The above is only a rough retelling; the detailed investigation notes are in the original write-up.
There was also a 504 problem, which turned out to be caused by a misconfigured kube-apiserver: endpoints were not updated in time, so nginx's configuration was not regenerated.
The IP addresses in nginx's upstream configuration either no longer existed or had already been reassigned to other Pods, so clients received 504 responses, or responses from an entirely different service.
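A quick way to spot this kind of drift (names are placeholders; controller versions of that era wrote Pod IPs directly into the generated nginx.conf) is to compare the Service's current endpoints with what the controller is actually proxying to:

    # Current endpoints as seen by Kubernetes
    kubectl -n my-namespace get endpoints my-service -o wide
    # Pod IPs in the ingress controller's generated configuration
    kubectl -n ingress-nginx exec my-ingress-controller-pod -- \
        grep -A 5 'upstream' /etc/nginx/nginx.conf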