How to troubleshoot the 502 problem?

2025-07-02 Update From: SLTechnology News&Howtos

When I had just started working, a colleague on an upstream team that called my service once told me, "Your service is returning 502 errors. Go and find out why."

At the time, that service happened to have a call log that recorded all kinds of status codes (200, 4xx, and so on). So I searched the service log for the number 502 and found nothing. I told him, "There is no record of a 502 in the service log. Are you sure you're not mistaken?"

Looking back on it now, I'm a little embarrassed.

I wonder how many readers have been where I was back then. In this article, let's talk about what the 502 error really is.

Let's start with what the status code is.

HTTP status code

The sites we browse every day, such as a certain shopping site or a certain search engine, are in fact front-end web pages.

Generally speaking, the front end does not store much data, and most of the time you need to get the data from the back-end server.

So a connection between the front end and the back end is established over TCP, and the data is transmitted on top of that connection.

TCP, however, is a byte-stream protocol. It does not add boundaries around each message, so using raw TCP directly for data transmission leads to the "sticky packet" problem.

Therefore, we need an agreed message format to parse the data, and the HTTP protocol was designed on this basis. For details, see my earlier article "Why there is RPC when there is a HTTP protocol".

For example, when I want to see the details of a product, the front end actually sends an HTTP request carrying the product id, and the back end returns an HTTP response containing the product price, store name, shipping address, and so on.

[Figure: getting product details by id]

So on the surface we are just browsing web pages, but underneath, many HTTP messages are being sent and received.

[Figure: a user browsing products online]

But here is the problem. Everything described above is the normal case. What if something abnormal happens, for example the front end sends a picture instead of a product id? The back-end server cannot give a normal response, so a set of HTTP status codes is needed to indicate whether the HTTP request and response went normally. The status code can also influence the behavior of the browser.

For example, if everything is normal, the server returns a 200 status code, and the front end can safely use the response data. If the server finds that what the client sent is abnormal, it responds with a 4xx status code, which means a client error. The xx in 4xx is subdivided by error type: for example, the client does not have permission (403), or the client requests a page that does not exist at all (404). Conversely, if the problem is on the server side, a 5xx status code is returned.
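The classes described above can be sketched in a few lines of Go. This is only an illustration: `classify` is a hypothetical helper of mine, while the `http.Status*` constants and `http.StatusText` come from the standard library.

```go
package main

import (
	"fmt"
	"net/http"
)

// classify maps an HTTP status code to the broad class described in the text.
// It is a hypothetical helper, not part of net/http.
func classify(code int) string {
	switch {
	case code >= 200 && code < 300:
		return "success"
	case code >= 400 && code < 500:
		return "client error"
	case code >= 500 && code < 600:
		return "server error"
	default:
		return "other"
	}
}

func main() {
	codes := []int{http.StatusOK, http.StatusForbidden, http.StatusNotFound, http.StatusBadGateway}
	for _, code := range codes {
		fmt.Printf("%d %s: %s\n", code, http.StatusText(code), classify(code))
	}
}
```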

[Figure: the difference between 4xx and 5xx]

But here is the problem.

If something is seriously wrong with the server, it may simply crash. How can a crashed server return a status code to you?

It can't. In that situation the server application has no chance to return any status code to the client, so a 5xx status code such as 502 is generally not produced by the server application itself.

It is returned by a gateway, a common example being nginx.

The role of nginx

Let's go back to data exchange between the front end and the back end. If the front end has few users, the back end can handle the requests easily. But as users grow, a back-end server is limited by its resources, and CPU or memory may run seriously short. The solution is simple: add several more servers so that front-end requests can be spread across them, increasing processing capacity.

But to achieve this effect, the front end needs to know which servers are in the back end and establish TCP connections with them one by one.

It is not impossible to establish a connection between the front end and multiple servers, but it is troublesome.

At this point, it would be nice to have a middle tier between them: the client only needs to connect to the middle tier, and the middle tier establishes the connections to the servers.

The middle tier thus becomes a proxy for these servers. The client goes to the proxy for everything: it just sends its request, and the proxy picks a server to produce the response. Throughout the process, the client only knows that the proxy handled its request. Which server the proxy actually used, the client does not know and does not need to know.

A proxy like this, which hides which specific servers are behind it, is called a reverse proxy.

[Figure: reverse proxy]

Conversely, a proxy that hides which specific clients are behind it is a so-called forward proxy.

The role of this middle layer is generally played by gateways such as nginx.

In addition, the servers behind nginx may have different hardware, some with 4 cores and 8 GB of memory and some with 2 cores and 4 GB. nginx can assign them different weights and forward more requests to the servers with higher weights, which is how different load-balancing strategies are implemented.
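To make the idea of weights concrete, here is a minimal Go sketch of one way to honor them, smooth weighted round-robin. It is an illustration under my own assumptions rather than a description of nginx's source code: the `backend` type, the `pick` and `distribute` functions, and the server names are all hypothetical.

```go
package main

import "fmt"

// backend is a hypothetical upstream server with a static weight.
type backend struct {
	addr    string
	weight  int
	current int // running score used by the smooth weighted algorithm
}

// pick returns the next backend using smooth weighted round-robin:
// every backend's score grows by its weight, the highest score wins,
// and the winner's score is reduced by the total weight.
// It returns nil when the slice is empty.
func pick(backends []*backend) *backend {
	total := 0
	var best *backend
	for _, b := range backends {
		b.current += b.weight
		total += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	if best != nil {
		best.current -= total
	}
	return best
}

// distribute sends n requests through pick and counts hits per address.
// It assumes at least one backend is configured.
func distribute(backends []*backend, n int) map[string]int {
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[pick(backends).addr]++
	}
	return counts
}

func main() {
	backends := []*backend{
		{addr: "big-4c8g", weight: 2},
		{addr: "small-2c4g", weight: 1},
	}
	fmt.Println(distribute(backends, 30))
}
```

With weights 2 and 1, the larger machine ends up handling two out of every three requests, and the picks are interleaved rather than sent in bursts.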

nginx returns the 5xx status code

With nginx as the middle layer, the client no longer connects directly to the server. Instead, the client connects to nginx, and nginx connects to the server: one TCP connection becomes two.

Therefore, when something goes wrong on the server, the TCP connection between nginx and the server cannot get a normal response. When nginx detects this, it returns a 5xx error code to the client. In other words, the 5xx error is identified by nginx and returned to the client by nginx, so the server's own logs will contain no record of it. That explains the scene at the beginning of the article: the upstream caller received a 502 from my service, but I could not find it in my service log.
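The same effect can be reproduced without nginx. The sketch below is my own illustration, not the setup from the story: Go's standard `httputil.ReverseProxy` stands in for nginx, and `statusThroughProxy` is a hypothetical helper. When the backend is gone, the proxy's default error handler answers 502 Bad Gateway, and the backend never sees the request, so it has nothing to log.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// statusThroughProxy starts a backend, optionally shuts it down, and returns
// the status code a client sees when calling it through a reverse proxy.
func statusThroughProxy(backendAlive bool) (int, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from backend")
	}))
	target, err := url.Parse(backend.URL)
	if err != nil {
		backend.Close()
		return 0, err
	}
	if backendAlive {
		defer backend.Close()
	} else {
		backend.Close() // simulate a crashed backend: nothing listens on the port now
	}

	// The proxy plays the role that nginx plays in the article.
	proxy := httptest.NewServer(httputil.NewSingleHostReverseProxy(target))
	defer proxy.Close()

	resp, err := http.Get(proxy.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	for _, alive := range []bool{true, false} {
		code, err := statusThroughProxy(alive)
		fmt.Printf("backend alive=%v -> status=%d err=%v\n", alive, code, err)
	}
}
```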

Common causes of 502

Here is the official explanation of the 502 status code in RFC 7231.

502 Bad Gateway

The 502 (Bad Gateway) status code indicates that the server, while acting as a gateway or proxy, received an invalid response from an inbound server it accessed while attempting to fulfill the request.

Does that sound like plain human language to you?

For most programming beginners, this not only fails to explain the problem, it raises even more questions. For example, what exactly is the "invalid response" mentioned above?

Let me explain. A 502 is actually issued by the gateway proxy (nginx): the proxy forwards the client's request to the server, but the server sends back an invalid response. Here, "invalid response" generally means a TCP RST packet or the FIN packet of TCP's four-way close (the "four waves").

You are probably very familiar with the four-way close, so we'll skip it and focus on what an RST packet is.

What is RST?

We all know that TCP normally closes a connection with the four-way close, which is the graceful way.

But under abnormal conditions, either side may be in a bad state and may not even be able to go through the four-way close, so a mechanism is needed to close the connection forcibly.

RST is used in this situation, generally to close a connection abnormally. It is a flag bit in the TCP header. When a packet with this flag set is received, the connection is closed, and the side that receives the RST sees a "connection reset" or "connection refused" error at the application layer.

[Figure: the RST bit in the TCP header]

There are generally two common reasons why an RST packet is sent.

The server disconnects too early

There is a TCP connection between nginx and the server. When nginx forwards a client request to the server, the two keep the connection open until the server returns its result normally, and only then close it.

However, if the server closes the connection too early while nginx is still sending data, nginx receives either an RST packet from the server's kernel or the FIN packet of the four-way close, and the connection on the nginx side is forced to end.

There are two common reasons for disconnecting prematurely.

The first is that the timeout set on the server is too short. Whatever programming language you use, there is generally an off-the-shelf HTTP library, and the server side usually exposes several timeout parameters. For example, golang's HTTP server has a write timeout (WriteTimeout). If it is set to 2s, the server must finish processing the request and write the result to the response within 2s. Otherwise the connection is closed.

For example, suppose your handler takes 5s but WriteTimeout is only 2s. The HTTP framework closes the connection before the response is written. At that point nginx may receive the FIN packet of the four-way close (some frameworks may send an RST instead) and drop the connection, so the client receives a 502 error.

If you run into this problem, increasing WriteTimeout should fix it.

[Figure: the relationship between FIN and 502]

The second reason, and the most common cause of 502 status codes, is that the server application process has crashed.

When the server application crashes, no process is listening on the server port any more. If you try to send data to a port with no listener, the Linux kernel protocol stack on the server replies with an RST packet. nginx then returns a 502 to the client in the same way.
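You can observe this from Go on a single machine. The sketch below is an illustration with assumptions: `dialClosedPort` is a hypothetical helper, and the `ECONNREFUSED` check reflects Linux behavior, so other platforms may report the failure differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// dialClosedPort reserves a local port, closes the listener so that no
// process is listening on it, and then tries to connect to that port.
func dialClosedPort() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("setup failed: %w", err)
	}
	addr := ln.Addr().String()
	ln.Close() // the "service" is gone: the port now has no listener

	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err == nil {
		conn.Close()
		return nil
	}
	return err
}

func main() {
	err := dialClosedPort()
	fmt.Println("dial error:", err)
	// Assumption: on Linux the kernel answers the SYN with an RST,
	// which Go surfaces as ECONNREFUSED.
	fmt.Println("is connection refused:", errors.Is(err, syscall.ECONNREFUSED))
}
```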

[Figure: RST and 502]

This is the situation most commonly seen during development.

Nowadays most deployments automatically restart a service that has died, so we need to determine whether the service has crashed at some point.

If you monitor the server's CPU or memory, check the monitoring chart for a sudden cliff-like drop. If there is one, nine times out of ten your server-side application crashed.

Besides looking for a sudden drop in CPU usage, you can also check when the process was last started with the following command.

    ps -o lstart {pid}

For example, if the id of the process I want to check is 13515, the command looks like this.

    # ps -o lstart 13515
                     STARTED
    Wed Aug 31 14:28:53 2022

You can see that the process was last started on August 31. If that does not match the time you remember starting it, the process was probably restarted after a crash.

When faced with this problem, the most important thing is to find out why the process crashed. There are many possible causes, such as writing to an uninitialized memory address or an out-of-bounds memory access (the array arr obviously has length 2, but the code reads arr[3]).

In these cases the cause is almost always a logic bug in the code, and the crash usually leaves a stack trace. You can use the stack in the error report to locate and fix the problem. A golang program, for example, prints an error stack when it crashes, and other languages are similar.
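To see what such a stack looks like without actually killing the process, the out-of-bounds read can be reproduced in Go and the panic captured. This is only a sketch: `readAt` is a hypothetical helper, and a real crash would print a similar stack to stderr and exit rather than return an error.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// readAt reads arr[i] and converts an out-of-range panic into an error,
// keeping the stack trace that a crash would normally print.
func readAt(arr []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v\n%s", r, debug.Stack())
		}
	}()
	return arr[i], nil
}

func main() {
	arr := []int{10, 20} // length is only 2
	_, err := readAt(arr, 3)
	fmt.Println(err) // panic message followed by the goroutine stack
}
```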

Cases where no stack is printed

In some cases, though, there is no stack at all.

For example, a memory leak makes the process use more and more memory until it exceeds the server's memory limit and triggers OOM (out of memory), and the process is killed outright by the operating system.

More subtly, an operation that actively exits the process may be hidden in the code logic. For example, golang's log package has a method called log.Fatalln(). After printing the log, it also calls os.Exit(1) and exits the process directly, which easily trips up beginners who have not read the source code.
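A minimal Go sketch of this trap follows. The names are assumptions for illustration: `loadConfig` is a hypothetical startup step and "app.conf" is a made-up default. Run the program with an empty string as its first argument to take the fatal path.

```go
package main

import (
	"errors"
	"log"
	"os"
)

// loadConfig is a hypothetical startup step that can fail.
func loadConfig(path string) error {
	if path == "" {
		return errors.New("config path is empty")
	}
	return nil
}

func main() {
	path := "app.conf" // hypothetical default path
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	defer log.Println("cleanup finished") // skipped if log.Fatalln is reached

	if err := loadConfig(path); err != nil {
		// log.Fatalln prints the message and then calls os.Exit(1):
		// no panic, no stack trace, and deferred functions do not run.
		log.Fatalln("startup failed:", err)
	}
	log.Println("service started with", path)
}
```

On the fatal path the only trace is a single log line and exit status 1, so that line is worth searching for when a process disappears without a stack.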

[Figure: exiting the process right after printing the log]

If you are sure your service has not crashed, keep reading.

The gateway sent the request to a non-existent IP

nginx is told which servers to proxy through configuration. This configuration is typically placed in /etc/nginx/nginx.conf.

Open it and you may see something similar to the following.

    upstream xiaobaidebug.top {
        server 10.14.12.19:9235 weight=2;
        server 10.14.16.13:8145 weight=5;
        server 10.14.12.133:9702 weight=8;
        server 10.14.11.15:7035 weight=10;
    }

The configuration above means that when a client accesses the xiaobaidebug.top domain name, nginx forwards the request to one of these four server IPs. Each IP has a weight next to it. The higher the weight, the more often requests are forwarded to that server.

As you can see, nginx has quite a wealth of configuration capabilities. However, it is important to note that these files need to be configured manually. For situations where there are few servers and little change, this is certainly not a problem.

But this is the cloud-native era. Many companies have their own cloud platforms, and services naturally run in the cloud. In general, every release may deploy the service to a new machine, so its IP changes with each release. Do you need to edit the nginx configuration by hand every time? That is obviously not realistic.

It would be much easier to have the service proactively tell nginx its ip when the service starts, and then nginx generates such a configuration and reloads it.

In order to achieve such a service registration function, many companies will carry out secondary development based on nginx.

But suppose the registration function has a problem. For example, after a release the new instance is not registered, while the old instance has already been destroyed. nginx will then keep sending requests to the old instance's IP. Since the old machine no longer runs the service, its kernel responds with an RST, and nginx replies to the client with a 502 after receiving the RST.

[Figure: the instance has been destroyed but its IP has not been removed]

Troubleshooting this problem is not difficult.

At this time, you can check whether the relevant logs are printed on the nginx side, and see if the forwarded IP port meets the expectations.

If it does not match, you can find the colleagues who maintain this infrastructure component and have a friendly chat.

Summary

HTTP status codes indicate the status of the response, where 200 is a normal response, 4xx is a client error, and 5xx is a server error.

Adding nginx between the client and the server can play the role of reverse proxy and load balancing. The client only requests data from the nginx and does not care which server will handle the request.

If the back-end server application crashes, nginx will receive the RST message returned by the server when accessing the server, and then return the 502 error to the client. 502 is not issued by the server application, but by the nginx. Therefore, when 502 occurs, the back-end server probably does not have the relevant 502 log, which can only be seen on the nginx side.

If you see a 502, first check whether the server application has crashed and been restarted. If so, check whether a crash stack log was left behind. If there is no such log, check whether OOM or an exit hidden in the code (such as log.Fatalln) ended the process. If the process has not crashed, check the nginx log to see whether the request was sent to an unexpected IP and port.

This article comes from the WeChat official account: rookie debug (ID: xiaobaidebug), author: Xiaobai
