This article analyzes a production nginx error, "no live upstreams while connecting to upstream". Many newcomers are unclear about what it means, so the following walks through the problem in detail; hopefully you will get something out of it.
First, the environment: a front-end load balancer forwards requests to nginx, and nginx in turn forwards them to the back-end application servers.
The nginx configuration file is as follows:
upstream ads {
    server ap1:8888 max_fails=1 fail_timeout=60s;
    server ap2:8888 max_fails=1 fail_timeout=60s;
}
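For context, this upstream block only takes effect when some server block proxies to it. A minimal sketch of such a block (the listen port and location here are assumptions, not part of the original config):

server {
    listen 8080;

    location / {
        # hand requests to the "ads" upstream defined above
        proxy_pass http://ads;
    }
}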
The phenomena are as follows:
Every minute or two, a log line similar to "*379803415 no live upstreams while connecting to upstream" is recorded.
In addition, there are a large number of "upstream prematurely closed connection while reading response header from upstream" logs.
Let's look at the "no live upstreams" problem first.
Taken literally, it means nginx has decided that no backend is alive. The strange thing is that access was normal the whole time, and in wireshark some requests could be seen coming in and getting responses back.
At this point the only way forward is to look at the nginx source code.
Since this error is upstream-related, search for the string "no live upstreams" in ngx_http_upstream.c and you will find the code below (in fact, a search across the whole nginx source shows that this file is the only place the string appears):
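The fragment in question sits in ngx_http_upstream_connect(); it is quoted here from memory of the nginx source, so treat it as a close paraphrase rather than an exact copy:

    rc = ngx_event_connect_peer(&u->peer);

    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "http upstream connect: %i", rc);

    if (rc == NGX_ERROR) {
        ngx_http_upstream_finalize_request(r, u,
                                           NGX_HTTP_INTERNAL_SERVER_ERROR);
        return;
    }

    u->state->peer = u->peer.name;

    if (rc == NGX_BUSY) {
        /* this is the line that produces the error in our logs */
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "no live upstreams");
        ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_NOLIVE);
        return;
    }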
You can see here that when rc equals NGX_BUSY, the "no live upstreams" error is logged.
Looking up at line 1328, you can see that the value of rc is returned by the function ngx_event_connect_peer.
ngx_event_connect_peer is implemented in event/ngx_event_connect.c. Inside that function, the code below is the only place that returns NGX_BUSY; every other path returns NGX_OK, NGX_ERROR, or NGX_AGAIN.
    rc = pc->get(pc, pc->data);

    if (rc != NGX_OK) {
        return rc;
    }
Here pc is a pointer to an ngx_peer_connection_t structure, and get is a function pointer of type ngx_event_get_peer_pt. Exactly where it points is not yet known, so let's go back to ngx_http_upstream.c.
As you can see in ngx_http_upstream_init_main_conf, the code is as follows:
    uscfp = umcf->upstreams.elts;

    for (i = 0; i < umcf->upstreams.nelts; i++) {

        init = uscfp[i]->peer.init_upstream ? uscfp[i]->peer.init_upstream:
                                              ngx_http_upstream_init_round_robin;

        if (init(cf, uscfp[i]) != NGX_OK) {
            return NGX_CONF_ERROR;
        }
    }
This shows that the default is round robin (the load-balancing modules actually form a linked list that is processed from the head each time; with the configuration file above, nginx will not reach any module other than round robin), and each upstream is initialized with ngx_http_upstream_init_round_robin.
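For reference, ngx_http_upstream_init_round_robin (in ngx_http_upstream_round_robin.c) ends by registering a per-request initializer; again quoting from memory of the nginx source:

    /* called once per request to set up the round-robin peer state */
    us->peer.init = ngx_http_upstream_init_round_robin_peer;

That per-request initializer is where the get pointer we are chasing finally gets set.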
In ngx_http_upstream_init_round_robin_peer you will find the following line:

    r->upstream->peer.get = ngx_http_upstream_get_round_robin_peer;

So the get pointer ends up pointing at ngx_http_upstream_get_round_robin_peer.
In ngx_http_upstream_get_round_robin_peer, you can see:
    if (peers->single) {
        peer = &peers->peer[0];

        if (peer->down) {
            goto failed;
        }

    } else {

        /* there are several peers */

        peer = ngx_http_upstream_get_peer(rrp);

        if (peer == NULL) {
            goto failed;
        }
    }
Now let's look at the failed branch:
failed:

    if (peers->next) {

        /* ngx_unlock_mutex(peers->mutex); */

        ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0, "backup servers");

        rrp->peers = peers->next;

        n = (rrp->peers->number + (8 * sizeof(uintptr_t) - 1))
                / (8 * sizeof(uintptr_t));

        for (i = 0; i < n; i++) {
            rrp->tried[i] = 0;
        }

        rc = ngx_http_upstream_get_round_robin_peer(pc, rrp);

        if (rc != NGX_BUSY) {
            return rc;
        }

        /* ngx_lock_mutex(peers->mutex); */
    }

    /* all peers failed, mark them as live for quick recovery */

    for (i = 0; i < peers->number; i++) {
        peers->peer[i].fails = 0;
    }

    /* ngx_unlock_mutex(peers->mutex); */

    pc->name = peers->name;

    return NGX_BUSY;
The truth is now clear. If a connection fails, nginx tries the next peer; if every peer has failed, it performs a "quick recovery" that resets each peer's failure count to 0, and then returns NGX_BUSY. nginx logs "no live upstreams", everything is back in the initial state, and forwarding continues as before.
This explains why access still worked normally even while "no live upstreams" was being logged.
Look at the configuration file again: with max_fails=1, a single failure makes nginx consider a backend dead, after which all traffic is sent to the other one. As soon as that one also fails once, both are considered dead, quick recovery kicks in, and the log line is printed.
Another problem this causes is that if several nginx instances decide at the same time that one backend is dead, traffic becomes unbalanced, as was visible in the zabbix monitoring screenshots.
Preliminary solution:
Raising max_fails from 1 to 5 had an obvious effect: "no live upstreams" appeared far less often, though it did not disappear completely.
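A minimal sketch of the adjusted upstream block (only max_fails changes; the other values carry over from the original config):

upstream ads {
    # a backend is now marked dead only after 5 failures within the 60s window
    server ap1:8888 max_fails=5 fail_timeout=60s;
    server ap2:8888 max_fails=5 fail_timeout=60s;
}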
In addition, the log still contained a large number of "upstream prematurely closed connection while reading response header from upstream" entries.
Back to the source code: this error is reported from the ngx_http_upstream_process_header function, but the source alone does not show whether the cause is the network or something else, so let's capture packets with tcpdump.
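The capture was along these lines (the interface name and output file are assumptions; port 8888 matches the upstream config above):

tcpdump -i eth0 -s 0 -w upstream.pcap port 8888

The resulting pcap can then be opened in wireshark for the analysis below.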
In the captures, 54 is the address of the load balancer in front of nginx, 171 is nginx, 32 is ap1, and 201 is ap2.
As the first screenshot shows:
The load balancer sends the request to nginx; nginx first ACKs it back to the load balancer, then completes a three-way handshake with ap1 and sends it a 614-byte request packet. What nginx receives in return, however, is an ACK followed by a FIN+ACK. Ack=615 shows that both are responses to the 614-byte packet: the back-end app closed the connection outright!
nginx then answers the back-end app with an ACK and a FIN+ACK of its own; Ack=2 shows this is the response to that FIN+ACK.
nginx then sends a SYN packet to ap2 and receives the first ACK in return.
The second screenshot shows the same pattern: after the three-way handshake between nginx and ap2, a request packet was sent, and that connection was also closed immediately.
nginx then returned 502 to the load balancer.
The packet captures thus corroborate the source-code analysis above from another angle.
The problem was then handed over to the colleagues maintaining the back-end application.