Both MySQL and PostgreSQL communicate with clients by listening on an IP address/port or on a Unix domain socket.
This raises a question: how long is the listen queue (the backlog)?
MySQL handles this itself: the my.cnf option back_log sets the length of its listen queue.
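For reference, a minimal my.cnf sketch of that option (the value 512 is only an illustrative choice, not something taken from this test):
[code]
# my.cnf - illustrative value only
[mysqld]
# length of the pending-connection (listen) queue requested by mysqld;
# on Linux the kernel still caps it at net.core.somaxconn
back_log = 512
[/code]
The value in effect can be checked at runtime with SHOW VARIABLES LIKE 'back_log';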
For PostgreSQL, by contrast, there seems to be no setting that controls the length of the listen queue.
The problem surfaced while running a high-concurrency stress test with pgbench.
Command:
pgbench -n -r -c 250 -j 250 -T 2 -f update_smallrange.sql
Error message:
Connection to database "" failed:
could not connect to server: Resource temporarily unavailable
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
But the "Resource temporarily unavailable" message above does not, by itself, say which resource has run out.
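One hint is in the message text itself: "Resource temporarily unavailable" is the strerror() string for EAGAIN/EWOULDBLOCK. A quick way to confirm the mapping, assuming the errno(1) helper from moreutils is available:
[code]
# look up the error by symbolic name; prints roughly: EAGAIN 11 Resource temporarily unavailable
errno EAGAIN
[/code]
So the client is hitting EAGAIN on the connection attempt, which is exactly what the thread below ends up diagnosing.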
After investigation, I found the following link
http://www.postgresql.org/message-id/20130617141622.GH5875@alap2.anarazel.de
[code]
From: Andres Freund
To: pgsql-hackers(at)postgresql(dot)org
Subject: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-17 14:16:22
Message-ID: 20130617141622.GH5875@alap2.anarazel.de
Lists: pgsql-hackers

Hi,
When postgres on linux receives connections at a high rate, client connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected
To reproduce, start something like this on a server with sufficiently high max_connections:

pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql
Now that's strange since that error should happen at connect(2) time, not when sending the startup packet. Some investigation led me to fe-connect.c's PQconnectPoll:
if (connect(conn->sock, addr_cur->ai_addr,
            addr_cur->ai_addrlen) < 0)
{
    if (SOCK_ERRNO == EINPROGRESS ||
        SOCK_ERRNO == EWOULDBLOCK ||
        SOCK_ERRNO == EINTR ||
        SOCK_ERRNO == 0)
    {
        /*
         * This is fine - we're in non-blocking mode, and
         * the connection is in progress.  Tell caller to
         * wait for write-ready on socket.
         */
        conn->status = CONNECTION_STARTED;
        return PGRES_POLLING_WRITING;
    }
    /* otherwise, trouble */
}
So, we're accepting EWOULDBLOCK as a valid return value for connect(2). Which it isn't. EAGAIN, in contrast, is on some BSDs and on Linux. Unfortunately POSIX allows those two to share the same value...
My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing cache.
       For AF_INET see the description of /proc/sys/net/ipv4/ip_local_port_range
       ip(7) for information on how to increase the number of local ports.
So, the problem is that we took a failed connection as having been initially successful but in progress.
Not accepting EWOULDBLOCK in the above if () results in:

could not connect to server: Resource temporarily unavailable
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5440"?

Which makes more sense.
Trivial patch attached.
Now, the question is why we cannot complete connections on unix sockets?
Some code reading shows that net/unix/af_unix.c:unix_stream_connect() contains:
if (unix_recvq_full(other)) {
        err = -EAGAIN;
        if (!timeo)
                goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue is full, we return EAGAIN. The receive queue for unix sockets is defined as
static inline int unix_recvq_full(struct sock const *sk)
{
        return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
where sk_max_ack_backlog is whatever has been passed to listen(backlog) on the listening side.
Question: But postgres does listen(fd, MaxBackends * 2), how can that be a problem?
Answer:

If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
then it is silently truncated to that value; the default value in this file is 128.
In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the
value 128.
Setting somaxconn to something higher indeed makes the problem go away. I'd guess that pretty much the same holds true for tcp connections, although I didn't verify that, which would explain some previous reports on the lists.
TLDR: Increase /proc/sys/net/core/somaxconn
Greetings
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[/code]
It turns out that the listen backlog on the PG server (capped by the kernel parameter somaxconn) was too small. The default value of somaxconn is 128; after raising it and restarting PG, the test runs cleanly.
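To see the cap on a given machine, the current ceiling can be read directly from the kernel (a quick check; both commands refer to the same setting):
[code]
# ceiling applied to every listen() backlog on this host (default 128 on older kernels)
cat /proc/sys/net/core/somaxconn
# same value via sysctl
sysctl net.core.somaxconn
[/code]
If it still reports 128, whatever backlog PG requests, e.g. listen(fd, MaxBackends * 2), is silently truncated to 128.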
/proc/sys/net/core/somaxconn
This file defines a ceiling value for the backlog argument of listen(2); see the listen(2) manual page for details.
So the solution is clear:
echo 256 > /proc/sys/net/core/somaxconn
Then restart PG and the stress test runs without errors.
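Note that writing to /proc only lasts until the next reboot. To keep the setting across reboots, the usual approach is a sysctl configuration entry (256 simply mirrors the value used above; size it to the workload):
[code]
# /etc/sysctl.conf
net.core.somaxconn = 256
[/code]
Apply it with sysctl -p, then restart PostgreSQL so that its listen() call is issued under the new ceiling.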