2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains in detail why PostgreSQL needs connection pooling in order to accept a large number of database connections. I hope it leaves you with a working understanding of the topic.
PostgreSQL is an excellent open source database. For the important task of replacing ORACLE, most small and medium-sized enterprises can realistically only count on POSTGRESQL. Of course, if you are willing to spend more up front and restructure more of the application, MYSQL is not impossible either, but ORACLE to PG is like moving from one room to another in the same house, while ORACLE to MYSQL is like moving from Beijing to Shanghai. So if you do not want a major upheaval and do not want to change the existing overall architecture, PG is the inevitable choice; there is nothing else.
Once you start using PG you will quickly appreciate its elegance and power, but that is not today's topic. What we want to talk about today is the impact of very high connection counts on POSTGRESQL under high concurrency, and why PG needs pgbouncer or pgpool when serving highly concurrent connections.
First of all, we have to work out the principle: where does the memory allocated to connections come from? Most people, myself included, assume that the main reason PG cannot accept a large number of connections is memory: a large number of connections uses a lot of memory, so accepting them should drive PG to OOM, or at least to poor performance.
But in fact we ran a test. I loaded 3000 concurrent connections querying a table on a PG instance with 8 GB of memory, with shared_buffers adjusted to 20MB, and then waited for PG to crash.
It never happened. PG kept running steadily; the system was merely a little slow and a little sluggish.
Memory was not used up at all, contrary to what I expected.
So the question is: what is the real reason, mentioned in write-ups of all sizes, that PG is not suited to many connections?
The answer starts with PGPROC in the PG source code.
The source comment states that each backend has a PGPROC struct in shared memory, and goes on to note that there is also a list of currently unused PGPROC structs that are handed out to new backends; in other words, the structure is designed for reuse.
The NOTE referring to twophase.c further illustrates this reuse of PGPROC: prepared (two-phase) transactions are given dummy PGPROC entries of their own, so a transaction can be tracked in these arrays whether it is actually running or merely prepared to run.
PGPROC entries are tracked through PROC_HDR, whose allProcs field is a pointer to the array holding all PGPROC structs. The main jobs of PGPROC are managing a backend's transitions between waiting and processing during transaction handling, inter-process coordination and communication, and letting the postmaster track its children.
To obtain a consistent view of shared_buffer and backend state, GetSnapshotData has to traverse the PGPROC entries of the other processes. The more backends we establish, that is, the more connections to PG, the more system resources, CPU in particular, this traversal consumes.
Since the query was the simplest SELECT statement and should have been served from cache, IO was barely used, and memory usage was not saturated either.
So a large number of connections does not bring PG to OOM or make the system unresponsive through memory consumption. Rather, as the number of backends grows, the cost of this internal communication becomes too high, and performance suffers; that is why PG needs a connection pool such as PGPOOL or pgbouncer to preserve performance under many connections. Which raises another question: why are there so many connections in the first place?
Now that we know PG has a performance problem when handling too many connections, how to solve it is relevant to most users. We can approach it with a few questions.
1 Why are there so many concurrent connections in the first place? If a database has to withstand more than 3000 connections, even for an Internet-facing workload, what does the overall architecture look like? With that much concurrency, the database itself should probably be sharded across databases and tables, or the load dispersed by splitting up the business. Too many simultaneous visits is a problem in itself.
2 For database access, there should be a connection pool even when the PGbouncer or pgpool programs are not used. Connection handling should be considered at the start of overall program design, rather than leaving the database to absorb everything in the end.
3 For any database, connections do not all hit their maximum at the same moment. Even with MAX_CONNECTIONS set to 3000, as one might in MYSQL, on a very finely subdivided time scale only dozens of sessions are actually accessing the database simultaneously. PG connection states are divided into:
1 active
2 idle
3 idle in transaction
4 aborted
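As a quick check, the current distribution of these states can be read from the standard pg_stat_activity view. This is a minimal sketch; restricting to current_database() is just one reasonable choice, not something the article prescribes:

```sql
-- Count backends by state; the 'idle' ones are exactly the
-- connections a pooler such as pgbouncer could reclaim for others.
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()  -- only this database's backends
GROUP BY state
ORDER BY count(*) DESC;
```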
So what exactly are PGbouncer and PGPOOL doing here to help PG with connections?
1 States 1, 3 and 4 are not our concern; the key point is the idle state, which is where most connections are wasted. The application's connection pool has to keep its connections to the database open, so at many moments most of PG's connections are simply sitting idle in the database. The point, then, is to use connections more efficiently: let the database work with a limited number of real connections while serving a larger number of application-side connections. Resolving this contradiction between few database connections and many application connections is what is called connection multiplexing.
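The multiplexing described above is what pgbouncer's transaction pooling mode provides. A minimal, illustrative pgbouncer.ini sketch follows; the database name, addresses and file paths are assumptions for the example, not values from this article:

```ini
; Hypothetical pgbouncer.ini sketch -- names and paths are examples only.
[databases]
; clients connecting to "appdb" on pgbouncer are mapped to this server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a real PG backend is held only for the duration
; of one transaction, then returned to the pool for another client
pool_mode = transaction
max_client_conn = 3000   ; many application connections...
default_pool_size = 50   ; ...share a few real PG backends
```

With this shape, 3000 application connections contend for only 50 real backends, which is precisely the "fewer database connections, more application connections" trade described above.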
2 In addition, if you often find your connections in the idle in transaction state, it shows that big transactions are frequently stuck waiting on something for a long time. That is another point to attack: ask why a transaction has to occupy a connection and wait for so long.
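Such stuck sessions can be located through pg_stat_activity as well, and from PostgreSQL 9.6 onward they can be closed automatically. A sketch, where "appdb" is a hypothetical database name:

```sql
-- Sessions stuck "idle in transaction", oldest transaction first.
SELECT pid,
       now() - xact_start AS xact_age,  -- how long the transaction has run
       query                            -- last statement of the session
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;

-- PostgreSQL 9.6+: terminate sessions idle in a transaction too long.
ALTER DATABASE appdb SET idle_in_transaction_session_timeout = '5min';
```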
In addition, there are connections that are opened but never cleaned up or closed, which may be due to a programming error. In that case we can set statement_timeout on the database, so that any statement running longer than the limit is cancelled (set here to 60 seconds). Note that statement_timeout cancels long-running statements; it does not by itself close idle connections.
ALTER DATABASE <database name> SET statement_timeout = 60000;  -- milliseconds, i.e. 60 seconds
Here is a final summary.
1 Each database has its own characteristics, which trace back to the architectural ideas behind its original design.
2 Those characteristics are not easy to change. To this day, MYSQL is still best suited to OLTP, and nobody seriously asks it to do OLAP work. The myth of the do-everything database, where anything works, is not realistic.
3 Master the characteristics of a database, build on its strengths and work around its weaknesses; that is what a DBA should do.
That is all on why PostgreSQL needs connection pooling to accept a large number of database connections. I hope the above is of some help to you. If you found the article useful, share it for more people to see.