How to understand High performance Database connection Pool 07/15 Update SLTechnology News&Howtos

How to understand High performance Database connection Pool

2025-07-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article shows you how to understand high-performance database connection pooling, which is concise and easy to understand, which will definitely brighten your eyes. I hope you can gain something through the detailed introduction of this article.

Why is there a connection pool?

First take a look at the location of the connection pool:

Business implementations of application frameworks generally access databases, caches or HTTP services. Why add a connection pool to the place you visit?

Let's take accessing MySQL as an example to execute a SQL command, which process you need to go through if you don't use connection pooling.

Three-way handshake for 1:TCP to establish a connection

Three-way handshake certified by 2:MySQL

3: real SQL execution

Shutdown of 4:MySQL

5:TCP 's four-way handshake is closed

As you can see, in order to execute a SQL, there are a lot of network interactions that we don't care about.

Advantages: easy to implement.

Disadvantages:

1: more network IO

2: the load of the database is high

3: longer response time and lower QPS

4: frequent applications create and close connections, resulting in more temporary objects and frequent GC

5: after closing the connection, there will be a large number of TCP states of TIME_WAIT (closed after 2 MSL)

Use connection pooling process

On the first visit, a connection needs to be established. However, subsequent access reuses the previously created connection.

Advantages:

1: less network overhead

2: there will be a substantial improvement in the performance of the system

3: no troublesome TIME_WAIT status

Of course, the reality is often cruel, when we solve a problem, accompanied by another problem.

The challenge of using connection pooling: the performance of connection pooling

Performance optimization of number of connections and threads

Sub-library DB deployment structure:

Suppose there are 128 sub-libraries: 32 servers, each with 4 schema. According to the design of 128 sub-libraries, 128 independent database connection pools will be created.

Model of database connection pool

Features:

1Rules 128 connection pools are completely independent, and different schema correspond to different connection pools.

2: first, select the corresponding connection pool by opening the library, reading and writing, and then obtain a connection from the connection pool to operate.

3: after the operation, return the connection to the corresponding connection pool.

Advantages:

Simple structure and decentralized competition

Problems faced:

1: too many threads

First take a look at the number of new threads needed to create a new connection pool.

Connection pool

Number of threads

Description

The number of threads required for 128 sub-libraries

C3P0

four

3 helperThread (pollerThread), 1 scheduled task AdminTaskTimer (DeadlockDetector)

4 '128' 512

DBCP

one

Responsible for heartbeat, minimum number of connections maintenance, * idle time and prevention of connection leakage

1: 128: 128

Druid

two

An asynchronous connection is created. An asynchronously closed connection.

2 "128" 256

You can see that with the increase of sub-libraries, no matter which connection pool is chosen, the number of threads will increase linearly. Too many threads will result in a large memory footprint: by default, 1 thread will take up 1m of space, and if it is 512 threads, it will take up 1M*512=512M context switching overhead.

Tips: because stack and heap request virtual address space, but once used, it will not be released. (threads do not necessarily take up 1m of space)

2: too many connections

The connection resources of the database are heavy, and with the increase of the connection, the performance of the database will decline obviously. DBA generally limits the number of connections per DB, such as 3K. Suppose that the capacity of a single database is limited to 32 sets of 3K Magi, then the capacity is 3K*32=96K. If you apply * *, the minimum number of connections is 10, then a total of 128-101.28K connections are required for each application. Then the number of applications supported by the database in theory is 96K Universe 1.28K = 80.

3: cannot connect to multiplexing

Different schema under the same physical machine are completely independent, and connections cannot be reused.

Optimized database connection pool model

Features:

1: there is only one connection pool, and all nodes share threads (solving the problem of too many threads)

2: each physical machine corresponds to a host, and multiple schema,schema storage connections are maintained in the host.

3: different schema under the same host can reuse connections (to solve the problem of too many connections)

Get the connection process:

1: to obtain a connection, you need to bring ip,port and schema information: for example, you are getting the schema1 of host31.

2: first go to the schema1 of host31 to obtain a free connection, but if schema1 has no free connection, it will get a free connection from schema2.

3: the connection obtained from schema2 executes useschema1, and the connection is switched to schema1.

4: perform the corresponding SQL operation, and return it to the pool connected to schema1 when the execution is complete.

Advantages:

1: connection reuse: effectively reduce the number of connections.

2: improve performance: avoid frequent new connections. The cost of creating a new connection is high, while the cost of using use schema is very small.

3: effectively reduce the number of threads. According to the existing scheme, only about 4 threads are needed. 512 threads are needed before optimization

Disadvantages:

1: management is more complex

2: does not comply with the JDBC interface specification. DataSource has only a simple getConnection () interface and no interface for getting the connection to the corresponding schema. You need to inherit DataSouce to implement a specific interface.

Performance optimization of transaction statements

A model for executing transactions before optimization

If you get a connection from the connection pool, the default is auto-commit. In order to start a transaction, you need to perform a setautocommit=false operation, followed by a specific SQL, and when you return the connection, you need to set the connection to autocommit (you need to perform setautocommit= true). You can see that opening a transaction requires the execution of two additional transaction statements.

The model of executing transaction after optimization

All connections in each schema are grouped by autocommit. It is divided into autocommit (autocommit=true) and non-autocommit (autocommit=false). When obtaining a connection, priority is given to the connection in the packet with the same autocommit. If no connection is available, the connection is obtained from another packet. After the business operation is completed, it is returned to the corresponding packet. This mechanism avoids two transaction statements that open the transaction to execute more than one.

Lock performance optimization

General functions of connection pooling:

Connection pooling mainly consists of five parts: obtaining connections, returning connections, scheduled tasks, maintenance components and resource pools.

Get the connection:

1: get timeout: if the connection is not obtained after the specified time, an exception will be thrown

2: validity check: when you get a resource from the resource pool, you need to check the validity of the resource. If it fails, get the connection again. Avoid reporting errors when performing business.

3: create a connection: you can create it synchronously or asynchronously.

Return the connection:

1: return the connection: for example, check the idle count to determine whether it is physically closed or returned to the connection pool

2: destroy connection: can be terminated synchronously or asynchronously

Scheduled tasks:

1: idle check: mainly check the idle connection. If the connection is idle for more than a certain period of time, the connection will be closed.

2: minimum number of connections control: generally, the minimum number of connections will be set. Ensure the minimum number of connections in the current system. If not, a new connection is created.

Component maintenance:

1: connection status control: idle, use, delete and other state control

2: exception handling: unified exception handling for JDBC access. If the exception is related to the connection, the connection will be destroyed.

3: cache: avoid repeated parsing of SQL. Under the PrepareStatement mechanism, objects parsed by SQL will be cached.

4:JDBC encapsulation: JDBC is implemented, and the real implementation is the underlying driver, such as MySQL-connector-java.

Resource pool:

1: the resource pool is the place where connections are stored and the core of the connection pool.

2: all components basically interact with the resource pool, and the competition for connecting resources is very fierce. The performance of the place will determine the performance of the entire connection pool.

3: the general resource pool is implemented using BlockingQueue provided by JDK. So whether there is a solution that can be designed without locks to avoid competition.

Lock-free design of resource pool

The general process of getting a connection:

1: get the connection from the ThreadLocal, or from the global connection pool (CopyOnWriteArrayList) if there is no free connection.

2: if there are no idle connections in the global connection pool, a new connection is created asynchronously.

3: determine whether the timeout time is greater than the threshold, and if it is less than the threshold, spin. Otherwise, park hibernation occurs.

4: after the connection is successfully established, the thread of park will be awakened.

The lock-free design is mainly realized from four aspects: ThreadLocal,CopyOnWriteArrayList, asynchronous connection and spin.

ThreadLocal

1: each thread has a connection queue. The queue is a reference to the global queue.

2: get the connection from the ThreadLocal when obtaining the connection. If the connection is idle, use it. Otherwise, remove it and take another one until you can't get the connection.

3: when you return the connection, you only need to return it to the Threadlocal queue and set the connection to idle.

4: if you use BlockQueue, call poll when you get the connection and call offer when you return the connection. There is two lock contention. After optimization, the overhead of two locks is avoided through CAS (CAS is used to make the connection non-idle when obtaining the connection, and CAS is used to make the connection idle when returning)

CopyOnWriteArrayList

1: the queue is used in scenarios of a large number of reads and a small number of writes, and the data stored is limited. The connection pooling scenario is very suitable for using CopyOnWriteArrayList.

2: when obtaining or returning a connection, the status of the connection will only be changed through CAS, and the connection pool will not be added or deleted.

3: generally, the number of connection pooling connections is controllable. CopyOnWriteArrayList copies all connections during write operations, which has little impact on memory.

Establish a connection asynchronously

After getting the connection, determine if any concurrency is waiting to acquire the connection, and if so, establish the connection asynchronously. Avoid waiting for the next connection. If the CopyOnWriteArrayList does not have an idle connection, the connection is established asynchronously.

Spin

The spin mechanism is similar to that of JDK to synchronized. If the timeout is found to be greater than the set threshold (such as 10 microseconds), the thread is suspended. If it is less than the set threshold, the connection is re-acquired and selected to avoid the performance overhead caused by thread context switching. no, no, no.

Optimization tips

Method inline optimization

1: each time the method is called, the thread will create a new stack frame, which is relatively expensive.

2:JIT will optimize inline at run time, and multiple methods use one stack frame to avoid creating too many stack frames.

3:JIT method inline optimization the default threshold for the number of bytecodes is 35 bytes, which is less than 35 bytes. (can be set via-XX:MaxInlineSize=35)

By modifying the above code, the compiled bytecode is modified to 34 bytes, and the inline condition can be met.

Heartbeat statement selection PrepareStatement mode selection

MySQL driver defaults to client mode. If you need to enable server mode, you need to set useServerPrepStmts=true. There is no difference between PrepareStatement's default client schema and Statement for the DB side. It is generally understood that the difference between PrepareStatement and Statement is that PrepareStatement avoids SQL injection. But how do you avoid SQL injection?

When you use PrepareStatement to set parameters, such as calling setString (int parameterIndex, String x), the parameters are escaped locally to avoid SQL injection.

When you execute SQL, you will use SQL? Replace it with the escaped character and send it to the database for execution.

PSCache

MySQLdriver is not enabled by default, but can be enabled by setting cachePrepStmts = true

QueryTimeout

Previously, I also encountered the problem of connection leakage due to the opening of queryTimeout.

VIPSHOP self-developed connection pool: Caelus

Caelus is a high-performance distributed database connection pool developed by VIPSHOP.

High performance: based on a lock-free connection pool design model to improve connection pool performance

In the scenario with more sub-libraries, reduce the number of threads. If there are 128 sub-libraries, 128 separate connection pools are required under the existing connection pool model, and each connection pool requires threads (1-4, different connection pools) to process tasks. A total of 128 to 128 threads need to be maintained, which is expensive. The Caelus connection pool greatly reduces the number of threads.

Connection multiplexing. For a MySQL instance, there are multiple schema scenarios above. Connections to schema with different existing connection pools are not reusable. Caelus can reuse the connections of different schema to improve performance.

Too many transaction instructions. If it is a transaction statement, after getting the connection from the connection pool, you need to start the transaction (setautocommit=false) first, and then set it (setautocommit= true) when returning it. Each time a connection is used, two additional transaction instructions need to be executed. Caelus can effectively reduce transaction instructions.

The unification of the configuration specification. Combined with the setting of MySQL, it provides a unified and reasonable configuration.

The above is how to understand high-performance database connection pooling. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.