
How big should the database connection pool be?


This article explains how big a database connection pool should be. Interested readers may want to take a look: the method introduced here is simple, quick, and practical. Let's work through it together.

I. The size of the database connection pool

Almost every project needs to talk to a database, so what is the right size for the database connection pool?

Some veteran developers may tell you: it doesn't matter, just set it as large as possible, say 200; that way the database performs better and throughput is higher!

You might nod along. But is that really true? After reading this article, your view may well change.

II. The main course begins

Let's say it directly: almost every developer has stumbled when sizing a database connection pool. Most programmers simply go by intuition, set it to 100, mull it over for a while, and decide that feels about right.

III. Suppose your service has 10,000 concurrent requests

Imagine you run a website. Maybe the load hasn't reached Facebook levels, but it still sees around 10,000 concurrent users, which works out to roughly 20,000 TPS.

So here comes the question: how big should this site's database connection pool be?

In fact, the question itself is flawed; we need to ask it the other way around. The correct question is:

"how small should the database connection pool for this site be set?"

To recap the video: an Oracle database is stress tested by simulating 9,600 concurrent threads operating on the database, with a 550 ms sleep between every two database operations. Note that the connection pool size at the start of the video is set to 2048.
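The article doesn't show the test harness itself, but a minimal Java sketch of that kind of load generator might look like the following; the thread count, think time, query, and the pooled DataSource are all assumptions mirroring the description above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.CountDownLatch;
import javax.sql.DataSource;

public class PoolLoadTest {
    // Assumed parameters mirroring the test described above.
    static final int THREADS = 9_600;
    static final long THINK_TIME_MS = 550;

    static void run(DataSource pooledDataSource, int operationsPerThread) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(THREADS);
        for (int i = 0; i < THREADS; i++) {
            new Thread(() -> {
                try {
                    for (int op = 0; op < operationsPerThread; op++) {
                        // Borrow a connection from the pool, run one simple query, return it.
                        try (Connection c = pooledDataSource.getConnection();
                             PreparedStatement ps = c.prepareStatement("SELECT 1 FROM dual");
                             ResultSet rs = ps.executeQuery()) {
                            rs.next();
                        }
                        Thread.sleep(THINK_TIME_MS); // sleep 550 ms between two database operations
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        done.await();
    }
}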

Let's look at the performance test results with a connection pool size of 2048:

Each request waited 33 ms in the connection pool queue; after obtaining a connection, executing the SQL took additional time, and CPU usage held at around 95%.

Next, we change the connection pool size to 1024, keeping the other test parameters the same. What's the result?

"here, the wait time for getting a connection is basically the same, but the execution time of SQL is reduced!"

Hey, that's progress!

Next, let's go smaller still: reduce the connection pool size to 96, keep other parameters such as the concurrency level unchanged, and see what happens:

The average waiting time of each request in the connection pool queue is 1ms, and the execution time of SQL is 2ms.

WTF! What the hell?

We didn't tune anything else. Just by shrinking the database connection pool, we cut the previous average response time of roughly 100 ms down to 3 ms, and throughput climbed sharply.

Now that's slick!

IV. Why does this happen?

Think about why Nginx, with only 4 threads, dramatically outperforms an Apache HTTPD running 100 processes. Trace the cause back to basic computer science and the answer is actually obvious.

You know, even a computer with a single-core CPU can "run" hundreds of threads at the same time. But as we all know, that's just a trick: the operating system rapidly switches time slices to fool us.

A CPU core can only execute one thread at a time; the operating system then switches contexts, and the core rapidly schedules and runs another thread's code, which creates the illusion that all processes are running simultaneously.

In fact, on a single-core machine, executing A and B sequentially is always faster than executing A and B "simultaneously" via time-slice switching; anyone who has taken an operating systems course knows why. Once the number of threads exceeds the number of CPU cores, adding more threads only makes things slower, not faster, because of the extra cost of context switching.
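A rough, hypothetical Java sketch of that claim: run the same fixed amount of CPU-bound work twice, once with as many threads as cores and once heavily oversubscribed. On an otherwise idle machine the oversubscribed run is typically no faster and often slightly slower; a fair comparison needs warm-up and a benchmarking harness such as JMH, so treat this only as an illustration:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextSwitchDemo {
    // A fixed amount of pure CPU work; returns a value so the JIT can't discard it.
    static long burnCpu(long iterations) {
        long x = 0;
        for (long i = 0; i < iterations; i++) {
            x += i ^ (x << 1);
        }
        return x;
    }

    // Run `tasks` identical CPU-bound tasks on a pool of `threads` threads and time it.
    static long runWith(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
        for (int i = 0; i < tasks; i++) {
            futures[i] = CompletableFuture.supplyAsync(() -> burnCpu(50_000_000L), pool);
        }
        CompletableFuture.allOf(futures).join();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " threads: " + runWith(cores, 64) + " ms");
        System.out.println((cores * 16) + " threads: " + runWith(cores * 16, 64) + " ms");
    }
}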

At this point, it should suddenly dawn on you.

V. Other factors to take into account

The main reason was covered in the previous section, but it isn't quite that simple; we also need to consider some other factors.

When we are looking for performance bottlenecks in the database, they can be roughly classified into three categories:

CPU

Disk I/O

Network I/O

You might ask: what about memory? Memory does need to be considered, but its impact is minor compared with disk I/O and network I/O, so I won't include it here.

If we ignore disk I/O and network I/O, it is easy to conclude that on an 8-core server, setting the number of database connections/threads to 8 gives the best performance, and that raising the number further only degrades performance through context switching.

As we all know, databases usually store data on disks, and a traditional disk consists of spinning metal platters and a read/write head mounted on a stepper motor. The head can only be in one place at a time, and when it needs to read or write somewhere else it must "seek" to the new location. So there is seek time, plus rotational latency: the head has to wait for the target data on the platter to "rotate into place" before it can read or write. Caching certainly helps, but the principle still applies.

During that time the thread is "blocked", waiting and doing nothing, so the operating system can use the idle CPU core to serve other threads.

Here we can sum up: when your threads are handling I/O-intensive work rather than CPU-intensive work, you can set the number of threads/connections somewhat higher than the number of CPU cores; that way more work gets done in the same time and throughput improves.

So the question comes up again: what is an appropriate size?

It depends on the disk. If you're using an SSD, there is no seek time and no rotational latency. But stop! Don't jump to the conclusion: "SSDs are faster, so let's make the thread count bigger!"

The conclusion is exactly the opposite. No seek time and no rotational latency means less blocking, so fewer threads (closer to the number of CPU cores) perform best. More threads only pay off when blocking is frequent.

We've covered disk I/O; now let's talk about network I/O.

Network I/O is very similar: reading and writing data over the Ethernet interface also blocks. A 10 Gb link blocks for less time than a 1 Gb link, and a 1 Gb link blocks less than a 100 Mb link. Network I/O is usually ranked third in importance, but some people leave it out of the performance calculation entirely.

PostgreSQL's benchmark data shows that TPS flattens out once the number of connections reaches about 50. Looking back at the Oracle test video above, the testers reduced the connection count from 2048 to 96, and even 96 is arguably too high unless your server has 16 or 32 CPU cores.

VI. Formula for calculating the number of connections

The formula below comes from PostgreSQL, but the underlying principle is the same and it applies to most database products on the market. You should simulate the expected traffic, use the formula to pick a reasonable starting value, and then fine-tune the connection count in real tests to find the best size.

Number of connections = (number of CPU cores * 2) + number of effective disks

The core count should not include hyper-threaded logical cores, even if hyper-threading is enabled. If all the hot data is cached, the number of effective disks is effectively zero; as the cache hit ratio drops, the number of effective disks approaches the actual disk count. Note that how well this formula works with SSDs has not been established.

Well, according to this formula, if your server's CPU is a 4-core i7, the connection pool size should be ((4 * 2) + 1) = 9.
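As a small illustration, the calculation can be done directly in code; the effective disk count of 1 below is just an assumption for a single-disk machine:

public class PoolSizeFormula {
    public static void main(String[] args) {
        // availableProcessors() reports logical cores; halve it yourself
        // if hyper-threading is enabled, as the formula wants physical cores.
        int cores = Runtime.getRuntime().availableProcessors();
        int effectiveDisks = 1; // assumed: one spinning disk, imperfect cache hit ratio

        int poolSize = cores * 2 + effectiveDisks;
        System.out.println("suggested pool size = " + poolSize); // e.g. (4 * 2) + 1 = 9
    }
}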

Round it up and let's call it 10. You may ask: isn't 10 way too small?

If you doubt it, run a performance test: a pool of 10 can easily support 3,000 users executing simple queries concurrently at 6,000 TPS. Push the pool size above 10 and you will see response times start to climb and TPS start to fall.

VII. Conclusion: what you need is a small connection pool and a queue of threads waiting for connections

Suppose you have 10,000 concurrent requests and you set the connection pool to 10,000 connections; that would be sheer madness.

Change it to 1,000? Still too high. 100? Still too much.

All you need is a pool of 10 database connections, with the remaining business threads waiting in a queue for a connection.
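For example, with a Java connection pool such as HikariCP (the JDBC URL and credentials below are placeholders), a small fixed-size pool with a bounded wait might be configured like this; threads that can't get a connection simply queue inside getConnection() until one frees up:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class SmallPool {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        config.setUsername("app");                                 // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(10);        // a small, fixed-size pool
        config.setMinimumIdle(10);            // keep the pool at full size
        config.setConnectionTimeout(30_000);  // callers wait up to 30 s for a free connection
        return new HikariDataSource(config);
    }
}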

The number of connections in the pool should be set to the number of query tasks the database can effectively execute at the same time (usually no more than 2 × the number of CPU cores).

You often see web applications with only a handful of users configure a connection pool of 100 or 200 to handle a dozen or so concurrent requests. Please don't over-provision your database connection pool.

VIII. Additional points to note

In practice, the connection pool size should be tuned to the actual business scenario.

For example, if your system mixes long transactions and short transactions, it is hard to size the pool with the formula above. The right approach is to create two connection pools: one for long transactions and one for "real-time" queries, that is, short transactions.
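One possible shape for that split, with pool sizes that are only assumptions to be tuned per workload: build two separately sized pools and route each kind of work to its own pool:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class SplitPools {
    static HikariDataSource newPool(String poolName, int size) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        config.setUsername("app");
        config.setPassword("secret");
        config.setPoolName(poolName);
        config.setMaximumPoolSize(size);
        return new HikariDataSource(config);
    }

    // Short, latency-sensitive queries get most of the connections;
    // long-running transactions are confined to a small pool of their own.
    public static final HikariDataSource SHORT_TX_POOL = newPool("short-tx", 8);
    public static final HikariDataSource LONG_TX_POOL  = newPool("long-tx", 2);
}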

Another case: when a system runs a task queue and the business only allows a certain number of tasks to execute at a time, we should make the number of concurrent tasks fit the number of connections in the pool, not size the connection count to match the number of concurrent tasks.
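A minimal sketch of that idea, assuming a pool of 10 connections: cap the worker executor at the same size, so excess tasks queue in the application instead of piling up against the database:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedTaskQueue {
    private static final int POOL_SIZE = 10; // match the database connection pool size

    // At most POOL_SIZE tasks run (and hold connections) at once;
    // everything else waits in the executor's internal queue.
    private final ExecutorService workers = Executors.newFixedThreadPool(POOL_SIZE);

    public void submit(Runnable databaseTask) {
        workers.submit(databaseTask);
    }
}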

At this point, you should have a deeper understanding of how big the database connection pool should be. Try it out in practice, and follow us to keep learning more related content!
