2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Many newcomers are unsure how to reason about a leak in a database connection pool. To help with that, this article walks through a real incident in detail. I hope you find it useful.
One: Preliminary investigation
Mornings are the peak period for the energy efficiency platform: the load is higher than at other times, and many users log in within a short window. On the day of the incident, users first reported that builds and releases could not be started, then some users could not log in, and finally the system appeared to be dead. The system was not actually down; every operation that needed a database connection was blocked. The logs showed a large number of errors related to the database connection pool:
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms.
The error is about obtaining a database connection, and there were many such timeouts. The online debug logs confirmed that the connection pool had been exhausted:
[DEBUG] c.z.h.p.HikariPool: HikariPool-1 - Timeout failure stats (total=20, active=20, idle=0, waiting=13)
This entry was logged just before the flood of errors began. The pool could not hand out a connection at that point: total=20 and active=20 mean that all 20 connections in the pool were in use, idle=0 means none were free, and waiting=13 is the number of requests queued for a connection. A considerable number of requests were already pending.
Our first fix was to enlarge the connection pool. Our initial assumption was that the configured pool size was simply too small for the number of connections needed during the morning peak.
jdbc.connection.timeout=30000
jdbc.max.lifetime=1800000
jdbc.maximum.poolsize=200
jdbc.minimum.idle=10
jdbc.idle.timeout=60000
jdbc.readonly=false
We raised the maximum pool size to 200.
Two: Transactions
2.1 Consequences of transaction abuse
After the pool size was raised to 200 and the service was restarted, the system returned to normal. I still suspected that connections were leaking somewhere, so I looked for clues in the logs. The access log showed that someone had been running a build just before the system hung, and the build debug log I had added at the time showed that clicking build often got no response. I began to suspect that the build was causing the connection leak.
Here is a brief outline of the build logic:
User triggers a build
Add the job to the incremental job cache so that its status is updated
JenkinsClient calls the Jenkins API to start the build
Write the build information to the database (jobname, version)
I started reading the code I had written, but even after many passes I could not see how it related to database connections. Like most people, I assumed at the time that a connection leak had to come from blocking somewhere between the service and the database. In hindsight the problem is easy to see. Here is the code as it was:
@Transactional(rollbackFor = Exception.class)
public void build(BuildHistoryReq buildHistoryReq) {
    // 1. Encapsulation operation
    // 2. Call Jenkins API
    // 3. Database update write
}
This is the entry point as it was then; the real code is of course not this simple. I had put the @Transactional annotation on the entry method, which means that the whole method runs in one transaction and the database changes are rolled back when an exception is thrown.
The problem is that as soon as a user clicks build and the request enters the build method, the transaction starts and takes a connection from the pool, even though no database work happens yet. If the code then blocks on I/O or the network in step 1 or 2, the transaction cannot commit and the request keeps holding the connection. Enough blocked requests will exhaust even the largest pool and bring the system to a halt.
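The effect can be reproduced without Spring or a real database. The sketch below is a simplified model, not the platform's code: a Semaphore stands in for the connection pool, a CountDownLatch stands in for a Jenkins call that never returns, and the names TransactionScopeDemo and poolExhausted are hypothetical. With the wide scope each request takes a connection before the remote call, as a method-level @Transactional would; with the narrow scope the connection is taken only around the write.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class TransactionScopeDemo {

    /**
     * Starts poolSize requests whose remote call hangs, then reports whether
     * one more caller fails to get a connection within 200 ms.
     */
    static boolean poolExhausted(int poolSize, boolean wideScope) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);            // stand-in for the connection pool
        CountDownLatch remoteCall = new CountDownLatch(1);   // stand-in for a hung Jenkins call
        CountDownLatch blocked = new CountDownLatch(poolSize);
        ExecutorService requests = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < poolSize; i++) {
                requests.submit(() -> {
                    boolean holdsConnection = false;
                    try {
                        if (wideScope) {                     // transaction opened at method entry
                            pool.acquire();
                            holdsConnection = true;
                        }
                        blocked.countDown();
                        remoteCall.await();                  // steps 1 and 2 block here
                        if (!wideScope) {                    // transaction opened only for step 3
                            pool.acquire();
                            holdsConnection = true;
                        }
                        // step 3: the database write would run here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        if (holdsConnection) {
                            pool.release();
                        }
                    }
                    return null;
                });
            }
            blocked.await();                                 // every request is now stuck on the remote call
            boolean gotConnection = pool.tryAcquire(200, TimeUnit.MILLISECONDS);
            if (gotConnection) {
                pool.release();
            }
            return !gotConnection;
        } finally {
            remoteCall.countDown();                          // let the stuck requests finish
            requests.shutdown();
            requests.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wide scope exhausted:   " + poolExhausted(3, true));
        System.out.println("narrow scope exhausted: " + poolExhausted(3, false));
    }
}
```

In this model the wide scope leaves no connection for the next caller, while the narrow scope keeps the pool free for as long as the remote call hangs. Raising the pool size only changes how many hung requests it takes to reach the same state.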
2.2 Correct use of the transaction annotation
For a team that does not handle core business operations such as payments, orders, or inventory, the demands on transactions are usually modest. Typically the only requirement is that when several tables are updated together, the updates either all succeed or all fail, so that the data stays consistent. For that, @Transactional(rollbackFor = Exception.class) is enough.
So how can the code above be improved?
First, decide whether a transaction is needed at all. Step 3, the database operation, turned out to touch only one table, with no dependency on any other operation, and it is the last step. A transaction is therefore unnecessary here, and the annotation can simply be deleted.
If step 3 did update several tables that must stay consistent with each other, the approach would be different: split the work into two methods so that the steps unrelated to step 3 run outside the transaction. Blocking in step 1 or 2 then no longer holds a database connection.
public void build(BuildHistoryReq buildHistoryReq) {
    // 1. Encapsulation operation
    // 2. Call Jenkins API
    update**(XX);
}

@Transactional(rollbackFor = Exception.class)
public void update**(XX xx) {
    // 3. Database update write
}
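Note that for the transaction annotation to take effect, the annotated method must be public. With Spring's default proxy-based transactions it must also be called through the Spring proxy, that is, from another bean: a call from build to a method on the same object bypasses the proxy, and no transaction is started.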
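Why the call path matters can be shown with a plain JDK dynamic proxy. This is only a stand-in for Spring's transaction proxy, under the assumption that the default proxy mode is in use; BuildService, BuildServiceImpl, and withTransactions are hypothetical names, and the "transaction" is just a pair of log entries.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ProxyDemo {

    interface BuildService {
        void build();
        void update();
    }

    static class BuildServiceImpl implements BuildService {
        private final List<String> events;

        BuildServiceImpl(List<String> events) {
            this.events = events;
        }

        @Override
        public void build() {
            events.add("build");
            update();               // self-invocation: goes through "this", not the proxy
        }

        @Override
        public void update() {
            events.add("update");
        }
    }

    /** Wraps the target so that update() gets a "transaction" when called via the proxy. */
    static BuildService withTransactions(BuildService target, List<String> events) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean transactional = method.getName().equals("update");
            if (transactional) {
                events.add("begin");
            }
            try {
                return method.invoke(target, args);
            } finally {
                if (transactional) {
                    events.add("commit");
                }
            }
        };
        return (BuildService) Proxy.newProxyInstance(
                BuildService.class.getClassLoader(),
                new Class<?>[] {BuildService.class},
                handler);
    }

    static List<String> run(boolean callUpdateDirectly) {
        List<String> events = new ArrayList<>();
        BuildService service = withTransactions(new BuildServiceImpl(events), events);
        if (callUpdateDirectly) {
            service.update();
        } else {
            service.build();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false));   // [build, update]
        System.out.println(run(true));    // [begin, update, commit]
    }
}
```

When update is reached from inside build, no begin or commit is recorded, because the proxy never sees the inner call. Moving the transactional method to a separate bean is one common way to make the call go through the proxy.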
Three: HttpClient 4.x connection pool
Once I had found the cause of the database connection pool leak, my first step was to remove the transaction and add more logging. By then I was sure the fault lay in the JenkinsClient code, but not what exactly it was, so all I could do was add logs and keep watching the monitoring.
Sure enough, on the second day after the hotfix the build and release feature failed again, although this time other features were unaffected. The log showed that the build had started and then blocked at this call:
JenkinsClient.startBuild(jobName, params)
Next I checked the project monitoring. The thread view showed a large number of blocked http-nio threads, all of them waiting inside HttpClient:
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
    at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
    at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
    - locked (an org.apache.http.pool.AbstractConnPool$2)
    at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
I then followed the stack trace into the source and looked at line 379 of the AbstractConnPool class.
At line 379 the thread calls this.condition.await() and waits with no time limit. If no other thread calls this.condition.signal(), the thread stays in the waiting state forever, the front end never receives a response, and the request times out.
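The difference between an unbounded and a bounded wait can be seen in isolation. The following is a minimal sketch, not HttpClient code: LeaseWaitDemo is a hypothetical one-counter pool that waits on a Condition with awaitUntil, so a caller gives up at the deadline instead of parking forever as it would with a plain await().

```java
import java.util.Date;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class LeaseWaitDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private int free;

    LeaseWaitDemo(int free) {
        this.free = free;
    }

    /** Takes a connection, waiting at most timeoutMillis; returns false on timeout. */
    boolean lease(long timeoutMillis) throws InterruptedException {
        Date deadline = new Date(System.currentTimeMillis() + timeoutMillis);
        lock.lock();
        try {
            while (free == 0) {
                // awaitUntil returns false once the deadline has passed;
                // a plain await() here would wait until someone signals.
                if (!released.awaitUntil(deadline) && free == 0) {
                    return false;
                }
            }
            free--;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns a connection and wakes one waiting caller. */
    void release() {
        lock.lock();
        try {
            free++;
            released.signal();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LeaseWaitDemo pool = new LeaseWaitDemo(1);
        System.out.println(pool.lease(50));   // true: one connection was free
        System.out.println(pool.lease(50));   // false: nothing released before the deadline
        pool.release();
        System.out.println(pool.lease(50));   // true again after the release
    }
}
```

In HttpClient 4.x the deadline for this wait appears to come from the connection request timeout (RequestConfig.custom().setConnectionRequestTimeout(...)); that is worth checking against the version in use.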
Let's look at the source to see how a thread ends up at that line:
/* Get the HTTP connection. As the name suggests, this method can block. */
private E getPoolEntryBlocking(
        final T route, final Object state,
        final long timeout, final TimeUnit timeUnit,
        final Future future) throws IOException, InterruptedException, TimeoutException {
    Date deadline = null;
    if (timeout > 0) {
        deadline = new Date(System.currentTimeMillis() + timeUnit.toMillis(timeout));
    }
    this.lock.lock();
    try {
        final RouteSpecificPool pool = getPool(route);
        E entry;
        for (;;) {
            Asserts.check(!this.isShutDown, "Connection pool shut down");
            for (;;) {
                entry = pool.getFree(state);
                if (entry == null) {
                    break;
                }
                if (entry.isExpired(System.currentTimeMillis())) {
                    entry.close();
                }
                if (entry.isClosed()) {
                    this.available.remove(entry);
                    pool.free(entry, false);
                } else {
                    break;
                }
            }
            if (entry != null) {
                this.available.remove(entry);
                this.leased.add(entry);
                onReuse(entry);
                return entry;
            }
            // New connection is needed
            final int maxPerRoute = getMax(route);
            // Shrink the pool prior to allocating a new connection
            final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
            if (excess > 0) {
                for (int i = 0; i < excess; i++) {
                    final E lastUsed = pool.getLastUsed();
                    if (lastUsed == null) {
                        break;
                    }
                    lastUsed.close();
                    this.available.remove(lastUsed);
                    pool.remove(lastUsed);
                }
            }
            if (pool.getAllocatedCount() < maxPerRoute) {
                final int totalUsed = this.leased.size();
                final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
                if (freeCapacity > 0) {
                    final int totalAvailable = this.available.size();
                    if (totalAvailable > freeCapacity - 1) {
                        if (!this.available.isEmpty()) {
                            final E lastUsed = this.available.removeLast();
                            lastUsed.close();
                            final RouteSpecificPool otherpool = getPool(lastUsed.getRoute());
                            otherpool.remove(lastUsed);
                        }
                    }
                    final C conn = this.connFactory.create(route);
                    entry = pool.add(conn);
                    this.leased.add(entry);
                    return entry;
                }
            }
            boolean success = false;
            try {
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
                pool.queue(future);
                this.pending.add(future);
                if (deadline != null) {
                    success = this.condition.awaitUntil(deadline);
                } else {
                    this.condition.await();
                    success = true;
                }
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
            } finally {
                // In case of 'success', we were woken up by the
                // connection pool and should now have a connection
                // waiting for us, or else we're shutting down.
                // Just continue in the loop, both cases are checked.
                pool.unqueue(future);
                this.pending.remove(future);
            }
            // check for spurious wakeup vs. timeout
            if (!success && (deadline != null && deadline.getTime()