2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
How do you track down a lock that is never released because of an OOM? This article walks through the analysis and the fix in detail, in the hope of giving anyone facing the same problem a simpler, workable approach.
Recently we noticed that business responses had become slow and that Feign's fallback was being triggered. The logs showed that retries were being triggered. What triggered them? The exception stack gives the answer:
Caused by: feign.RetryableException: connect timed out executing GET http://test-service/test-api
    at feign.FeignException.errorExecuting(FeignException.java:65) ~[feign-core-9.7.0.jar!/:?]
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:105) ~[feign-core-9.7.0.jar!/:?]
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:77) ~[feign-core-9.7.0.jar!/:?]
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:107) ~[feign-hystrix-9.7.0.jar!/:?]
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302) ~[hystrix-core-1.5.18.jar!/:1.5.18]
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298) ~[hystrix-core-1.5.18.jar!/:1.5.18]
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46) ~[rxjava-1.3.8.jar!/:1.3.8]
    ... 27 more
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]
    at java.net.Socket.connect(Socket.java:591) ~[?:?]
    at sun.net.NetworkClient.doConnect(NetworkClient.java:177) ~[?:?]
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:474) ~[?:?]
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:569) ~[?:?]
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242) ~[?:?]
    at sun.net.www.http.HttpClient.New(HttpClient.java:341) ~[?:?]
    at sun.net.www.http.HttpClient.New(HttpClient.java:362) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1248) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) ~[?:?]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515) ~[?:?]
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) ~[?:?]
    at feign.Client$Default.convertResponse(Client.java:150) ~[feign-core-9.7.0.jar!/:?]
    at feign.Client$Default.execute(Client.java:72) ~[feign-core-9.7.0.jar!/:?]
    at org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient.execute(TracingFeignClient.java:91) ~[spring-cloud-sleuth-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.cloud.sleuth.instrument.web.client.feign.LazyTracingFeignClient.execute(...) ~[spring-cloud-sleuth-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer$1.doWithRetry(RetryableFeignLoadBalancer.java:103) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer$1.doWithRetry(RetryableFeignLoadBalancer.java:88) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287) ~[spring-retry-1.2.4.RELEASE.jar!/:?]
    at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180) ~[spring-retry-1.2.4.RELEASE.jar!/:?]
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:88) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:54) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at com.netflix.client.AbstractLoadBalancerAwareClient$1.call(AbstractLoadBalancerAwareClient.java:104) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:303) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:287) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]
    at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:231) ~[rxjava-1.3.8.jar!/:1.3.8]
    at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:228) ~[rxjava-1.3.8.jar!/:1.3.8]
    at rx.Observable.unsafeSubscribe(Observable.java:10327) ~[rxjava-1.3.8.jar!/:1.3.8]
The call to the test-service microservice timed out while connecting. Note that this is a connect timeout, not a read timeout. First, check the Ribbon connect timeout setting to see whether it is too short. It is configured as:
# ribbon default connection timeout
ribbon.ConnectTimeout=500
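For readers less familiar with the distinction, connect and read timeouts are two separate socket settings. The following is a minimal sketch using plain java.net sockets, not Feign or Ribbon; the class and method names are made up for illustration. It connects to a local listener that never sends data, so only the read timeout fires. A connect timeout would instead be raised by the connect call itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimeoutKindsDemo {
    // Hypothetical helper: connects to a silent local listener and
    // returns the message of the timeout raised by the blocking read.
    static String readTimeoutMessage() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: bounds only the TCP handshake
            // (the role ribbon.ConnectTimeout plays in the article).
            client.connect(new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 500);
            // Read timeout: bounds each blocking read after the connection exists.
            client.setSoTimeout(200);
            try {
                client.getInputStream().read(); // the listener never writes anything
                return "no timeout";
            } catch (SocketTimeoutException e) {
                return e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTimeoutMessage());
    }
}
```

The handshake to a loopback listener completes almost instantly, which is why the 500 ms connect budget is never the limiting factor in this sketch.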
500 ms is plenty of time to establish a TCP connection, so the configuration is not the problem. Next, call the interface of the microservice instance manually from the current instance to see whether it works:
curl http://<IP of the test-service instance>:<port of test-service>
The request succeeded without blocking. To rule out a network problem, let's look at the connection states with netstat:
Active Internet connections (servers)
Proto Recv-Q Send-Q Local Address                Foreign Address                                          State
tcp        0      0 test-5c665cc496-6jkx5:54644  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54666  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54680  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54088  ip-10-238-8-68.eu-central-1.compute.internal:8211       ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54674  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54682  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54652  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp       32      0 test-5c665cc496-6jkx5:47092  244213-201-91.test.com:https                             CLOSE_WAIT
tcp        0      0 test-5c665cc496-6jkx5:54662  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54624  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp       32      0 test-5c665cc496-6jkx5:47098  244213-201-91.test.com:https                             CLOSE_WAIT
tcp        0      0 test-5c665cc496-6jkx5:54672  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54630  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:58758  ip-10-238-9-71.eu-central-1.compute.internal:mysql      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:8251   10-238-8-117.api-gateway.test1.svc.cluster.local:43132  FIN_WAIT2
tcp        0      0 test-5c665cc496-6jkx5:54648  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:58778  ip-10-238-9-71.eu-central-1.compute.internal:mysql      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54646  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54628  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54650  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54632  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:54638  ip-10-238-8-119.eu-central-1.compute.internal:6379      ESTABLISHED
tcp        0      0 test-5c665cc496-6jkx5:59434  ip-10-238-9-71.eu-central-1.compute.internal:mysql      ESTABLISHED
tcp       32      0 test-5c665cc496-6jkx5:47104  244213-201-91.test.com:https                             CLOSE_WAIT
tcp        1      0 test-5c665cc496-6jkx5:54146  ip-10-238-8-68.eu-central-1.compute.internal:8211       CLOSE_WAIT
tcp       32      0 test-5c665cc496-6jkx5:47100  244213201-91.test.com:https                              CLOSE_WAIT
tcp       32      0 test-5c665cc496-6jkx5:47106  244213201-91.test.com:https                              CLOSE_WAIT
Two findings:
No connection has been established to any current instance IP of the test-service microservice.
There are not many connections overall, and not many in TIME_WAIT or CLOSE_WAIT.
The guess is that the IP address being used to call test-service is out of date.
Now it is time to bring in Arthas. Let's look at the real IP that Feign calls. Since we use Sleuth, we watch the Sleuth-instrumented Feign client to see the IP being called:
watch org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient execute params[0].url()
The corresponding code is:
package org.springframework.cloud.sleuth.instrument.web.client.feign;

final class TracingFeignClient implements Client {
    @Override
    public Response execute(Request request, Request.Options options) throws IOException {
        // ...
    }
}
The watch output confirmed that the old IP was indeed still being called. So why was it not updated? Let's look at the EurekaClient-related code; for a detailed explanation, see my other article, "Spring Cloud Eureka complete solution (4) - Core process - Service and instance list". Here we focus on the class PollingServerListUpdater. The thread pool responsible for the update is:
int coreSize = poolSizeProp.get();
ThreadFactory factory = (new ThreadFactoryBuilder())
        .setNameFormat("PollingServerListUpdater-%d")
        .setDaemon(true)
        .build();
_serverListRefreshExecutor = new ScheduledThreadPoolExecutor(coreSize, factory);
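ThreadFactoryBuilder in the snippet above comes from Guava. As a rough JDK-only sketch of the same idea (this is not Ribbon's code, and NamedPoolDemo and namedDaemonFactory are hypothetical names), a small ThreadFactory can give pool threads a searchable name:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class NamedPoolDemo {
    // Hypothetical helper: builds daemon threads named "<prefix>-<n>".
    static ThreadFactory namedDaemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    // Runs one task on a named pool and reports which thread executed it.
    static String nameOfWorker() {
        ScheduledThreadPoolExecutor pool =
                new ScheduledThreadPoolExecutor(2, namedDaemonFactory("PollingServerListUpdater"));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.schedule(task, 10, TimeUnit.MILLISECONDS).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfWorker());
    }
}
```

A fixed prefix like this is what makes the threads easy to pick out of a dump with hundreds of entries.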
Thankfully, the threads in this pool are named. Let's look at the thread list through Arthas:
[arthas@24]$ thread
Threads Total: 736, NEW: 0, RUNNABLE: 81, BLOCKED: 0, WAITING: 634, TIMED_WAITING: 21, TERMINATED: 0
ID    NAME                 GROUP         PRIORI  STATE  %CPU  TIME    INTER  DAEMON
11    Log4j2-TF-2-AsyncLo  main          5       TIMED  39    138:46  false  true
17    Log4j2-TF-5-AsyncLo  main          5       TIMED  37    137:27  false  true
1045  as-command-execute-  system        10      RUNNA  19    0:0     false  true
1027  AsyncAppender-Worke  system        9       WAITI  0     0:0     false  true
68    AsyncResolver-boots  main          5       TIMED  0     0:0     false  true
264   AsyncResolver-boots  main          5       WAITI  0     0:0     false  true
1025  Attach Listener      system        9       RUNNA  0     0:0     false  true
9     Common-Cleaner       InnocuousThr  8       TIMED  0     0:0     false  true
151   DataPublisher        main          5       TIMED  0     0:2     false  true
107   DestroyJavaVM        main          5       RUNNA  0     0:37    false  false
69    DiscoveryClient-0    main          5       TIMED  0     0:1     false  true
70    DiscoveryClient-1    main          5       WAITI  0     0:1     false  true
748   DiscoveryClient-Cac  main          5       WAITI  0     0:9     false  true
751   DiscoveryClient-Hea  main          5       WAITI  0     0:7     false  true
71    DiscoveryClient-Ins  main          5       TIMED  0     0:0     false  true
24    Druid-ConnectionPoo  main          5       WAITI  0     0:0     false  true
21    Druid-ConnectionPoo  main          5       WAITI  0     0:0     false  true
25    Druid-ConnectionPoo  main          5       TIMED  0     0:0     false  true
22    Druid-ConnectionPoo  main          5       TIMED  0     0:0     false  true
20    Druid-ConnectionPoo  main          5       TIMED  0     0:0     false  true
67    Eureka-JerseyClient  main          5       WAITI  0     0:0     false  true
3     Finalizer            system        8       WAITI  0     0:0     false  true
119   ForkJoinPool.common  main          5       WAITI  0     0:9     false  true
30    I/O dispatcher 1     main          5       RUNNA  0     0:2     false  false
155   NFLoadBalancer-Ping  main          5       TIMED  0     0:0     false  true
150   PollingServerListUp  main          5       WAITI  0     0:0     false  true
157   PollingServerListUp  main          5       WAITI  0     0:0     false  true
2     Reference Handler    system        10      RUNNA  0     0:0     false  true
146   RibbonApacheHttpCli  main          5       TIMED  0     0:0     false  true
135   RxComputationSchedu  main          5       TIMED  0     1:44    false  true
132   RxIoScheduler-1 (Ev  main          5       TIMED  0     0:0     false  true
4     Signal Dispatcher    system        9       RUNNA  0     0:0     false  true
114   SimplePauseDetector  main          5       TIMED  0     2:10    false  true
115   SimplePauseDetector  main          5       TIMED  0     2:10    false  true
116   SimplePauseDetector  main          5       TIMED  0     2:12    false  true
Two threads match the keyword PollingServerListUp: 150 and 157. Let's see what each of them is doing:
[arthas@24]$ thread 150
"PollingServerListUpdater-0" Id=150 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@778af81b
    at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@778af81b
    at java.base@11.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)
Affect(row-cnt:0) cost in 322 ms.

[arthas@24]$ thread 157
"PollingServerListUpdater-1" Id=157 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3adcda2
    at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3adcda2
    at java.base@11.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
    at java.base@11.0.4/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
    at com.netflix.loadbalancer.BaseLoadBalancer.setServersList(BaseLoadBalancer.java:475)
    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.setServersList(DynamicServerListLoadBalancer.java:156)
    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.updateAllServerList(DynamicServerListLoadBalancer.java:267)
    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.updateListOfServers(DynamicServerListLoadBalancer.java:250)
    at com.netflix.loadbalancer.DynamicServerListLoadBalancer$1.doUpdate(DynamicServerListLoadBalancer.java:62)
    at com.netflix.loadbalancer.PollingServerListUpdater$1.run(PollingServerListUpdater.java:116)
    at java.base@11.0.4/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base@11.0.4/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@6ab1c5f9
PollingServerListUpdater-1 is stuck waiting to acquire a lock. This looks like the problem. Checking the corresponding Ribbon code shows:
PollingServerListUpdater-1 needs to acquire the write lock of allServerLock.
The only code that acquires the read lock of allServerLock is runPinger (pinging each instance requires reading the instance list).
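The symptom can be reproduced in isolation. The sketch below is a simplified stand-in, not Ribbon's code: the thread names and the class LeakedReadLockDemo are invented for the demo. One thread takes the read lock and ends without releasing it; a second thread then asks for the write lock, and the JVM's own ThreadMXBean reports it parked in the same way as the Arthas output above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LeakedReadLockDemo {
    // Reproduces a stuck writer and describes what it is waiting on.
    static String describeBlockedUpdater() {
        ReentrantReadWriteLock allServerLock = new ReentrantReadWriteLock();
        try {
            // Stand-in for the pinger: takes the read lock and ends without unlocking.
            Thread pinger = new Thread(() -> allServerLock.readLock().lock(), "pinger-demo");
            pinger.start();
            pinger.join();

            // Stand-in for the updater: needs the write lock, so it parks indefinitely.
            Thread updater = new Thread(() -> allServerLock.writeLock().lock(), "updater-demo");
            updater.setDaemon(true); // lets the JVM exit although this thread never finishes
            updater.start();

            // Poll until the updater is parked (gives up after roughly five seconds).
            for (int i = 0; i < 500 && updater.getState() != Thread.State.WAITING; i++) {
                Thread.sleep(10);
            }
            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(updater.getId());
            return info.getThreadState() + " on " + info.getLockName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeBlockedUpdater());
    }
}
```

The read hold survives the death of the thread that took it, which is why nothing in the running process ever frees the writer.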
So is something wrong with the ping? Let's look at the code in BaseLoadBalancer that acquires the lock:
public void runPinger() throws Exception {
    // irrelevant code omitted
    allLock = allServerLock.readLock();
    allLock.lock();
    allServers = allServerList.toArray(new Server[allServerList.size()]);
    allLock.unlock();
    // irrelevant code omitted
}
The code here does not follow the lock(); try { ... } finally { unlock(); } idiom. If the code between lock() and unlock() throws, the lock is never released. This is also a reentrant lock: every lock() must be matched by an unlock(), and if even one unlock() is missing, other threads cannot acquire the lock.
The statement allServers = allServerList.toArray(new Server[allServerList.size()]); involves no I/O, only a data copy, and it cannot throw a NullPointerException here. The remaining suspect is an OutOfMemoryError: the allocation of the new array failed. Checking the logs confirmed that an OOM had indeed occurred.
The lesson: always release a lock in a finally block. Even when the guarded code cannot throw any ordinary exception, an OOM can still prevent the lock from being released.
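To make the difference concrete, here is a small sketch comparing the two styles. It is an illustration rather than a patch to Ribbon: the class and method names are hypothetical, and the OutOfMemoryError is thrown explicitly to simulate a failed allocation so that the demo behaves the same on any JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockReleaseDemo {
    // Simulates the array allocation failing inside the critical section.
    static void failingStep() {
        throw new OutOfMemoryError("simulated allocation failure");
    }

    // Unguarded style: unlock() is skipped when the middle step throws.
    static void unguarded(Lock readLock) {
        readLock.lock();
        failingStep();
        readLock.unlock();
    }

    // Guarded style: unlock() runs even when the middle step throws an Error.
    static void guarded(Lock readLock) {
        readLock.lock();
        try {
            failingStep();
        } finally {
            readLock.unlock();
        }
    }

    // Runs one style, then reports whether another thread can take the write lock.
    static boolean writerCanAcquireAfter(boolean useFinally) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            if (useFinally) {
                guarded(lock.readLock());
            } else {
                unguarded(lock.readLock());
            }
        } catch (OutOfMemoryError simulated) {
            // Only the state of the lock afterwards matters for this demo.
        }
        return CompletableFuture.supplyAsync(() -> lock.writeLock().tryLock()).join();
    }

    public static void main(String[] args) {
        System.out.println("without finally: " + writerCanAcquireAfter(false));
        System.out.println("with finally:    " + writerCanAcquireAfter(true));
    }
}
```

The writer uses tryLock() here only so that the demo terminates; a plain lock() call, as in setServersList, would wait forever in the unguarded case.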