How to untangle scheduled tasks and Feign timeouts

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to untangle the interaction between scheduled tasks and Feign timeouts by working through a troubleshooting case in detail.

1 background

The business scheduled-task application frequently triggers "circuit breaker" exception alarm emails in the middle of the night.

Grouping the alarms by the class named in each email gives the following table:

No.  Failing method (interface)                                     Application the interface belongs to  Scheduled task class
A    VipTradeReportFeignService#getShopTradeReportByDate            pinka-mod-stats                       ShopOrderSturctureTask
B    VipMemberStatsFeignService#statMemberRecord                    pinka-mod-stats                       MemberStatTask
C    VipPartnerWalletFeignService.handlePartnerWithdraw             pinka-mod-customer                    PartnerWithdrawCheckTask
D    VipWeixinBabyActivityFeignService.getBabyActivityNoticePage    pinka-mod-weixin                      VipWeixinBabyNoticeTask

All of the above come from external Feign microservice calls made by the distributed scheduled-task application (pinka-mod-scheduler). They correspond to four kinds of tasks. Each kind of task calls an external Feign microservice interface one or more times, and the failures occur on interfaces A to D.

A and B both fail with an exception of the following form:

com.netflix.hystrix.exception.HystrixTimeoutException
    at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1$1.run(AbstractCommand.java:1154)
    at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:45)
    at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41)
    ...

C and D both fail with an exception of the following form:

feign.RetryableException: 10.13.32.111 failed to respond executing POST http://pinka-mod-customer/vip/partner/wallet/handlePartnerWithdraw
    at feign.FeignException.errorExecuting(FeignException.java:67)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:104)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:114)
    ...
Caused by: org.apache.http.NoHttpResponseException: 10.13.32.111:56000 failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    ...

2 trace

2.1 HystrixTimeoutException timeout exception

The A and B exceptions occur almost every day, and the message is clear: the call exceeded the timeout configured in Hystrix (currently 10s). Why did it time out? Reading the interface implementation shows time-consuming logic that runs inside a for loop.

The execution history in the Kibana log system confirms this: these calls generally take more than 13s, which is the root cause of this kind of exception.

2.1.1 solution and thinking

This is a typical scenario: a scheduled task fires, the processing logic lives in another microservice, and that logic is complex and time-consuming.

a. Increasing the timeout is a crude fix. The timeout exists so that calls fail fast, and a long timeout can cause bigger problems. After raising it to 20s, you may still meet calls that take 30 seconds or more. So this approach should not be applied to the common default timeout of all calls.

However, it can be considered for individual interfaces. For example, if the VipTradeReportFeignService#getShopTradeReportByDate interface is assessed to normally take more than 15 seconds, set its timeout separately. The relevant configuration:

# default timeout shared by all commands
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=10000
# set the timeout separately for one feign interface
hystrix.command."FeignService#sayHello(String)".execution.isolation.thread.timeoutInMilliseconds=15000

b. Reduce the execution time of the logic on the interface provider's side. For example, consider whether the for loop in VipTradeReportFeignService#getShopTradeReportByDate can be moved to the interface caller, so that each call to the provider performs only one iteration of the loop. Put plainly, the goal is for the interface to return within the timeout, which is also in line with the design principles of microservice interfaces.
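Option b can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SingleShopReportClient` interface, its `buildReportForShop` method, and the assumption that the loop iterates over shop ids are all hypothetical stand-ins for a Feign client whose remote call does one unit of work.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names here are illustrative, not from the original project.
final class SplitLoopSketch {

    /** Stand-in for a Feign client whose remote call handles exactly one shop. */
    interface SingleShopReportClient {
        String buildReportForShop(long shopId, LocalDate date);
    }

    /**
     * The scheduled task owns the loop; each remote call covers one shop,
     * so a single call has far less work to finish before the timeout.
     * Returns the ids of shops whose call failed, so they can be retried.
     */
    static List<Long> runTask(List<Long> shopIds, LocalDate date, SingleShopReportClient client) {
        List<Long> failed = new ArrayList<>();
        for (Long shopId : shopIds) {
            try {
                client.buildReportForShop(shopId, date);
            } catch (RuntimeException e) {
                // One slow or failed shop no longer fails the whole batch.
                failed.add(shopId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Fake client: shop 2 simulates a call that fails or times out.
        SingleShopReportClient client = (shopId, date) -> {
            if (shopId == 2L) {
                throw new IllegalStateException("simulated timeout");
            }
            return "report-" + shopId + "-" + date;
        };
        List<Long> failed = runTask(List.of(1L, 2L, 3L), LocalDate.of(2025, 1, 18), client);
        System.out.println("failed shop ids: " + failed);
    }
}
```

A side effect of this split is that a failure is isolated to one item, so the caller can retry only the failed ids instead of the whole batch.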

c. Another idea is to make the interface processing asynchronous: the interface provider returns immediately and uses an asynchronous thread to carry out the actual logic. On its own, however, this makes task execution unreliable, because a successful return from the interface no longer means the work succeeded. If the provider restarts or hits an exception halfway through the time-consuming asynchronous logic, the work is interrupted and the retry mechanism of the distributed scheduled-task framework cannot be used to run it again. So with this approach, the task must not be reported as successful just because the interface returned immediately. It needs an asynchronous notification mechanism: when the provider finishes the time-consuming operation successfully, it notifies the interface caller, and only then does the caller report the task as successful.
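The notification flow in option c might look like the sketch below. It is an in-process model under stated assumptions: `TaskRegistry`, `Provider`, and the callback are hypothetical, and in a real deployment the notification would travel over the network (for example a callback endpoint or a message queue) rather than through a `BiConsumer`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical in-process model of "return immediately, notify on completion".
final class AsyncNotifySketch {

    enum TaskState { RUNNING, SUCCESS, FAILED }

    /** Caller side: a task becomes SUCCESS only when a notification arrives. */
    static final class TaskRegistry {
        private final Map<String, TaskState> states = new ConcurrentHashMap<>();

        void markRunning(String taskId) {
            states.put(taskId, TaskState.RUNNING);
        }

        void onNotification(String taskId, boolean ok) {
            states.put(taskId, ok ? TaskState.SUCCESS : TaskState.FAILED);
        }

        TaskState stateOf(String taskId) {
            return states.get(taskId);
        }
    }

    /** Provider side: accepts the request at once and works on a background thread. */
    static final class Provider {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        /** Returns immediately; 'notifier' is invoked when the slow work has finished. */
        void submit(String taskId, Runnable slowWork, BiConsumer<String, Boolean> notifier) {
            worker.submit(() -> {
                boolean ok = true;
                try {
                    slowWork.run();
                } catch (RuntimeException e) {
                    ok = false;
                }
                notifier.accept(taskId, ok);
            });
        }

        /** Stops accepting work and waits briefly for queued work to finish. */
        boolean shutdownAndWait() {
            worker.shutdown();
            try {
                return worker.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        Provider provider = new Provider();
        registry.markRunning("task-1");
        provider.submit("task-1", () -> { /* time-consuming logic goes here */ }, registry::onNotification);
        provider.shutdownAndWait();
        System.out.println("task-1 state after notification: " + registry.stateOf("task-1"));
    }
}
```

If the provider dies before notifying, the task simply stays in RUNNING. One possible design choice is for the caller to treat a task that remains RUNNING past some deadline as failed, so the scheduler's retry can pick it up again.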

2.2 feign.RetryableException failed to respond executing exception

This is the exception seen for C and D, and it is a random, low-frequency alarm. Literally, it means the interface request received no response. Combined with the words "circuit breaker" in the email, it is natural to suspect a problem in the application that provides the interface (in hindsight, the words "circuit breaker" were misleading). Tracing the monitoring metrics of pinka-mod-customer, the application the interface belongs to, before and after the alarm shows nothing abnormal in TCP connections, CPU, memory, or network traffic. Moreover, for a circuit breaker to open, calls to the interface must fail many times, yet each scheduled task calls the interface only once.

Next, the controller-layer log of the interface provider shows that the request that raised the alarm never reached the controller. This suggests that nothing is wrong with the provider application. On the caller's side, the application log and performance metrics show no anomaly at that moment either, and the caller was still calling other applications and producing logs. Combined with the exception log, the likely cause is a momentary network interruption on that one call between the caller and the provider, which is why it is random and low-frequency.

But this does not explain why the circuit breaker "opened". Tracing the source of the email alarm code reveals the answer: the alarm is implemented by overriding the getFallback method in openfeign's official HystrixCommand creation logic, so an email is sent whenever the fallback logic is entered. Entering the fallback does not mean that the circuit breaker is open. An exception thrown in the run method of a HystrixCommand enters the fallback, a run execution timeout enters the fallback, and an open circuit breaker also enters the fallback. So although the emails mention a circuit breaker, the circuit breaker never opened for these exceptions. The calls only degraded into the fallback.

So feign.RetryableException failed to respond executing is just an occasional failed call that falls into the fallback, and it is not as complicated as first thought.
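The three routes into the fallback can be modelled with plain JDK classes. The sketch below is a simplified stand-in and does not use or reproduce Hystrix itself: the `Outcome` enum, the `execute` helper, and the `circuitOpen` flag are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified model (not Hystrix): shows three different routes into a fallback.
final class FallbackReasonSketch {

    enum Outcome { SUCCESS, FALLBACK_EXCEPTION, FALLBACK_TIMEOUT, FALLBACK_CIRCUIT_OPEN }

    /** Runs 'call' with a timeout and reports which path produced the result. */
    static Outcome execute(Callable<String> call, long timeoutMillis, boolean circuitOpen) {
        if (circuitOpen) {
            // Short-circuited: the call is never attempted.
            return Outcome.FALLBACK_CIRCUIT_OPEN;
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(call);
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (TimeoutException e) {
            // The call ran too long, as with interfaces A and B.
            return Outcome.FALLBACK_TIMEOUT;
        } catch (ExecutionException e) {
            // The call itself threw, as with interfaces C and D.
            return Outcome.FALLBACK_EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FALLBACK_EXCEPTION;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Only a genuinely open circuit should be reported as a circuit-breaker alarm. */
    static boolean isCircuitBreakerAlarm(Outcome outcome) {
        return outcome == Outcome.FALLBACK_CIRCUIT_OPEN;
    }

    public static void main(String[] args) {
        Outcome failedCall = execute(() -> { throw new IllegalStateException("failed to respond"); }, 1000, false);
        Outcome slowCall = execute(() -> { Thread.sleep(2000); return "late"; }, 50, false);
        Outcome openCircuit = execute(() -> "never called", 1000, true);
        System.out.println(failedCall + " circuit alarm? " + isCircuitBreakerAlarm(failedCall));
        System.out.println(slowCall + " circuit alarm? " + isCircuitBreakerAlarm(slowCall));
        System.out.println(openCircuit + " circuit alarm? " + isCircuitBreakerAlarm(openCircuit));
    }
}
```

In this model, an alarm that fires on every fallback would label all three outcomes the same way, which matches the misleading emails described above.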

2.2.1 solution and thinking

Naturally, the email alarm logic should be modified to distinguish between an open circuit breaker and a plain fallback (degradation). To check whether the circuit breaker is open, the following approach can be used:

protected Object getFallback() {
    if (this.isCircuitBreakerOpen()) {
        // circuit breaker is open: send the circuit breaker alarm
        sendExceptionEmail(...);
    } else {
        // fallback without an open circuit breaker: degradation alarm;
        // omit this call if no alarm is needed
        sendExceptionEmail(...);
    }
    ...
}
