How to solve interface call timeouts after restarting a service under high concurrency


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how to solve the problem of interface calls always timing out after a service is restarted under high concurrency.

Warm-up

First, what is service warm-up?

To give an everyday example: anyone who has bought a new car knows that every new car has a running-in period, and it takes about one or two thousand kilometers to reach its best condition.

Service warm-up is similar: a service also has a "running-in period" right after startup, during which it does not run at its best. If traffic is raised to the normal level all at once, a large number of requests may time out, or the system may be overwhelmed outright.

So when a service has just started, traffic should be increased gradually until it reaches the normal threshold after a period of time. This gives the system a warm-up phase so that it can reach its best state.

Then why is the system not in its best state when the service has just started?

The main reasons are as follows:

Java applications load classes on demand. That is, when the service starts, the JVM loads only the classes needed for the startup process.

The classes needed to serve requests are not actually loaded until the service is first invoked.

In addition, the JVM uses the JIT compiler to compile "hot code" into native code to make it run faster, which only happens after that code has been executed often enough.

Both of these effects come from the JVM itself.

In addition, the service itself may depend on cached resources. Right after startup these caches are empty, so the service has to load the resources first.
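
The on-demand behaviour described above can be observed with a small experiment. The following is a minimal sketch, not taken from Dubbo: the class names LazyLoadingSketch and Handler are hypothetical, and it uses a static initializer to make class initialization visible, a step that, like loading, the JVM defers until a class is first actively used.

```java
// Illustrative sketch only; LazyLoadingSketch and Handler are hypothetical names.
final class LazyLoadingSketch {

    // Records the order in which things happen.
    static final StringBuilder LOG = new StringBuilder();

    // Stands in for a class that is only needed once requests arrive.
    static final class Handler {
        static {
            // Runs once, when Handler is first actively used.
            LOG.append("Handler initialized;");
        }

        static String handle(String request) {
            return "handled " + request;
        }
    }

    public static void main(String[] args) {
        // "Startup": nothing has touched Handler yet, so it is not initialized.
        LOG.append("service started;");

        // The first request pays the one-time initialization cost.
        String result = Handler.handle("first request");
        LOG.append(result).append(';');

        System.out.println(LOG);
    }
}
```

The log shows "service started;" before "Handler initialized;", which is why the very first requests after a restart do extra work that later requests do not.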

How Dubbo implements warm-up

Now that we understand what warm-up is, let's get back to the point and see how Dubbo implements it.

First let's take a look at the Dubbo service model:

After starting, a service provider registers its node information with the registry, and service consumers obtain all service nodes from the registry in a timely manner.

When a service consumer invokes a service, the load balancing component selects one node internally to handle the call.

As shown in the figure above, suppose node B has just started and needs to warm up. The service consumer then has to shift traffic to node B gradually.

Next, let's look at the Dubbo source code to see how service warm-up is actually implemented. The relevant code is located in AbstractLoadBalance#getWeight.
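
To see why a lower weight means less traffic for node B, here is a minimal sketch of weighted selection. It is an assumption-laden illustration, not Dubbo's load balancing code: the class WeightedPickSketch, its methods, and the weights 100/10/100 for nodes A/B/C are all hypothetical.

```java
import java.util.Random;

// Illustrative sketch only; names and weights are hypothetical, not Dubbo's API.
final class WeightedPickSketch {

    // Returns the index of the node whose weight range contains the offset.
    // The offset must lie in [0, sum of weights).
    static int pick(int[] weights, int offset) {
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset must be in [0, sum of weights)");
    }

    // Picks a node at random, proportionally to its weight.
    // Assumes at least one weight is positive.
    static int pickRandom(int[] weights, Random random) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return pick(weights, random.nextInt(total));
    }

    public static void main(String[] args) {
        int[] weights = {100, 10, 100}; // node B (index 1) is still warming up
        int[] hits = new int[weights.length];
        Random random = new Random();
        for (int i = 0; i < 21_000; i++) {
            hits[pickRandom(weights, random)]++;
        }
        // Node B should receive roughly 10/210 of the calls.
        System.out.println("A=" + hits[0] + " B=" + hits[1] + " C=" + hits[2]);
    }
}
```

As node B's weight grows during warm-up, its share of the offset range grows with it, so traffic shifts to it gradually rather than all at once.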

Ps: the Dubbo source code discussed here is version 2.7.4; the implementation differs slightly in other versions. See below for details.

This code is mainly divided into three steps:


Get the service provider's startup time, timestamp

Calculate the service provider's running time, uptime, as the current time minus the provider's startup time

Dynamically calculate the weight during warm-up according to the running time

The third step is to calculate the dynamic weight as follows:

The calculation is actually very simple: the longer the service has been running, the higher its weight, until it reaches the configured weight.

If the service provider has been running for 1 minute, the resulting weight is 10 (assuming the default weight of 100).

If the service provider has been running for 5 minutes, the resulting weight is 50.

If the service provider has been running for 11 minutes, exceeding the default warm-up time of 10 minutes, no calculation is performed and the configured weight is returned directly.
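
The three cases above can be reproduced with a short sketch of the calculation. This is a simplified illustration under stated assumptions, not the Dubbo source: the class WarmupWeightSketch and the method warmupWeight are hypothetical names, and the defaults (weight 100, warm-up time 10 minutes) are the ones implied by the examples.

```java
// Illustrative sketch only; names are hypothetical, not Dubbo's API.
final class WarmupWeightSketch {

    static final int DEFAULT_WEIGHT = 100;                  // assumed default weight
    static final long DEFAULT_WARMUP_MS = 10 * 60 * 1000L;  // 10-minute warm-up window

    // Scales the weight linearly with uptime while the provider is warming up.
    static int warmupWeight(long uptimeMs, long warmupMs, int weight) {
        if (weight <= 0) {
            return 0; // a non-positive weight means "send no traffic"
        }
        if (uptimeMs <= 0 || uptimeMs >= warmupMs) {
            return weight; // outside the warm-up window: use the configured weight
        }
        long scaled = uptimeMs * weight / warmupMs;
        // Clamp to [1, weight] so a warming node still receives some traffic.
        return (int) Math.max(1L, Math.min(scaled, (long) weight));
    }

    public static void main(String[] args) {
        long minute = 60 * 1000L;
        System.out.println(warmupWeight(1 * minute, DEFAULT_WARMUP_MS, DEFAULT_WEIGHT));  // 10
        System.out.println(warmupWeight(5 * minute, DEFAULT_WARMUP_MS, DEFAULT_WEIGHT));  // 50
        System.out.println(warmupWeight(11 * minute, DEFAULT_WARMUP_MS, DEFAULT_WEIGHT)); // 100
    }
}
```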

It should be noted here that Dubbo provides five load balancing strategies by default:

Random LoadBalance: weighted random strategy

RoundRobin LoadBalance: weighted round-robin strategy

LeastActive LoadBalance: "least active calls" strategy

ConsistentHash LoadBalance: "consistent hash" strategy

ShortestResponse LoadBalance: "shortest response time" strategy

You may be unfamiliar with the "ShortestResponse LoadBalance" strategy, and it is not mentioned in the official documentation.

In fact, this is a new load balancing strategy added in Dubbo 2.7.7, and the official documentation has probably not been updated yet.

Ps: if you are interested, you can update the official documentation to add this new load balancing strategy and contribute to open source.

Back to the main topic: the AbstractLoadBalance#getWeight call hierarchy shows that the "ConsistentHash LoadBalance" implementation does not support service warm-up, which is worth noting.

Historical bugs in Dubbo warm-up: back and forth

Although the Dubbo warm-up code does not look very difficult overall, historical versions contained several bugs that caused warm-up to fail.

Dubbo versions prior to 2.5.5

Prior to Dubbo 2.5.5, AbstractLoadBalance#getWeight was implemented as follows:

Like the current code, this version obtains the service startup time from the node's timestamp parameter. However, this version has a problem: Dubbo does not pass the service provider's startup time through to the consumer, so the timestamp obtained here is the consumer's startup time, which causes warm-up to fail.

Dubbo 2.5.6 fixed the problem. The source code is as follows:

In this version, the service provider's startup time is stored separately in the remote.timestamp parameter; the relevant source code is located in ClusterUtils#mergeUrl.

This fixed the warm-up failure.
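
The effect of keeping the provider's startup time under a separate key can be modelled with plain maps. The sketch below is a simplified illustration, not the code of ClusterUtils#mergeUrl: the class MergeTimestampSketch and its methods are hypothetical, and it assumes the failure arises because consumer-side parameters overwrite provider-side parameters that share the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; this is not Dubbo's ClusterUtils#mergeUrl.
final class MergeTimestampSketch {

    // Assumed failure mode: consumer parameters overwrite provider parameters,
    // so the provider's "timestamp" is replaced by the consumer's.
    static Map<String, String> naiveMerge(Map<String, String> provider,
                                          Map<String, String> consumer) {
        Map<String, String> merged = new HashMap<>(provider);
        merged.putAll(consumer);
        return merged;
    }

    // Same merge, but the provider's startup time survives under "remote.timestamp".
    static Map<String, String> mergeKeepingRemoteTimestamp(Map<String, String> provider,
                                                           Map<String, String> consumer) {
        Map<String, String> merged = naiveMerge(provider, consumer);
        String providerStart = provider.get("timestamp");
        if (providerStart != null) {
            merged.put("remote.timestamp", providerStart);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> provider = new HashMap<>();
        provider.put("timestamp", "2000"); // provider started at t=2000
        Map<String, String> consumer = new HashMap<>();
        consumer.put("timestamp", "1000"); // consumer started at t=1000

        System.out.println(naiveMerge(provider, consumer).get("timestamp"));
        System.out.println(mergeKeepingRemoteTimestamp(provider, consumer).get("remote.timestamp"));
    }
}
```

In this model the naive merge yields the consumer's start time (1000), while the second variant still exposes the provider's start time (2000), which is the value the warm-up calculation needs.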

Warm-up fails again in Dubbo 2.7.2

When Dubbo was upgraded to 2.7.2, the warm-up failure bug came back. The main cause is that remote.timestamp was removed from ClusterUtils#mergeUrl, and the service startup time is stored in timestamp instead.

However, the change was incomplete: AbstractLoadBalance#getWeight still reads remote.timestamp to obtain the service startup time, so warm-up fails.

Hidden bugs in the warm-up code

This bug was completely fixed in Dubbo 2.7.4, which also fixed a latent defect in the code.

First, look at the defect in the original code. The original warm-up implementation calculates the service's running time as follows:

int uptime = (int) (System.currentTimeMillis() - timestamp);

But there is a problem here: if the clocks of the service provider and the consumer are not in sync, the service provider's startup time may be later than the consumer's local time.

In this case uptime is negative, so the warm-up calculation is skipped, the configured weight is used directly, and warm-up fails.

To address this, "@aftersss" proposed a fix that adds a check: when uptime is negative, return a weight of 1 directly.

However, during code review, "@beiwei30" felt there was no need for an extra if check and that Math.max could handle this case directly.

However, this modification still has a problem: integer overflow.

If System.currentTimeMillis() = 1566209746000 (2019-08-19 18:15:46) and timestamp = 1561914711000 (2019-07-01 01:11:51), the difference between the two is "4295035000".

This value is larger than Integer.MAX_VALUE (2147483647), so it overflows when the long is narrowed to int, and the actual int value is "67704".

This value is less than the default warm-up time (10 * 60 * 1000), so the dynamic weight calculation is entered and a relatively small weight is produced, which leads to a "false warm-up".

So in the end, the fix from "@aftersss" was adopted, and the result of the timestamp calculation is stored in a long. The final optimized code is as follows:
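
The overflow can be checked directly with the two timestamps quoted above. This is a minimal sketch; the class UptimeOverflowSketch and the method narrowedUptime are hypothetical names used only for illustration.

```java
// Illustrative sketch only; names are hypothetical.
final class UptimeOverflowSketch {

    // Mirrors the problematic pattern: a long difference narrowed to int.
    static int narrowedUptime(long nowMs, long startMs) {
        return (int) (nowMs - startMs);
    }

    public static void main(String[] args) {
        long now = 1566209746000L;   // 2019-08-19 18:15:46
        long start = 1561914711000L; // 2019-07-01 01:11:51

        long exact = now - start;
        int narrowed = narrowedUptime(now, start);

        System.out.println(exact);    // 4295035000
        System.out.println(narrowed); // 67704

        // The narrowed value looks like about 67 seconds of uptime, so a provider
        // that has been running for weeks is treated as still warming up.
        System.out.println(narrowed < 10 * 60 * 1000); // true
    }
}
```

The narrowing keeps only the low 32 bits: 4295035000 minus 2^32 (4294967296) is 67704.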
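
The original listing is not reproduced here. As a stand-in, the following minimal sketch combines the two fixes described above: the uptime is kept in a long, and a negative uptime returns a weight of 1. It is not the Dubbo source; the class SafeWarmupSketch, the method effectiveWeight, and its parameters are hypothetical.

```java
// Illustrative sketch only; names are hypothetical, not Dubbo's API.
final class SafeWarmupSketch {

    static int effectiveWeight(long nowMs, long startMs, int warmupMs, int weight) {
        if (weight <= 0 || startMs <= 0L) {
            // No positive weight or no known startup time: nothing to warm up.
            return Math.max(weight, 0);
        }
        long uptime = nowMs - startMs; // kept as long, so no narrowing overflow
        if (uptime < 0) {
            return 1; // provider clock is ahead of the consumer's: stay cautious
        }
        if (uptime > 0 && uptime < warmupMs) {
            long scaled = uptime * weight / warmupMs;
            return (int) Math.max(1L, Math.min(scaled, (long) weight));
        }
        return weight;
    }

    public static void main(String[] args) {
        int warmup = 10 * 60 * 1000;
        // Long-running provider: full weight instead of a "false warm-up".
        System.out.println(effectiveWeight(1566209746000L, 1561914711000L, warmup, 100)); // 100
        // Provider clock ahead of the consumer: minimum weight.
        System.out.println(effectiveWeight(1000L, 2000L, warmup, 100)); // 1
        // One minute of uptime: still warming up.
        System.out.println(effectiveWeight(61_000L, 1_000L, warmup, 100)); // 10
    }
}
```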

Thank you for reading. That is all for "how to solve the problem of interface calls always timing out after a service restart under high concurrency". How to apply it in your own system still needs to be verified in practice.
