This article covers rate limiting and circuit breaking: the basic algorithms, single-machine rate limiting with Guava and Sentinel, cluster-wide rate limiting, and some practical advice on where and how to apply limits.
Preface
With the popularity of microservices, stability between services has become more and more important, and we often spend a lot of effort maintaining it. Rate limiting and circuit breaking with degradation are the two most commonly used means. Some time ago a few colleagues had questions about how to use rate limiting, and I have also been doing some rate-limiting work for the company recently, so here is a summary of my own views on the topic.
As mentioned above, rate limiting is one of the means of keeping a service stable, but it does not guarantee stability in every scenario. As its name suggests, it only helps when traffic is very large or bursty. For example, suppose our system supports at most 100 QPS and a burst of 1000 QPS suddenly arrives: the system may simply fall over, and subsequent requests cannot be processed either. With rate limiting in place, no matter how large the incoming QPS is we only process 100 QPS and reject the rest outright. We turned away 900 QPS of requests, but the system stayed up and can keep handling subsequent traffic, which is exactly what we want. Some readers may say that now that everything runs on the cloud, dynamic scaling should be easy: when traffic spikes, machines are added automatically to support the target QPS, so rate limiting is unnecessary. Quite a few people think this way, sometimes encouraged by overly optimistic articles. The idea only works in an idealized world; in reality there are the following problems:
Scaling out takes time. Put simply, scaling out means provisioning a new machine and deploying the code to it. Java developers know that a release is measured in minutes, not seconds; by the time the scale-out finishes, the traffic spike may already have passed.
Deciding how much to scale is also complicated. Working out how many machines to add requires extensive load testing, and the whole call chain has to scale together: even if you add machines on your side, the machines of downstream teams may end up being the bottleneck.
Therefore scaling alone cannot solve the problem, and rate limiting remains a skill we must master.
Basic principles
To master rate limiting you first need to master its basic algorithms. Rate-limiting algorithms basically come in three kinds, the counter, the leaky bucket, and the token bucket; everything else is an evolution of these.
Counter algorithm
Let's start with the counter algorithm. It is simple and crude: we only need one cumulative variable that is reset every second, and on each request we check whether the counter has exceeded our maximum QPS.
int curQps = 0;
long lastTime = System.currentTimeMillis();
int maxQps = 100;
final Object lock = new Object();

boolean check() {
    synchronized (lock) {
        long now = System.currentTimeMillis();
        // a new one-second window has started: reset the counter
        if (now - lastTime > 1000) {
            lastTime = now;
            curQps = 0;
        }
        curQps++;
        if (curQps > maxQps) {
            return false;
        }
    }
    return true;
}
The code is straightforward: we keep the current QPS counter, the time the counter was last reset, the maximum QPS, and a lock. On every check we first see whether the window needs refreshing; if so, we reset both the timestamp and the counter, then increment and compare against the limit.
The problem with this algorithm is just as obvious, precisely because it is so simple. Suppose the maximum QPS is 100: 100 requests arrive at 0.99 s and another 100 at 1.01 s. Both batches pass our check, yet we actually let 200 requests through within about 0.02 s, which is definitely not what we expect, because those 200 requests may well be enough to bring the machine down.
Sliding window counter
To solve the boundary problem above, we can use a sliding window:
The idea is to split the ordinary 1 s counter into five 200 ms windows; the current QPS is the total counted across the last five windows. Going back to the example above, 0.99 s and 1.01 s fall within the same set of five recent windows, so the boundary spike problem no longer appears.
Looked at another way, the ordinary counter is simply a sliding-window counter with a single window. The more windows we divide the second into, the more accurate the counting becomes, but the cost of maintaining the windows also grows. When we introduce Sentinel later, we will describe how its sliding-window counting is implemented.
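Below is a minimal sliding-window counter sketch, assuming five 200 ms buckets as in the example above (the class and field names are made up for illustration):

class SlidingWindowLimiter {
    private static final int BUCKETS = 5;        // five buckets of 200 ms cover the last second
    private static final long BUCKET_MS = 200;
    private final long[] counts = new long[BUCKETS];
    private final long[] bucketStart = new long[BUCKETS];
    private final int maxQps;

    SlidingWindowLimiter(int maxQps) {
        this.maxQps = maxQps;
    }

    synchronized boolean check() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / BUCKET_MS) % BUCKETS);
        long slot = now - now % BUCKET_MS;
        if (bucketStart[idx] != slot) {           // this bucket holds data from an old window: reset it
            bucketStart[idx] = slot;
            counts[idx] = 0;
        }
        long total = 0;                           // sum every bucket still inside the last second
        for (int i = 0; i < BUCKETS; i++) {
            if (now - bucketStart[i] < BUCKETS * BUCKET_MS) {
                total += counts[i];
            }
        }
        if (total >= maxQps) {
            return false;                         // the whole window is already at the limit
        }
        counts[idx]++;
        return true;
    }
}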
Leaky bucket algorithm
The boundary spike problem of the counter can also be solved with the leaky bucket algorithm:
In the leaky bucket algorithm, the key points are the bucket itself and the constant outflow rate: no matter how much traffic arrives, it first enters the bucket and then flows out at a uniform speed. How do we achieve that uniform speed in code? If we want 100 requests per second, one request should flow out every 10 ms, which is essentially a queue: every 10 ms we take one request from the head of the queue and let it through, and the queue is our bucket. When the incoming traffic exceeds the queue length, the excess is rejected.
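A minimal leaky-bucket sketch along these lines (hypothetical names; instead of a real queue and timer it tracks the water level and drains it lazily based on elapsed time, which gives the same admission behavior):

class LeakyBucketLimiter {
    private final long capacity;        // how many requests the bucket (queue) can hold
    private final long leakIntervalMs;  // one request drains every interval, e.g. 10 ms for 100 QPS
    private long water = 0;             // requests currently sitting in the bucket
    private long lastLeakTime = System.currentTimeMillis();

    LeakyBucketLimiter(long capacity, long leakIntervalMs) {
        this.capacity = capacity;
        this.leakIntervalMs = leakIntervalMs;
    }

    synchronized boolean check() {
        long now = System.currentTimeMillis();
        long leaked = (now - lastLeakTime) / leakIntervalMs;  // requests drained since the last call
        if (leaked > 0) {
            water = Math.max(0, water - leaked);
            lastLeakTime += leaked * leakIntervalMs;
        }
        if (water >= capacity) {
            return false;               // bucket is full: reject the excess
        }
        water++;
        return true;
    }
}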
The leaky bucket also has drawbacks: it cannot absorb bursts (this is different from the boundary spike above, so don't confuse the two). If 100 requests arrive at the same instant, the leaky bucket releases them one by one, and a full second has passed by the time the last one flows out. The leaky bucket therefore suits scenarios where requests arrive fairly evenly and the processing rate must be strictly controlled.
Token bucket algorithm
To handle bursty traffic we can use the token bucket algorithm instead. It works as follows:
Producing tokens: again assume the maximum QPS is 100. Instead of releasing one request every 10 ms as in the leaky bucket, we now add one token to the bucket every 10 ms, up to a maximum number of stored tokens.
Consuming tokens: every request consumes tokens from the bucket. The consumption rule can vary: the simplest is one token per request, but it can also depend on the request size or type, for example one token for a read and two tokens for a write.
Deciding whether to pass: if there are enough tokens in the bucket, the request is allowed through; if not, it either waits or is rejected outright, and the waiting can be controlled with a queue just like the leaky bucket.
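A minimal token-bucket sketch, refilling lazily on each call (hypothetical names; the cost parameter lets reads and writes consume different numbers of tokens as described above):

class TokenBucketLimiter {
    private final long capacity;       // maximum number of stored tokens
    private final double refillPerMs;  // tokens produced per millisecond, e.g. 0.1 for 100 QPS
    private double tokens;
    private long lastRefillTime = System.currentTimeMillis();

    TokenBucketLimiter(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity;        // start with a full bucket so bursts can be absorbed
    }

    synchronized boolean check(int cost) {
        long now = System.currentTimeMillis();
        // produce tokens based on elapsed time, capped at the bucket capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMs);
        lastRefillTime = now;
        if (tokens < cost) {
            return false;              // not enough tokens: reject (or make the caller wait)
        }
        tokens -= cost;
        return true;
    }
}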
Single-machine rate limiting
Having covered the basic algorithms, let's apply them to our distributed services. This can be done in two ways: single-machine rate limiting and cluster rate limiting. Single-machine rate limiting means every machine limits its own traffic, independently of the others. Let's look at how to implement it first.
Guava
Guava is Google's open-source Java core library. It includes handy utilities for collections, caching, concurrency and so on, and it also provides the rate-limiting tool we need here. The core class is RateLimiter.
// Warm-up RateLimiter: ramps up to 100 permits/s over a 500 ms warm-up period
// RateLimiter rateLimiter = RateLimiter.create(100, 500, TimeUnit.MILLISECONDS);

// Plain RateLimiter: a simple 100 permits/s limit
RateLimiter rateLimiter = RateLimiter.create(100);
boolean limitResult = rateLimiter.tryAcquire();
Usage is simple, as shown in the code above: build a RateLimiter and call tryAcquire. A return value of true means the request is allowed through; otherwise it is rate limited. Guava actually provides two kinds of RateLimiter: an ordinary token-bucket implementation, and a warm-up RateLimiter that gradually ramps the token production rate up to the maximum (Sentinel offers the same idea). The warm-up variant is useful for cold systems, for example when a database connection pool is not yet full and the service is still initializing.
Here is a brief look at how Guava's token bucket is implemented. An ordinary RateLimiter is created as a SmoothBursty instance, which is the key to the rate limiting; what happens on each call can be seen in tryAcquire, which breaks down into four steps:
Step 1: take a synchronized lock. Note that Sentinel does not lock here while Guava does; we will come back to the problem this causes in Sentinel later.
Step 2: check whether a token can be acquired at all. If there are not enough tokens in the bucket and the required wait would exceed our timeout, give up immediately.
Step 3: reserve the token and obtain the wait time. The timeout parameter of tryAcquire is the maximum time we are willing to wait; if we call the plain tryAcquire() there is no waiting at all, because step 2 has already failed fast (see the usage sketch after these steps).
Step 4: sleep for the wait time obtained in step 3.
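As a small usage sketch of the waiting behavior described in Step 3 (these are standard RateLimiter calls; the permit count and timeout values are arbitrary):

// import com.google.common.util.concurrent.RateLimiter;
// import java.util.concurrent.TimeUnit;

RateLimiter limiter = RateLimiter.create(100);                        // 100 permits per second
boolean fast = limiter.tryAcquire();                                  // fail fast, no waiting at all
boolean waited = limiter.tryAcquire(1, 100, TimeUnit.MILLISECONDS);   // wait up to 100 ms for one permit
double slept = limiter.acquire();                                     // block until a permit is available; returns the seconds slept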
The actual token deduction happens in the reserveEarliestAvailable method:
Although the flow looks long, if we only call tryAcquire() there are just two steps to pay attention to:
Step 1: issue tokens based on how much time has passed since the last call. Guava does not use a background thread to produce tokens asynchronously; the token refresh happens inside the rate-limiting call itself. This design is worth learning from: very often we do not need an asynchronous thread to achieve the goal, and avoiding one keeps things much simpler.
Step 2: deduct the tokens. Since canAcquire has already checked, the deduction here always succeeds.
These two modes are all the rate limiting Guava provides. Many middleware and business services use Guava's RateLimiter as their in-house tool, but it is fairly limited: dynamic rule changes and richer policies are not supported. That brings us to Sentinel.
Sentinel
Sentinel is a lightweight flow-control framework open-sourced by Alibaba, distilled from the core scenarios of Alibaba's Double 11 (Singles' Day) traffic peaks over the past ten years. Its core is flow control, but it is not limited to that: it also supports circuit breaking and degradation, monitoring, and more.
Rate limiting with Sentinel is slightly more involved than with Guava. Here is the simplest possible example:
String KEY = "test";

// ==== initialize the rules ====
List<FlowRule> rules = new ArrayList<>();
FlowRule rule1 = new FlowRule();
rule1.setResource(KEY);
// limit QPS to 20
rule1.setCount(20);
rule1.setGrade(RuleConstant.FLOW_GRADE_QPS);
rule1.setLimitApp("default");
rule1.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_DEFAULT);
rules.add(rule1);
FlowRuleManager.loadRules(rules);

// ==== rate-limit decision ====
Entry entry = null;
try {
    entry = SphU.entry(KEY);
    // do something
} catch (BlockException e1) {
    // rate limited: a BlockException is thrown here
} finally {
    if (entry != null) {
        entry.exit();
    }
}
Step 1: Sentinel is built around the concept of a Resource; everything it protects or acts on is keyed by a resource. So the first thing to decide is the resource key, here simply set to test.
Step 2: initialize a rate-limiting rule for the resource. We choose QPS-based limiting with the default behavior, which is the sliding-window counter, and load it into the global rule manager. The rule setup is quite different from Guava's.
Step 3: the second important concept in Sentinel is the Entry. An Entry represents one operation on a resource and holds the current invocation information, and it must be exited in the finally block. The rate-limiting decision is made when we obtain the Entry via SphU.entry, which is where the rule above takes effect. Unlike Guava, a rate-limited request throws a BlockException, and that is where we handle the rejection.
Overall Sentinel is more complex to use than Guava, but it offers quite a few more algorithms than Guava's rate limiter.
Based on concurrency (number of threads)
What we introduced so far is QPS-based. Sentinel also provides a concurrency-based strategy, which works like semaphore isolation. We can use this mode when we need to keep the business thread pool from being exhausted by slow calls.
Typically all HTTP endpoints of one service share a thread pool; for example, with the Tomcat web server there is one Tomcat business thread pool. Suppose there are two endpoints, A and B, where B is fast and A is slow. If A is called heavily, its slowness keeps threads tied up, the pool may be exhausted, and B cannot get a thread either. We have run into exactly this: the whole service ends up rejecting every request it receives. Some might say limiting A's QPS is enough, but note that QPS is per second: if A takes longer than one second, the QPS counter resets before the previous wave of A requests has released its threads.
For this reason Sentinel provides concurrency-based limiting, enabled by setting the rule's Grade to FLOW_GRADE_THREAD.
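A minimal rule sketch for this mode, following the same loading pattern as the QPS example above (the resource name and threshold here are placeholders):

FlowRule threadRule = new FlowRule();
threadRule.setResource("slowMethodA");                 // hypothetical resource name
threadRule.setGrade(RuleConstant.FLOW_GRADE_THREAD);   // limit by concurrent threads instead of QPS
threadRule.setCount(20);                               // allow at most 20 concurrent calls into this resource
FlowRuleManager.loadRules(Collections.singletonList(threadRule));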
Based on QPS
For QPS-based limiting, Sentinel provides four behaviors:
Default: set Behavior to CONTROL_BEHAVIOR_DEFAULT, the sliding-window counter mode. It suits cases where the system's processing capacity is precisely known, for example after the exact water level has been established by load testing.
Warm Up: set Behavior to CONTROL_BEHAVIOR_WARM_UP, similar to Guava's warm-up described earlier. When a system has been running at a low water level for a long time, a sudden surge that pulls it straight up to a high water level can crush it instantly; this mode ramps the allowed QPS up gradually instead (see the rule sketch after this list).
Uniform queuing: set Behavior to CONTROL_BEHAVIOR_RATE_LIMITER, which is essentially the leaky bucket algorithm; its pros and cons were covered above.
Warm Up + uniform queuing: set Behavior to CONTROL_BEHAVIOR_WARM_UP_RATE_LIMITER. Before the warm-up reaches the high water level the sliding-window algorithm is used; afterwards it switches to uniform queuing.
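For example, a Warm Up rule might look like the sketch below, where setWarmUpPeriodSec controls how long the ramp-up lasts (the numbers are illustrative):

FlowRule warmUpRule = new FlowRule();
warmUpRule.setResource("test");
warmUpRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
warmUpRule.setCount(100);                                              // target QPS after warm-up
warmUpRule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_WARM_UP);
warmUpRule.setWarmUpPeriodSec(10);                                     // ramp up to the full 100 QPS over 10 seconds
FlowRuleManager.loadRules(Collections.singletonList(warmUpRule));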
Based on call relationship
Sentinel also provides more sophisticated limits that work on the call relationship, which makes things more flexible:
Limiting by caller: this is a bit more involved. You call ContextUtil.enter(resourceName, origin), where origin identifies the caller, and then set the rule's limitApp to decide which callers it applies to (a usage sketch follows this list):
Set it to default: the rule applies to all callers.
Set it to {some_origin_name}: the rule applies only to that specific caller.
Set it to other: the rule applies to every caller not already covered by a rule with a specific limitApp for this resource.
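A sketch of limiting one specific caller; ContextUtil.enter and setLimitApp are the relevant Sentinel APIs, while the resource and caller names are placeholders:

// rule: only traffic whose origin is "order-service" is limited on this resource
FlowRule callerRule = new FlowRule();
callerRule.setResource("test");
callerRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
callerRule.setCount(20);
callerRule.setLimitApp("order-service");               // {some_origin_name}: limit this caller only
FlowRuleManager.loadRules(Collections.singletonList(callerRule));

// on the serving side, record the caller's identity before entering the resource
ContextUtil.enter("test", "order-service");
Entry entry = null;
try {
    entry = SphU.entry("test");
    // business logic
} catch (BlockException e) {
    // this caller has been rate limited
} finally {
    if (entry != null) {
        entry.exit();
    }
    ContextUtil.exit();
}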
Associated flow control: Sentinel also supports letting two associated resources influence each other's flow control. For example, two interfaces share the same underlying resource; one is important and the other is not. We can set a rule so that when the important interface is under heavy access, the less important one is throttled, preventing a burst on the unimportant interface from affecting the important one (see the rule sketch below).
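A sketch of such an association rule, assuming the two resources are named write_db (the important one) and read_db (the one to throttle); setStrategy(STRATEGY_RELATE) and setRefResource are the relevant FlowRule fields:

// when write_db is under heavy load, throttle read_db instead
FlowRule relateRule = new FlowRule();
relateRule.setResource("read_db");                     // the resource that gets throttled
relateRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
relateRule.setCount(100);
relateRule.setStrategy(RuleConstant.STRATEGY_RELATE);  // relation-based flow control
relateRule.setRefResource("write_db");                 // the associated resource being protected
FlowRuleManager.loadRules(Collections.singletonList(relateRule));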
Some problems with Sentinel
Although Sentinel provides all these algorithms, it has some problems:
First, Sentinel is harder to get started with. Compared with the two lines of code needed for Guava, using Sentinel means learning a number of concepts first. It does provide annotations to simplify usage, but overall it remains more complex than Guava.
Sentinel carries some operational cost: using it usually means running the Sentinel server backend as well, whereas Guava is ready to use out of the box.
Sentinel's rate-limit accounting has a concurrency issue. There is no lock in the relevant code path, so in extreme cases, if the QPS limit is 10 and 100 requests evaluate the limit at exactly the same moment, all of them may pass; this cannot happen with Guava.
These points are all relative to Guava's rate limiting; after all, Sentinel does much more, and more functionality inevitably costs more.
Cluster rate limiting
Everything so far has been single-machine rate limiting, but nowadays we run microservice clusters, where one service usually has many machines. Take an order service with 10 machines: how do we limit the whole cluster to 500 QPS? The naive answer is to limit each machine to 50 QPS, since 500 / 10 = 50. In reality, though, the load is never perfectly balanced. Service calls go through all sorts of load-balancing algorithms, such as same-datacenter first, round robin, or random, and these can leave the load uneven, so the cluster as a whole may end up effectively limited at, say, 400 QPS. We have run into exactly this in production. Since single-machine limiting has this problem, we need a better cluster-wide scheme.
Redis
This scheme does not rely on any rate-limiting framework: the whole cluster shares one Redis, and we wrap the rate-limiting logic ourselves. Using the simplest counter design, we take the current system time in seconds as the key, increment it in Redis with an atomic INCR for every request (setting an expiry so old keys get cleaned up), and then check whether the counter has exceeded the maximum.
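A minimal sketch of this counter with Jedis, assuming every instance shares the same Redis; the key format and limit are placeholders, and INCR being atomic is what keeps the cluster-wide count consistent:

// import redis.clients.jedis.Jedis;

boolean check(Jedis jedis, int maxQps) {
    long nowSeconds = System.currentTimeMillis() / 1000;
    String key = "rate:limit:" + nowSeconds;   // one counter per second of system time
    long count = jedis.incr(key);              // atomic increment shared by the whole cluster
    if (count == 1) {
        jedis.expire(key, 5);                  // expire old keys after a few seconds to free space
    }
    return count <= maxQps;
}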
The Redis scheme is simple to implement overall, but it depends heavily on the system clock: if the clocks on different machines drift apart, the counting becomes inaccurate.
Sentinel
Sentinel also provides a clustering solution, which sets it apart from most other rate-limiting frameworks. Two modes are available:
Standalone mode: the token server is deployed as a separate service, and every application instance obtains tokens from this dedicated token-server. This mode suits global limits that span multiple services, for example services A and B both fetching tokens from the same token-server. In practice this cross-service scenario is rather rare; cluster limiting within a single service is much more common.
Embedded mode: the token server runs inside one of the application instances, and an instance can switch between the server and client roles through the API. We can also add some ZooKeeper-based logic so that the leader acts as the server and another machine takes over automatically if it dies. This mode suits limiting within a single service cluster and is more flexible, but note that heavy token-server traffic can also affect the instance that hosts it.
Sentinel also has fallback strategies: if the token-server goes down, the instances can degrade to single-machine limiting so normal service is not affected.
Putting it into practice
We have covered a lot of rate-limiting tools, but many people are still unsure how to actually apply them. To rate-limit a scenario or a resource, we need to answer three questions:
Where do we limit?
How much do we limit?
Which tool do we choose?
Where do we limit?
This question is fairly complicated, and different companies and teams handle it differently. When I was at Meituan during the SOA wave, I remember that every interface of every service had to be rate-limited, and each team was asked to work out a reasonable QPS ceiling for its interfaces. In theory this is the right thing to do: every interface should have an upper bound so the whole system cannot be dragged down. But the cost of doing so is very high, so most companies choose to limit only selected paths.
First, identify the core interfaces, such as placing orders and paying in an e-commerce system; if their traffic grows too large, the core path breaks. Paths that do not affect the core flow, such as reconciliation, can go without rate limits.
Second, rate limiting does not have to happen only at the interface layer. Very often we limit directly at the gateway layer to keep excess traffic from penetrating further into the core system, and the front end can take part too: after it catches the rate-limit error code it can show a "please wait" message, which is also a form of rate limiting. The further downstream a limit is triggered, the more resources are wasted, because the upstream has already done a lot of work before the downstream rejects the request, and if rollback is involved the burden grows further. So for rate limiting, the earlier (further upstream) it triggers, the better.
How much do we limit?
Most of the time the answer comes from history: look at the daily QPS monitoring chart, add some headroom on top, and that becomes the limit. But there is one scenario to watch out for: a major promotion (in an e-commerce setting, when traffic from other systems surges), where traffic jumps well beyond the daily QPS. In that case we usually run a full-link load test on the system before the event, find a reasonable ceiling, and set the limit based on that.
Which tool do we choose?
Generally speaking, larger Internet companies have their own unified rate-limiting tools that can be used directly. Otherwise, if you have no need for cluster-wide limits or circuit breaking, I think Guava's RateLimiter is a good choice: it is simple to use and has essentially no learning cost. If you need more, I would choose Sentinel. As for Hystrix, I personally do not recommend it, because it is no longer maintained.