This article mainly introduces how to understand the rate-limiting logic and algorithms in microservices. Many people have doubts about this topic, and I hope the walkthrough below helps resolve them. Let's study it together!
The concept of rate limiting
First of all, what is rate limiting? Scenes of rate limiting can be seen everywhere in daily life. During the morning rush hour in the Beijing subway, there is a sea of people every day; if everyone swarmed into the station at once, it would easily cause congestion inside. So subway entrances generally have maze-like railings, and people wind through them and enter the station batch by batch. This is a typical rate-limiting scene.
So what exactly does rate limiting refer to in microservices? Literally, it means limiting traffic, and the definition of traffic varies by scenario: it can be QPS (queries per second), TPS (transactions per second), or raw network traffic (such as the number of bytes passing through a NIC).
What we usually call rate limiting, however, means limiting the number of concurrent requests reaching the system, so that the system serves users normally within its own capacity and rejects the requests that exceed its processing ability, thereby keeping the system running stably.
Rate limiting inevitably causes some user requests to fail or slow down, which hurts the user experience to some extent, so a rate-limiting policy should be based on the results of system load testing and strike a balance between user experience and system stability.
The necessity of rate limiting
As mentioned above, the main purpose of rate limiting is to keep the system stable. In daily business, when you meet promotional events such as Singles' Day, or abnormal traffic such as crawlers, user traffic surges while the processing capacity of the back-end service stays limited. If the service cannot handle the sudden traffic effectively, it is easily brought down.
You can imagine such a scenario: the QPS capacity of a single service node is 1000, the service has five nodes, and the service's overall QPS is 3000 on a typical day. Under normal circumstances the service is under no pressure; with the load balancer spreading traffic evenly, each node handles only about 600 QPS a day.
Then one day the boss suddenly launches a big promotion, and the overall QPS of the system reaches 8000. Each node now carries an average of 1600 QPS, and node A is the first to go down. Four nodes remain in the cluster, so the average load per node rises to 2000 QPS; the remaining nodes fail one by one, and the whole service suffers an avalanche.
How would things develop if our system had a rate-limiting mechanism?
The overall QPS of the system reaches 8000, but because the cluster-wide limit is 5000, the 3000 requests that exceed the cluster's capacity are rejected, and the system serves 5000 user requests normally. That is rate limiting at the cluster level. For each individual node, since its capacity is only 1000 QPS, anything beyond 1000 is likewise rejected. Although some user requests are lost, this keeps the whole system stable and buys development and operations time to scale the system out.
Thus it can be seen that rate limiting matters a great deal for a system's self-protection. So how do we limit the rate? Next, we summarize the common rate-limiting algorithms.
Common rate-limiting algorithms
The common rate-limiting algorithms are: counter, fixed window, sliding window, leaky bucket, and token bucket. We introduce each of them below.
Counter rate limiting is the simplest and crudest algorithm. For example, if the system can process 100 requests at the same time, keep a counter: increment it when a request starts, decrement it when the request finishes, and whenever a request comes in, check the counter and reject the request outright if the threshold is exceeded.
In the concrete implementation, if the counter is stored in a single machine's memory, you get single-machine rate limiting; if it is stored somewhere shared such as Redis, so that all nodes in the cluster count against it together, you get cluster-wide rate limiting.
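As a minimal sketch of the single-machine variant, the following Java class uses an atomic counter; the class name and method shape are illustrative assumptions, not code from the article:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal single-machine counter limiter (illustrative sketch).
public class CounterLimiter {
    private final int threshold;
    private final AtomicInteger count = new AtomicInteger(0);

    public CounterLimiter(int threshold) {
        this.threshold = threshold;
    }

    // Try to admit a request; reject if the concurrency cap is reached.
    public boolean tryAcquire() {
        if (count.incrementAndGet() > threshold) {
            count.decrementAndGet(); // roll back the increment and reject
            return false;
        }
        return true;
    }

    // Must be called when a request finishes, freeing one slot.
    public void release() {
        count.decrementAndGet();
    }
}
```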
Advantages: easy to implement. On a single machine it can be built with Java's Atomic* classes and other atomic primitives, and a cluster version can be built quickly with Redis's INCR operation (sketched below).
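Here is a minimal sketch of that cluster variant, assuming a reachable Redis instance and the Jedis client; the key name and address are illustrative assumptions:

```java
import redis.clients.jedis.Jedis;

// Minimal cluster-wide counter limiter backed by Redis (illustrative sketch).
public class RedisCounterLimiter {
    private static final String KEY = "svc:inflight"; // hypothetical key name
    private final Jedis jedis = new Jedis("localhost", 6379);
    private final int threshold;

    public RedisCounterLimiter(int threshold) {
        this.threshold = threshold;
    }

    public boolean tryAcquire() {
        long current = jedis.incr(KEY); // atomic across all cluster nodes
        if (current > threshold) {
            jedis.decr(KEY);            // roll back and reject
            return false;
        }
        return true;
    }

    public void release() {
        jedis.decr(KEY);
    }
}
```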
Disadvantages: counter rate limiting cannot cope with sudden traffic spikes. Suppose the threshold is 10,000 and the counter is at 0: if all 10,000 requests arrive in an instant, the service may well go down, because a slow ramp-up of traffic and a sudden flood put very different pressure on the system.
Moreover, rate limiting usually constrains the number of requests within a specified time interval rather than the service's total processing capacity over its lifetime, so the plain counter is not well suited to high-concurrency scenarios.
Compared with the plain counter, fixed-window rate limiting uses the number of requests within a time window as the basis for limiting, and the counter resets automatically at the end of every window. The rules are as follows (a sketch follows the list):
If the request count is below the threshold, allow the request and add 1 to the counter.
If the request count has reached the threshold, deny the request.
When the time window ends, the counter automatically resets to zero.
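A minimal Java sketch of these rules; the class name and the millisecond-based window are illustrative assumptions:

```java
// Minimal fixed-window limiter (illustrative sketch).
public class FixedWindowLimiter {
    private final int threshold;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    public FixedWindowLimiter(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // a new window begins
            count = 0;         // counter resets automatically
        }
        if (count >= threshold) {
            return false;      // window full: deny
        }
        count++;               // below threshold: allow and count
        return true;
    }
}
```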
Although fixed-window limiting looks fine, it suffers from the window-boundary problem. Suppose the system allows 1000 requests per second and the first window spans 0 to 1 second: 1000 requests pour in at 0.55 s, the counter resets to zero at 1 s, and then another 1000 requests pour in at 1.05 s.
At this point the count inside each fixed window never exceeded the threshold, yet 2000 requests flooded in during the 0.5 seconds from 0.55 s to 1.05 s, which a system sized for 1000 requests per second cannot bear.
To solve this problem, the sliding-window rate-limiting algorithm was derived.
Sliding-window rate limiting solves the fixed window's boundary problem and guarantees that the threshold is not exceeded in any window of the given length. Compared with the fixed window, the sliding window needs not only a counter but also a record of the timestamp of each request within the window.
For example, if the time window is 1 second, the rules are as follows (a sketch follows the list):
Record the timestamp of each request.
For each incoming request, count the requests in the window reaching 1 second back from its timestamp; records older than 1 second can be deleted.
If the counted number of requests is below the threshold, record this request's timestamp and let it through; otherwise reject it.
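A minimal Java sketch of these rules using a deque of timestamps; the names and millisecond granularity are illustrative assumptions:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sliding-window limiter (illustrative sketch).
public class SlidingWindowLimiter {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict records that have fallen out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= threshold) {
            return false;        // the window is already full: reject
        }
        timestamps.addLast(now); // admit and record this request
        return true;
    }
}
```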
Although this looks fine, the sliding window still cannot mitigate traffic that concentrates in a short slice of the window. For example, with a limit of 1000 requests per second, the whole threshold may be used up in the first 5 milliseconds; ideally the load would spread to about 10 requests every 10 milliseconds, letting the system handle the traffic much more smoothly.
In real-world scenarios, however, it is hard to control the frequency of incoming requests, so to address this pain point of the window algorithms, the leaky bucket algorithm appeared.
The basic idea of the leaky bucket algorithm is that traffic keeps pouring into the bucket while requests leak out of the bottom and are processed at a constant rate. If traffic enters faster than the bottom can drain, then once the water in the bucket exceeds the bucket's capacity, the excess traffic overflows and is discarded.
The leaky bucket is characterized as "loose in, strict out": no matter how fast requests arrive, the bottom drains at a uniform rate. This behavior resembles a message queue's processing mechanism, and in practice leaky buckets are generally implemented with a queue (a sketch follows).
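A minimal queue-based Java sketch: requests wait in a bounded "bucket" and a single worker drains them at a fixed rate. The names, the drain interval, and the daemon-thread design are illustrative assumptions:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal leaky-bucket limiter (illustrative sketch).
public class LeakyBucketLimiter {
    private final BlockingQueue<Runnable> bucket;

    public LeakyBucketLimiter(int capacity, long drainIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    bucket.take().run();               // process one request...
                    Thread.sleep(drainIntervalMillis); // ...then pause: constant rate
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    // Offer a request to the bucket; returns false (overflow) when it is full.
    public boolean submit(Runnable request) {
        return bucket.offer(request);
    }
}
```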
This characteristic of the leaky bucket is, however, both its strength and its weakness. Facing sudden traffic, we often want to keep the system stable while serving user requests faster to improve the experience, rather than plodding along at a fixed pace. For such cases there is the token bucket, a rate-limiting algorithm that handles bursts more aggressively than the leaky bucket.
The token bucket's principle is similar to the leaky bucket's, except that where the leaky bucket drains at a uniform rate at the bottom, the token bucket has tokens stuffed into it at a fixed rate, and a request is processed by the server only after it obtains a token. The rules are as follows (a sketch follows the list):
Put tokens into the bucket at a fixed rate.
If the number of tokens exceeds the bucket's capacity, the surplus tokens are discarded.
When a request arrives, it first asks the bucket for a token; if it gets one, it is processed, otherwise it is rejected.
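A minimal Java sketch of these rules that refills the bucket lazily based on elapsed time; the names and parameters are illustrative assumptions:

```java
// Minimal token-bucket limiter (illustrative sketch).
public class TokenBucketLimiter {
    private final long capacity;         // max tokens the bucket can hold
    private final double refillPerMilli; // tokens added per millisecond
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    public TokenBucketLimiter(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full so early bursts can be served
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill for the elapsed time, capped at capacity (surplus is discarded).
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMilli);
        lastRefill = now;
        if (tokens < 1) {
            return false; // no token available: reject
        }
        tokens -= 1;      // take one token and admit the request
        return true;
    }
}
```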
As you can see, when a burst arrives, the token bucket does not process it at a uniform rate the way the leaky bucket does: within a short time many requests can take tokens from the bucket simultaneously and be served promptly. Under bursty traffic, therefore, the token bucket performs better.
Summary of rate-limiting algorithms
After the description above, it may seem that the leaky bucket and token bucket are so much better than the window algorithms that the latter are useless.
In fact, that is not so. Although the leaky bucket and token bucket shape traffic better than the window algorithms, they have their own shortcomings. Take the token bucket: if the system is not warmed up when it comes online, requests may be wrongly rejected simply because the bucket holds no tokens yet. And in the leaky bucket, requests sit temporarily in the bucket, so there is a delay before they are processed, which fails the low-latency requirement of Internet services.
Therefore, the token bucket and leaky bucket algorithms are better suited to blocking rate-limiting scenarios, that is, rate limiting for background tasks. Window-based limiting, on the other hand, is better suited to Internet-facing business traffic: process what you can quickly, respond to the caller promptly for what you cannot, and avoid making requests wait too long.
Microservice rate-limiting components
If you are interested, you can implement a rate-limiting component yourself, but the wheel has already been invented. The most popular rate-limiting components at present are the RateLimiter utility class provided by Google Guava, and Alibaba's open-source Sentinel.
Among them, Guava's RateLimiter is based on the token bucket and extends the algorithm to support a warm-up feature.
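A brief usage sketch of Guava's RateLimiter; the 100-permit rate and the 3-second warm-up period are illustrative values:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class GuavaRateLimiterDemo {
    public static void main(String[] args) {
        // Plain token-bucket style limiter: 100 permits per second.
        RateLimiter limiter = RateLimiter.create(100.0);

        // Warm-up variant: the rate ramps up to 100 permits/s over 3 seconds,
        // protecting a freshly started (cold) system.
        RateLimiter warmingLimiter = RateLimiter.create(100.0, 3, TimeUnit.SECONDS);

        if (limiter.tryAcquire()) { // non-blocking check
            System.out.println("request admitted");
        } else {
            System.out.println("request rejected");
        }

        warmingLimiter.acquire();   // blocks until a permit is available
        System.out.println("processed under the warm-up limiter");
    }
}
```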
At this point, our study of the rate-limiting logic and algorithms in microservices is complete. Pairing the theory with practice is the best way to make it stick, so go and try it!