This article is about how rate limiter algorithms are used in distributed systems in web development. The material is very practical, so I am sharing it here; I hope you get something out of it. Let's take a look.
Rate limiters generally use one of five algorithms: token bucket, leaky bucket, fixed window, sliding log (what most descriptions broadly call the sliding window), and sliding window (here meaning a combination of the sliding log and fixed window algorithms).
Token Bucket
The token bucket algorithm is used to control the amount of data sent to the network over a period of time while still allowing bursts of data to be sent.
The algorithm is roughly as follows: assuming the allowed request rate is r per second, a token is added to the bucket every 1/r seconds. The bucket holds at most b tokens. When a request of size n arrives, check whether the bucket contains at least n tokens: if so, remove n tokens and pass the request; if not, trigger the rejection policy.

The bucket has a fixed size, and each request is assumed to have a size as well. To check whether a request is within the defined limit, the bucket is examined to see whether it currently contains enough tokens. If so, the tokens are removed and the request passes; otherwise some other action is taken, usually rejection. Tokens are replenished at a fixed rate, which is the allowed request rate (depending on the configured bucket size, the actual rate may briefly exceed this, but it falls back to the replenishment rate as the accumulated tokens are consumed).

If tokens are not consumed, or are consumed more slowly than they are generated, they accumulate until the bucket is full. The token bucket therefore allows a certain degree of burst traffic while keeping the overall request rate bounded, as the sketch below illustrates.
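A minimal single-process sketch of this behavior, assuming Python and a lazy refill on each call (all names are illustrative, not from the original article):

```python
import threading
import time

class TokenBucket:
    """Allows bursts up to `capacity` while averaging `rate` requests/second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                  # tokens added per second (r)
        self.capacity = capacity          # maximum bucket size (b)
        self.tokens = capacity            # start with a full bucket
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, n: float = 1.0) -> bool:
        """Try to take n tokens; True means the request passes."""
        with self.lock:
            now = time.monotonic()
            # Lazy refill: convert elapsed time into tokens, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False                  # not enough tokens: reject

# Allow bursts of up to 10 requests while averaging 5 requests/second.
limiter = TokenBucket(rate=5, capacity=10)
if limiter.allow():
    pass  # handle the request
```

The lazy refill means no background thread is needed: elapsed time is turned into tokens whenever a request arrives.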
Implementing a token bucket in a distributed environment requires considering the following issues:
How is the current state of the token bucket stored? Do you store only the current token count (for example, in a Redis key-value pair), or a timestamp for every passed request (for example, in a Redis zset whose maximum size is the bucket size)?

Who replenishes the tokens? If only the current token count is stored, some process must keep adding tokens at rate r: how do you guarantee exactly one such process exists in a distributed environment, and what happens if it fails? If the timestamp of every passed request is stored instead, how do you bound the number of recorded requests (you certainly cannot keep them all), and how do you derive the remaining token count from the current request and the stored timestamps on each check? One common answer to both questions is sketched below.
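A hedged sketch of that answer, assuming Redis: store only the token count plus a last-refill timestamp, and let every request refill the bucket lazily inside one atomic Lua script, so no dedicated refill process is needed. The key name and script are illustrative, and caller clocks are assumed to be roughly synchronized:

```python
import time

import redis  # pip install redis

# Atomically refill the bucket based on elapsed time, then try to take tokens.
REFILL_AND_TAKE = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])  -- tokens per second (r)
local capacity = tonumber(ARGV[2])  -- bucket size (b)
local now      = tonumber(ARGV[3])  -- caller-supplied time in seconds
local need     = tonumber(ARGV[4])  -- tokens requested (n)

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Lazy refill, capped at the bucket size.
tokens = math.min(capacity, tokens + (now - ts) * rate)

local allowed = 0
if tokens >= need then
    tokens = tokens - need
    allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed
"""

r = redis.Redis()
take = r.register_script(REFILL_AND_TAKE)

def allow(key: str, rate: float, capacity: int, n: int = 1) -> bool:
    return take(keys=[key], args=[rate, capacity, time.time(), n]) == 1
```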
Leaky Bucket
The leaky bucket forces requests to be consumed at no more than a fixed maximum rate. Like a funnel, the inflow may be large or small, but the outflow never exceeds a fixed rate; unlike the token bucket, there are no small bursts.

The algorithm is roughly as follows: it is usually implemented with a FIFO (first in, first out) queue bounded at size b; if requests pile up beyond the queue's capacity, the discard policy is triggered. Assuming the allowed request rate is r per second, the requests in the queue are consumed at that rate.
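A minimal single-process sketch, assuming Python, where the bounded queue stores the scheduled run time of each pending request (all names are illustrative):

```python
import time
from collections import deque

class LeakyBucket:
    """Requests leak out at a fixed rate; at most `capacity` may be pending."""

    def __init__(self, rate: float, capacity: int):
        self.interval = 1.0 / rate      # seconds between consumptions (1/r)
        self.capacity = capacity        # queue bound (b)
        self.queue = deque()            # scheduled run times, oldest first

    def submit(self) -> float | None:
        """Return the delay before this request may run, or None if discarded."""
        now = time.monotonic()
        # Drop entries that have already leaked out of the bucket.
        while self.queue and self.queue[0] <= now:
            self.queue.popleft()
        if len(self.queue) >= self.capacity:
            return None                 # overflow: trigger the discard policy
        # The next slot is one interval after the last scheduled request.
        run_at = max(now, self.queue[-1] + self.interval if self.queue else now)
        self.queue.append(run_at)
        return run_at - now

limiter = LeakyBucket(rate=2, capacity=5)   # drain 2 requests/second
delay = limiter.submit()
if delay is not None:
    time.sleep(delay)                       # then handle the request
```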
Implementing a leaky bucket in a distributed environment requires considering the following problems:

How is the leaky bucket's queue stored? The queue needs to hold the timestamp of each passed request and its scheduled consumption time so that consumption stays steady. Ideally the queue is lock-free, since a distributed lock would become a point of contention. The queue should be capped at size b, and each arriving request should be placed into it while expired entries are cleaned out at the same time.

How is consumption realized, that is, how do the queued requests actually get executed? When a request arrives, the requests already in the queue can be used to compute how long the current request should wait; it is then enqueued and its execution is delayed by that amount. A queue implementation with built-in delay support can also be used. A distributed variant of this idea is sketched below.
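A hedged sketch of that idea on Redis, simplified further: rather than storing a full shared queue, keep only the next free execution slot in a single key and advance it atomically, which directly yields the delay each request should wait (key name and script are illustrative):

```python
import time

import redis

NEXT_SLOT = """
local key      = KEYS[1]
local interval = tonumber(ARGV[1])   -- seconds between consumptions (1/r)
local now      = tonumber(ARGV[2])
local horizon  = tonumber(ARGV[3])   -- max backlog in seconds (b * interval)

local slot = tonumber(redis.call('GET', key) or '0')
if slot < now then slot = now end            -- bucket is idle: run immediately
if slot - now > horizon then return -1 end   -- queue "full": discard
redis.call('SET', key, slot + interval)
redis.call('EXPIRE', key, math.ceil(horizon) + 1)
return tostring(slot - now)                  -- string keeps the fraction intact
"""

r = redis.Redis()
next_slot = r.register_script(NEXT_SLOT)

def submit(key: str, rate: float, capacity: int) -> float | None:
    """Return seconds to wait before handling the request, or None if rejected."""
    delay = float(next_slot(keys=[key],
                            args=[1.0 / rate, time.time(), capacity / rate]))
    return None if delay < 0 else delay
```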
Fixed Window
The fixed window is comparatively simple: time is divided into slices, and each slice processes at most a fixed number of requests. This is not a rigorous limit, but its simplicity makes it a good fit for less demanding scenarios. The algorithm is roughly as follows: assuming at most b requests are processed every n seconds, reset a counter to b every n seconds; when a request arrives, if the counter is positive, decrement it and pass the request, otherwise trigger the rejection policy.
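A common way to realize this on Redis is the equivalent count-up form: INCR a per-window counter and let the key expire when the window ends. A hedged sketch (key name illustrative; note the small race if the process dies between INCR and EXPIRE, which a Lua script would close):

```python
import redis  # pip install redis

r = redis.Redis()

def fixed_window_allow(key: str, limit: int, window_seconds: int) -> bool:
    """Count requests in the current window; the counter dies with the window."""
    count = r.incr(key)
    if count == 1:
        # First request of this window: start the window timer.
        r.expire(key, window_seconds)
    return count <= limit

# At most 100 requests per second for this caller (key name is illustrative):
allowed = fixed_window_allow("rl:user:42:sec", limit=100, window_seconds=1)
```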
The fixed window is the easiest algorithm to implement, but it has obvious defects. In many cases, especially when the rejection policy queues requests after the limit is hit, requests are consumed quickly at the start of the window and nothing is processed for the rest of it, which is undesirable. Moreover, in some edge cases the actual rate can reach up to twice the limit. For example, limit 100 requests per second: suppose 100 requests arrive at 0.99 seconds and another 100 at 1.01 seconds. Then 200 requests pass between 0.99 s and 1.01 s, which is not, strictly speaking, only 100 requests per second. The following two algorithms exist to enforce the limit strictly.
Sliding Log
The sliding log controls the rate by comparing the cached timestamps of previously accepted requests with the timestamp of the current request. This limits the request rate strictly. The "sliding window" algorithm commonly described online usually refers to this sliding log algorithm; the sliding window in this article is a further optimized algorithm, described later.

The algorithm is roughly as follows: assume at most b requests are processed in n seconds. Cache at most b passed requests together with their timestamps, and call this set B. Whenever a request arrives, delete every entry older than n seconds from B, then check whether the set is full. If not, pass the request and add it to the set; if it is full, trigger the rejection policy.
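A hedged sketch on a Redis sorted set, where scores are request timestamps (key name illustrative; the check and the insert are separate commands here, so a strict implementation would wrap them in a Lua script for atomicity):

```python
import time
import uuid

import redis

r = redis.Redis()

def sliding_log_allow(key: str, limit: int, window_seconds: float) -> bool:
    now = time.time()
    pipe = r.pipeline()
    # Step 1: drop every cached request older than the window.
    pipe.zremrangebyscore(key, 0, now - window_seconds)
    # Step 2: count what is left in the set B.
    pipe.zcard(key)
    _, count = pipe.execute()
    if count >= limit:
        return False                      # B is full: reject
    # Step 3: record this request; a unique member avoids timestamp collisions.
    r.zadd(key, {str(uuid.uuid4()): now})
    r.expire(key, int(window_seconds) + 1)
    return True
```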
Implementing a sliding log in a distributed environment requires considering the following issues:
The algorithm already keeps storage down, but in high-concurrency scenarios a large number of requests may still need to be cached (for example, with a limit of 100,000 requests per second, must the cache be able to hold 100,000 entries?). How should this cache be implemented?

In high-concurrency scenarios, deleting everything older than n seconds from the set must be very fast. If your cache deletes by timestamp slowly, you can cache slightly more requests and clean out entries older than n seconds periodically rather than on every arrival. When a request arrives, it is then enough to check whether a b-th previous request exists and whether its timestamp is within n seconds of now; if so, the rate limiting policy should be triggered, as sketched below.
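A hedged sketch of that lazy variant, again assuming a Redis sorted set; a separate periodic job is assumed to prune old entries, and all names are illustrative:

```python
import time
import uuid

import redis

r = redis.Redis()

def sliding_log_allow_lazy(key: str, limit: int, window_seconds: float) -> bool:
    now = time.time()
    # The limit-th newest member (index limit-1 from the top by score).
    nth = r.zrevrange(key, limit - 1, limit - 1, withscores=True)
    if nth and now - nth[0][1] < window_seconds:
        return False          # b requests already arrived within the last n seconds
    r.zadd(key, {str(uuid.uuid4()): now})
    return True

# Elsewhere, a periodic cleanup job keeps the set small, e.g. every few seconds:
# r.zremrangebyscore(key, 0, time.time() - window_seconds)
```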
Sliding Window (sliding log + fixed window)
The sliding log section raised a problem: there may be a great many requests to cache. If a suitable cache is not available within your architecture, the sliding window approach reduces the number of requests that must be stored, shrinking the set and reducing contention on it.

The algorithm is roughly as follows: assume at most b requests are processed in n seconds. Split the n seconds into time slices of m milliseconds each; cache individual requests and timestamps only for the newest slice, and keep just a request count for each earlier slice. This greatly reduces storage at the cost of slightly more computation. For the boundary condition, when n/m slices already exist, weight the oldest slice by how much of the current slice has elapsed when computing the number of requests within the last n seconds: if 25% of the current slice has elapsed, count 75% of the oldest slice's requests.
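A hedged sketch of the weighting, assuming Redis and, for brevity, a slice size equal to the whole window, so only the current and previous slice counters exist (key names are illustrative):

```python
import time

import redis

r = redis.Redis()

def sliding_window_allow(key: str, limit: int, window_seconds: int) -> bool:
    now = time.time()
    slice_id = int(now // window_seconds)
    curr_key = f"{key}:{slice_id}"
    prev_key = f"{key}:{slice_id - 1}"

    elapsed = (now % window_seconds) / window_seconds    # e.g. 0.25 = 25% elapsed
    prev = int(r.get(prev_key) or 0)
    curr = int(r.get(curr_key) or 0)
    # Weight the previous slice: at 25% elapsed, 75% of it still "counts".
    estimated = prev * (1.0 - elapsed) + curr
    if estimated >= limit:
        return False
    pipe = r.pipeline()
    pipe.incr(curr_key)
    pipe.expire(curr_key, window_seconds * 2)   # survives as the "previous" slice
    pipe.execute()
    return True
```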
That covers how rate limiter algorithms are used in distributed systems in web development. Some of these points are likely to come up in daily work, and I hope you have learned something new from this article.