

Distributed system concerns: want to master "rate limiting"? This article is all you need.


If this is not the first time you have seen my articles, you are welcome to subscribe at the end of this one.

This article is about 2,869 words long and should take roughly 8 minutes to read.

You may have read plenty of articles about "rate limiting" on the Internet, but this one by Brother Z is probably the most comprehensive and the easiest to follow (allow me to gloat for a few seconds).

I'm just kidding. I hope you can get some incremental value.

It turns out that some of Brother Z's readers don't fully understand the relationship between "rate limiting" and "circuit breaking". Let's start with a question: flow limiting exists in everyday life, too. During the National Day and Spring Festival holidays, why do popular scenic spots limit the number of visitors admitted, rather than opening for a few hours in the morning, closing for a few hours once the crowd gets too big, and reopening once it thins out? That, in essence, is the difference between rate limiting and circuit breaking.

In the previous article (Distributed system concerns: "circuit breakers" and best practices that 99% of people can understand), we talked about "circuit breakers". A system with a circuit breaker can at least ensure that its availability does not collapse completely.

But imagine a slightly more extreme scenario: if the system's traffic is unstable and trips the breaker frequently, doesn't that mean the system keeps flipping among the circuit breaker's three states?


The result is that every cycle from opening the breaker to closing it again inevitably leaves a large number of users unable to use the system normally. At the system level, availability ends up looking something like this.

In addition, looking at resource utilization, it is easy to see that the resources during these trough periods are not being fully used.

Clearly, a circuit breaker alone is not enough.

Under high pressure, as long as the system is not down, the more efficient mode of operation is to keep the accepted traffic at a high level without exceeding the system's capacity ceiling, because that fills in the troughs.

Today, with the Internet serving as social infrastructure, the scene above is neither far away from us nor all that extreme. Endless marketing campaigns, one social hot spot after another, plus the thriving gray industry of bots and click farms below the Internet's waterline, all make it something you must consider and guard against, because traffic far beyond what you expected can pour in at any moment and crush your system.

The role of rate limiting is then obvious: even while the system is up, it may simply lack the resources to cope with a flood of requests. To guarantee that the limited system resources deliver their maximum service capacity, we constrain the system's traffic (incoming or outgoing) according to preset rules, ensuring that the accepted traffic never exceeds the system's capacity ceiling.

I. How to do "rate limiting"

From the discussion above, we also know that rate limiting is best applied near the upper limit of a system's processing capacity. So:

The first step is to find the system's capacity ceiling, for example through "stress testing".

The second is to formulate a strategy for intervening in the traffic: how to set the thresholds, whether to care only about the outcome or also about the smoothness of the process, and so on.

Finally, decide how to handle the traffic that has been "intervened". Can you simply discard it? And if you can't, what then?

Find the system's capacity ceiling

This first step is not the focus of this article; it amounts to running a round of stress tests against the system. This can be done in a standalone environment, or you can pick one node out of the many in production as a sample, provided it is isolated from the other nodes.

In general, stress testing is done to obtain two numbers: "rate" and "concurrency". The former is the number of requests that can be processed per unit of time, such as xxx requests per second. The latter is the maximum number of requests the system can handle at the same time, such as a concurrency of xxx. From these metrics you take a "maximum", an "average", or a "median"; the concrete threshold values used in the subsequent rate-limiting strategy are derived from them.

Digression: if you want to be really thorough, other factors such as CPU, network bandwidth, and memory consumption can also serve as reference inputs.

Develop strategies to intervene in traffic

There are four commonly used strategies, which I summarize as "two windows and two buckets". The two windows are the fixed window and the sliding window; the two buckets are the leaky bucket and the token bucket.

Fixed window

A fixed window defines a "fixed" statistical period, such as 1 minute, 30 seconds, or 10 seconds, and counts the requests received within the current period. If the accumulated count reaches the configured threshold, "traffic intervention" is triggered for the rest of the period. When the next period starts, the counter is cleared and traffic is accepted normally again.

This strategy is the simplest, with only a few lines of code to write.

int totalCount = 0; // global variable; a timer fires once per "fixed period" and resets it to zero.

if (totalCount > limitThreshold) {
    return; // do not continue processing the request.
}
totalCount++;
// do something...
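
For reference, below is a minimal thread-safe sketch of the same idea in Java. It is an illustration under assumptions, not code from the original article: the class name, the use of a scheduled executor as the reset timer, and the tryAcquire API are all my own choices.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A minimal fixed-window limiter: one counter, reset by a timer every period.
public class FixedWindowLimiter {
    private final int limitThreshold;
    private final AtomicInteger totalCount = new AtomicInteger(0);

    public FixedWindowLimiter(int limitThreshold, long periodMillis) {
        this.limitThreshold = limitThreshold;
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Zero the counter once per "fixed period", as in the pseudocode above.
        timer.scheduleAtFixedRate(() -> totalCount.set(0),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Returns true if the request may proceed, false if it is "intervened".
    public boolean tryAcquire() {
        return totalCount.incrementAndGet() <= limitThreshold;
    }
}

A caller simply guards the request handler with it: if (!limiter.tryAcquire()) return;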

One thing to note about the fixed window: if requests arrive highly concentrated in time, the configured "limit threshold" is effectively the maximum concurrency you have to bear. So if concurrency is your concern, the "fixed period" should be as short as possible, because that lets you shrink the threshold accordingly. You can even derive the threshold directly from a concurrency figure: for example, with a fixed period of 3 seconds, the threshold could be set to "average concurrency x 3".

However it is tuned, though, the fixed window has an inherent drawback: since traffic rarely arrives at a constant rate, any fluctuation in the arrival rate means that either the counter fills up early, so requests for the rest of the period get "limited", or the counter never fills, meaning the "limit threshold" was set too high and resources go underused.

"sliding window" can improve this problem.

Sliding window

The sliding window is simply a finer subdivision of the fixed window: the original granularity is cut into smaller pieces, for example a 1-minute fixed window split into 60 sliding sub-windows of 1 second each. The statistical time range then moves forward in step with the passage of time.

It also follows that if the "fixed period" of a fixed window is already very small, a sliding window is pointless. For example, if the fixed window's period is already 1 second, splitting it further to the millisecond level costs more than it gains, incurring significant performance and resource overhead.

The general code logic of the sliding window is as follows:

int[] counterList = new int[slidingWindowCount]; // global; one counter per sub-window.
// A timer runs whenever the start of the statistical period moves forward: it drops the
// oldest sub-window's counter and starts a fresh one (remove index 0, append a new element).

int sum = sum(counterList);
if (sum > limitThreshold) {
    return; // do not continue processing the request.
}
int currentIndex = currentTimeInSeconds % slidingWindowCount;
counterList[currentIndex]++;
// do something...
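
As a fuller illustration, here is a minimal single-process sketch in Java. It assumes one-second sub-windows tracked in a ring buffer rather than a literal linked list, and the class and method names are my own; treat it as one possible shape of the idea, not the canonical one.

// A minimal sliding-window limiter over the last `windowCount` seconds.
public class SlidingWindowLimiter {
    private final int windowCount;
    private final int limitThreshold;
    private final int[] counters;    // one counter per one-second sub-window
    private final long[] slotSecond; // the absolute second each slot currently represents

    public SlidingWindowLimiter(int windowCount, int limitThreshold) {
        this.windowCount = windowCount;
        this.limitThreshold = limitThreshold;
        this.counters = new int[windowCount];
        this.slotSecond = new long[windowCount];
    }

    public synchronized boolean tryAcquire() {
        long nowSec = System.currentTimeMillis() / 1000;
        int idx = (int) (nowSec % windowCount);
        if (slotSecond[idx] != nowSec) { // slot belongs to an older period: recycle it
            slotSecond[idx] = nowSec;
            counters[idx] = 0;
        }
        int sum = 0;
        for (int i = 0; i < windowCount; i++) {
            if (nowSec - slotSecond[i] < windowCount) sum += counters[i]; // still in the window
        }
        if (sum >= limitThreshold) return false; // "traffic intervention"
        counters[idx]++;
        return true;
    }
}

The synchronized tryAcquire keeps the example simple; a production version would want a cheaper concurrency scheme.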

Although the sliding window mitigates the problem, it is still built on pre-defined time slices, which is a form of "prediction". That means it will almost never let you use 100% of the system's capacity.

The bucket modes can do better, because they add an extra buffer (the bucket itself).

Leaky bucket

First, the leaky bucket. The core of the leaky bucket mode is to fix the "outflow" rate: no matter how much comes in, the rate going out stays the same. If the inflow is too large for the bucket to hold, "traffic intervention" kicks in.

Let's break down the whole implementation process.

Control the outflow rate. This can actually be implemented with either of the two "windows" described above: if the current rate is below the threshold, process the request directly; otherwise do not process it immediately, but put it into the buffer and raise the current water level.

The buffer itself can be implemented as a brief sleep, or by recording the request in a container and retrying asynchronously.

Finally, keep the water level in the bucket from exceeding the maximum. This part is very simple: a global counter that gets incremented and decremented.

Looked at this way, the essence is to smooth uneven traffic through a buffer (above-average traffic is parked temporarily to fill in the below-average periods), thereby maximizing the utilization of processing resources.

The simplified representation of the implementation code is as follows:

int unitSpeed;  // global; current outflow rate. A timer resets it to zero every
                // rate-calculation period (for example, 1 second).
int waterLevel; // global; current water level of the buffer.

if (unitSpeed < rateThreshold) {
    unitSpeed++;
    // do something...
} else {
    if (waterLevel > waterLevelThreshold) {
        return; // do not continue processing the request.
    }
    waterLevel++;
    while (unitSpeed >= rateThreshold) {
        sleep(briefly);
    }
    unitSpeed++;
    waterLevel--;
    // do something...
}
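
Here too, a minimal Java sketch may help. This framing is my own rather than the article's: it realizes the bucket as a bounded queue and the fixed outflow as a scheduled drain thread, and it assumes drainPerSecond is between 1 and 1000.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A minimal leaky bucket: requests queue up in a bounded buffer (the bucket)
// and are drained at a fixed rate, no matter how fast they arrive.
public class LeakyBucketLimiter {
    private final BlockingQueue<Runnable> bucket;

    public LeakyBucketLimiter(int maxWaterLevel, int drainPerSecond) {
        this.bucket = new ArrayBlockingQueue<>(maxWaterLevel);
        ScheduledExecutorService drain = Executors.newSingleThreadScheduledExecutor();
        // Fixed "outflow" rate: one queued task every 1000 / drainPerSecond milliseconds.
        drain.scheduleAtFixedRate(() -> {
            Runnable task = bucket.poll();
            if (task != null) task.run();
        }, 0, 1000L / drainPerSecond, TimeUnit.MILLISECONDS);
    }

    // Returns false (traffic intervention) when the bucket is full.
    public boolean submit(Runnable request) {
        return bucket.offer(request);
    }
}

submit returning false corresponds to the bucket overflowing, which is exactly the "traffic intervention" case above.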

A better "leaky bucket" strategy can already achieve the 100% processing capacity you expect with sufficient traffic, but this is not the ultimate.

Bear in mind that the environment a program runs in never contains just the program: there are system processes and sometimes other user processes, too. In other words, the program's own processing capacity is subject to interference and fluctuates. You can estimate its average or median over a period, but you cannot predict its capacity at any specific moment. You are therefore bound to pick a relatively pessimistic value as the threshold to keep the program from being overloaded.

So, in terms of resource utilization, is there a better option? Yes: the token bucket.

Token bucket

The core of the token bucket mode is to fix the "inflow" rate: take a token first, then process the request; if no token can be taken, the request gets "traffic intervention". So when heavy traffic pours in, as long as tokens are generated at a rate greater than or equal to the rate at which requests are processed, the program runs at its full processing capacity.

Let's also decompose its implementation process.

Control the rate at which tokens are generated and placed into the bucket. In practice this is just a single thread continuously producing tokens.

Keep the number of unclaimed tokens in the bucket from exceeding the maximum. Like the "leaky bucket", this is a global counter that gets incremented and decremented.

The rough simplified code is as follows (it looks like the inverse of the "fixed window" logic):

int tokenCount = maxTokenCount; // global; number of available tokens. A separate thread
                                // adds to it at a fixed rate, never exceeding the ceiling.

if (tokenCount == 0) {
    return; // do not continue processing the request.
}
tokenCount--;
// do something...
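
A minimal thread-safe Java sketch of the same idea follows; the refill scheduling and the compare-and-set loop are my own choices, and it assumes tokensPerSecond is between 1 and 1000. In production you would more likely reach for an existing implementation such as Guava's RateLimiter.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A minimal token bucket: a refill thread adds tokens at a fixed rate,
// and callers must take a token before processing a request.
public class TokenBucketLimiter {
    private final AtomicInteger tokenCount;

    public TokenBucketLimiter(int maxTokenCount, int tokensPerSecond) {
        this.tokenCount = new AtomicInteger(maxTokenCount);
        ScheduledExecutorService refill = Executors.newSingleThreadScheduledExecutor();
        // Fixed "inflow" rate; never exceed the bucket's ceiling.
        refill.scheduleAtFixedRate(
                () -> tokenCount.updateAndGet(n -> Math.min(n + 1, maxTokenCount)),
                0, 1000L / tokensPerSecond, TimeUnit.MILLISECONDS);
    }

    // Returns true if a token was acquired, false if the request is "intervened".
    public boolean tryAcquire() {
        while (true) {
            int n = tokenCount.get();
            if (n == 0) return false;
            if (tokenCount.compareAndSet(n, n - 1)) return true;
        }
    }
}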

Sharp readers may guess that the capacity of the token bucket should, in theory, be the maximum concurrency the program needs to support. Indeed, but with a caveat: if a burst of incoming traffic takes out all the tokens at the same instant while the program cannot actually handle that many requests at once, an accident follows.

So there is no truly perfect strategy, only the appropriate one. What deserves practice is the ability to identify the most suitable strategy for each scenario. Next, Brother Z shares some personal experience.

II. Best practices: how to choose among the four "rate limiting" strategies?

First, the fixed window. Generally speaking, unless you are pressed for time, this option is too blunt to be the one you choose. It can, however, serve as a temporary emergency measure to quickly stop the bleeding on an immediate problem.

Second, the sliding window. This scheme suits scenarios with a "high tolerance" for imprecise results; after all, compared with the "two buckets" it lacks a buffer. Its strength is that it is simple to implement.

Then, the leaky bucket. Brother Z considers this the best fit as a general-purpose scheme. Although its resource utilization is not pushed to the extreme, its "lenient in, strict out" idea leaves some headroom while protecting the system, making it applicable to a wider range of scenarios.

Finally, the token bucket. Use it when you need to squeeze out as much of the program's performance as possible (in which case the bucket's maximum capacity must be greater than or equal to the program's maximum concurrency), and when the traffic in your scenario does not fluctuate too violently (so that a burst cannot drain the tokens in an instant and crush the back-end system).

New challenges in distributed systems

A mature distributed system looks something like this.

Each upstream system can be understood as a client of its downstream system. Thinking back over the earlier content, you may notice that the "rate limiting" discussion never said whether it happens on the client side or the server side; if anything, it read as server-side. But in a distributed system, a server may itself have multiple replicas, serve multiple clients, and even act as a client in turn. In such an interleaved, complex environment, where do you start limiting? My approach is to think about it along "one vertical and one horizontal" axis.

Vertical

We all know that "current restriction" is a protective measure, so we can think of it as a shield. In addition, the processing of a request in the system is chained. So, just as the army fought in ancient times, except for a small number of shield soldiers who were protected around the boss, the rest were at the forefront. Because the more advanced the position of the shield, the greater the range of benefits.

What stands at the forefront of a distributed system? The access layer. If your system has an access layer, such as a reverse proxy like nginx, you can limit traffic there via its ngx_http_limit_conn_module and ngx_http_limit_req_module; this is a very mature solution.
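
For illustration, a minimal nginx configuration using those two modules might look like this; the zone names, rate, and limits below are placeholder values of mine, not recommendations:

# http context: shared-memory zones keyed by client IP.
limit_req_zone  $binary_remote_addr zone=req_per_ip:10m rate=100r/s;
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

server {
    location /api/ {
        limit_req  zone=req_per_ip burst=50 nodelay;  # request rate, with a small burst buffer
        limit_conn conn_per_ip 20;                    # concurrent connections per client IP
    }
}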

If there is no access layer, rate limiting can only be done in the application layer, following the idea of AOP. But because applications are decentralized, cost forces you to prioritize where to apply it: a ToC application needs it more than a ToB one, a high-frequency cache system more than a low-frequency reporting system, and a Web application can do it more conveniently than a Service application thanks to the Filter mechanism.

So, in the end, should rate limiting between applications be done on the client side or the server side?

Brother Z's view is that in terms of effect the client-side mode definitely beats the server-side mode, because in the rate-limited state the client saves even the work of establishing a connection. Another potential benefit is that, compared with a centralized server-side model, it spreads out the pressure that would otherwise fall on a small number of server programs. But doing it on the client side also costs more, precisely because it is decentralized, which becomes troublesome when data has to be shared across multiple nodes.

So, in the end, Brother Z's suggestion is: if cost is the priority, use the server-side model; if effect is the priority, use the client-side model. Of course this is not absolute. For example, if most of a server's traffic comes from one particular client, limiting directly on that client alone is also a fine solution.

At the database level, the connection string itself generally carries the notion of a "maximum number of connections", which already acts as a limiter. If you want finer-grained control, the only way is a unified wrapper in your database access layer or framework.
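
As a concrete illustration, assuming a Java service using the HikariCP connection pool (the URL and the numbers are placeholders of mine, not values from the article):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DbPool {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/demo"); // placeholder URL
        // The pool ceiling doubles as a limit on concurrent database work.
        config.setMaximumPoolSize(20);
        // Fail fast when the pool is exhausted instead of queueing forever.
        config.setConnectionTimeout(500);
        return new HikariDataSource(config);
    }
}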

Having covered "vertical", what is "horizontal"?

Horizontal

Whether you are looking at multiple clients or multiple replicas of the same server, the performance of each node inevitably differs. How do you set an appropriate threshold for each one? And how do you make policy changes take effect across all the nodes in the cluster as quickly as possible? The textbook answer is simple: introduce a performance monitoring platform and a configuration center. But doing this well is genuinely hard, and we will expand on it in a later article.

III. Summary

Rate limiting is like a fuse: once the threshold you set is reached, it pulls the brake.

However, besides discarding requests outright, another way to respond when the limit is triggered is to "degrade". What are the ways to degrade? We'll talk about that in the next article.

Question:

Have you ever run into scenarios at work that needed "rate limiting"? You are welcome to share and discuss.

Related articles:

Distributed system concerns-"circuit breakers" and best practices that 99% of people can understand

Distributed system concerns-only this article is needed to get through the "load balancing" properly.

Author: Zachary (personal WeChat account: Zachary-ZF)

WeChat official account (first published): Cross-border Architect.
