This article explains how rate limiting is done in high-concurrency systems. It first reviews the related protection techniques of caching and service degradation, then walks through the common rate-limiting algorithms (counter, leaky bucket, token bucket) and their implementations with Guava and Nginx.
Caching
Caching is easy to understand. In a large high-concurrency system, without a cache the database would be overwhelmed almost immediately and the whole system would collapse.
Caching not only improves access speed and concurrency; it is also an effective way to protect the database and the system as a whole.
Large websites are mostly read-heavy, where caching is the obvious choice, but caching also plays an important role in large write-heavy systems.
For example, buffering batched writes, in-memory producer/consumer queues, and the way HBase writes data all use caching to raise system throughput or to protect the system. Even message middleware can be thought of as a distributed data cache.
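As a hedged illustration of write buffering (not from the original article; the queue size, batch size, and flushBatch handler are illustrative assumptions), the sketch below accumulates writes in an in-memory queue and flushes them to the database in batches:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WriteBuffer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    // Producers enqueue writes instead of hitting the database directly.
    public boolean submit(String record) {
        return queue.offer(record); // non-blocking; false means the buffer is full
    }

    // A single consumer drains the queue and writes in batches.
    public void runConsumer() throws InterruptedException {
        List<String> batch = new ArrayList<>(100);
        while (true) {
            String first = queue.poll(1, TimeUnit.SECONDS);
            if (first == null) continue;  // nothing to flush yet
            batch.add(first);
            queue.drainTo(batch, 99);     // take up to 100 records per batch
            flushBatch(batch);            // one bulk write instead of 100 single writes
            batch.clear();
        }
    }

    private void flushBatch(List<String> batch) {
        // Hypothetical bulk write to the database, e.g. a single batched INSERT.
    }
}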
Degradation
Service degradation means strategically degrading some services and pages, based on the current business situation and traffic, when pressure on the servers rises sharply, in order to free server resources and keep core tasks running normally.
Degradation usually assigns severity levels and applies different handling at each level.
By service mode: reject the service, delay the service, or sometimes serve requests at random. By service scope: cut off a single function, or cut off entire modules.
In short, service degradation requires different strategies for different business needs. The main idea is that a degraded service is better than no service at all.
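As a minimal sketch of a degradation switch (not from the original article; the flag, service names, and fallback list are illustrative assumptions), a non-core call can be guarded by a flag that operators flip under pressure:

import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class RecommendationService {
    // Flipped by operators or a config center when the system is under pressure.
    private static final AtomicBoolean DEGRADED = new AtomicBoolean(false);

    public List<String> recommend(String userId) {
        if (DEGRADED.get()) {
            // Degraded path: skip the expensive personalized call and
            // return a cached or static default list instead.
            return List.of("default-item-1", "default-item-2");
        }
        return personalizedRecommend(userId); // normal, expensive path
    }

    private List<String> personalizedRecommend(String userId) {
        // Hypothetical call to a downstream recommendation engine.
        return List.of("personalized-item");
    }
}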
Rate limiting
Rate limiting can be seen as a kind of service degradation: it protects the system by limiting its input and output traffic.
Generally, a system's throughput can be measured. To keep the system running stably, once traffic reaches the threshold it must be limited, using measures such as delaying processing, rejecting requests, or rejecting a portion of them.
Rate-limiting algorithms
The common rate-limiting algorithms are the counter, the leaky bucket, and the token bucket.
Counter
The counter is the simplest and crudest algorithm.
For example, suppose a service can handle at most 100 requests per second. We can set up a 1-second sliding window divided into 10 cells of 100 milliseconds each; the window advances every 100 milliseconds, and on each advance we record the current request count.
Ten counts need to be kept in memory, which the LinkedList data structure handles well. Each time the window advances, check whether the difference between the current count and the oldest one in the LinkedList exceeds 100; if it does, the traffic must be limited.
Clearly, the more cells the sliding window is divided into, the more smoothly it slides and the more accurate the rate-limit statistics become.
The sample code is as follows:
import java.util.LinkedList;

public class Counter {
    // The number of service requests, incremented elsewhere by the request
    // handlers; it could be kept in Redis to implement a distributed count.
    private long counter = 0L;

    // Use a LinkedList to record the 10 cells of the sliding window.
    private final LinkedList<Long> ll = new LinkedList<>();

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        c.doCheck();
    }

    private void doCheck() throws InterruptedException {
        while (true) {
            ll.addLast(counter);
            if (ll.size() > 10) {
                ll.removeFirst();
            }
            // The first and last entries are one second apart; if more than
            // 100 requests arrived in that second, the rate must be limited.
            if ((ll.peekLast() - ll.peekFirst()) > 100) {
                // to limit rate
            }
            Thread.sleep(100);
        }
    }
}
Leaky bucket algorithm
The leaky bucket is a very commonly used rate-limiting algorithm, which can be used for traffic shaping and traffic policing.
(A diagram on Wikipedia illustrates the leaky bucket.)
The main concepts of the leaky bucket algorithm are as follows:
A leaky bucket of fixed capacity leaks drops at a constant rate.
If the bucket is empty, no drops flow out.
Water may flow into the bucket at any rate.
If the incoming water exceeds the bucket's capacity, it overflows and is discarded, while the bucket's capacity stays unchanged.
The leaky bucket algorithm is easy to implement: in a single-node system it can be built with a queue (.NET's TPL Dataflow handles this kind of problem well), and in a distributed environment message middleware or Redis are both options.
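As a minimal single-node sketch (not from the original article; the capacity, leak rate, and names are illustrative assumptions), a leaky bucket can be built from a bounded queue with a drain thread that consumes at a constant rate:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LeakyBucket {
    // Fixed-capacity bucket: offers beyond the capacity overflow and are rejected.
    private final BlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(100);

    // Inflow may arrive at any rate; returns false when the bucket overflows.
    public boolean tryAccept(Runnable request) {
        return bucket.offer(request);
    }

    // Outflow at a constant rate: one request every 10 ms, i.e. 100 requests per second.
    public void startDraining() {
        Thread drainer = new Thread(() -> {
            while (true) {
                try {
                    Runnable request = bucket.take(); // waits if the bucket is empty
                    request.run();                    // process exactly one request
                    Thread.sleep(10);                 // constant leak rate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        drainer.setDaemon(true);
        drainer.start();
    }
}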
Token bucket algorithm
The token bucket algorithm uses a bucket that stores a fixed number of tokens, with tokens added to the bucket at a fixed rate.
The token bucket algorithm can basically be described by the following concepts:
Tokens are placed into the bucket at a fixed rate, say 10 per second.
The bucket stores at most b tokens; when it is full, newly added tokens are discarded or rejected.
When an n-byte packet arrives, n tokens are removed from the bucket and the packet is sent to the network.
If fewer than n tokens remain in the bucket, no tokens are removed and the packet is rate-limited (discarded or buffered).
The token bucket controls the output rate through the rate at which tokens are issued, that is, the rate at which traffic flows out to the network. Here "the network" can be understood as any message handler: a piece of business logic or an RPC call.
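Below is a minimal token bucket sketch (not from the original article; the class, its names, and the rates are illustrative assumptions) that refills tokens lazily based on elapsed time:

public class TokenBucket {
    private final long capacity;        // b: the maximum number of tokens the bucket holds
    private final double refillPerNano; // token refill rate, expressed per nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill, in nanoseconds

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Try to take n tokens; returns false when there are not enough,
    // meaning the request should be limited (discarded or buffered).
    public synchronized boolean tryAcquire(int n) {
        long now = System.nanoTime();
        // Refill according to elapsed time, capped at the bucket capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
}

A caller might create new TokenBucket(100, 10) and call tryAcquire(1) per request, dropping or queueing requests that return false.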
Comparison of leaky bucket and token bucket
The token bucket can adjust the rate of data processing at runtime, which lets it absorb bursts of traffic.
Issuing tokens more frequently raises the overall processing speed, while issuing them more slowly, or requiring more tokens per operation, lowers it. The leaky bucket cannot do this: its outflow rate is fixed, so the program's processing speed is fixed as well.
Overall, the token bucket algorithm is the more flexible of the two, but it is more complex to implement.
Implementations of rate-limiting algorithms
Guava
Guava is an open-source Google project containing core libraries that Google's Java projects rely on widely. Its RateLimiter class provides two token bucket implementations: smooth bursty limiting (SmoothBursty) and smooth warm-up limiting (SmoothWarmingUp).
1. Normal rate:
Create a rate limiter and set the number of tokens placed per second to 2. The returned RateLimiter object guarantees that no more than 2 tokens are issued within 1 second and that they are placed at a fixed rate, achieving a smooth output effect.
import com.google.common.util.concurrent.RateLimiter;

public void test() {
    /*
     * Create a rate limiter and set the number of tokens placed per second: 2.
     * The returned RateLimiter object guarantees that no more than 2 tokens
     * are issued within 1 second, placed at a fixed rate,
     * achieving a smooth output effect.
     */
    RateLimiter r = RateLimiter.create(2);
    while (true) {
        /*
         * acquire() takes one token and returns the time spent waiting for it.
         * If there is no token in the bucket, it waits until one is available.
         * acquire(N) can take multiple tokens at once.
         */
        System.out.println(r.acquire());
    }
}
Running the code above prints a value roughly every 0.5 seconds. Once a token is obtained, the data can be processed, which smooths the rate at which data is output or an interface is called.
The return value of acquire() is the time spent waiting for the token. To handle bursts of traffic, you can set a threshold on this return value and treat requests differently depending on it, for example by discarding those that have waited too long.
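Guava also provides tryAcquire(), which fails fast instead of blocking. The sketch below (an assumed usage pattern, not from the original text) waits at most 100 ms for a permit and drops the request otherwise:

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class FailFastLimiter {
    private final RateLimiter limiter = RateLimiter.create(2); // 2 permits per second

    public void handle(Runnable request) {
        // Wait at most 100 ms for a permit; if none arrives, drop the request
        // instead of letting a backlog build up.
        if (limiter.tryAcquire(1, 100, TimeUnit.MILLISECONDS)) {
            request.run();
        } else {
            // Hypothetical fallback: discard the request or return a "too busy" response.
        }
    }
}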
2. Burst traffic:
A burst can mean suddenly needing more permits, or a lull followed by renewed demand. First, an example of needing more: using the same limiter as above, which issues 2 tokens per second, the following code passes an explicit permit count to acquire.
System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
The output shows that when more data must be handled at once, more tokens are needed. The code first takes two tokens, so the next token is not available after 0.5 seconds but only after about 1 second, after which the rate returns to normal.
That was an example of a burst of extra demand. For the opposite case, a lull with no traffic, consider the following code:
System.out.println(r.acquire(1));
Thread.sleep(2000);
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
The result is similar: after the two-second wait, tokens have accumulated in the token bucket, so the following calls obtain them immediately without waiting. In effect, handling bursts this way keeps the output constant per unit of time.
Both of these behaviors come from SmoothBursty, a subclass of RateLimiter. The other subclass, SmoothWarmingUp, provides a buffered (warm-up) traffic output scheme.
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public void test() {
    /*
     * Create a rate limiter and set the number of tokens placed per second: 2.
     * The returned RateLimiter object guarantees that no more than 2 tokens
     * are issued within 1 second, placed at a fixed rate,
     * achieving a smooth output effect.
     * The warm-up (buffer) period is set to 3 seconds.
     */
    RateLimiter r = RateLimiter.create(2, 3, TimeUnit.SECONDS);
    while (true) {
        /*
         * acquire() takes one token and returns the time spent waiting for it.
         * If there is no token in the bucket, it waits until one is available.
         * acquire(N) can take multiple tokens at once.
         */
        System.out.println(r.acquire(1));
    }
}
Because the warm-up period is set to 3 seconds, the token bucket does not start by issuing a token every 0.5 seconds. Instead the wait times form a smooth, linearly decreasing slope, the frequency rising until it reaches the configured rate within 3 seconds, after which output continues at the fixed rate.
The first few wait times add up to roughly 3 seconds. This behavior suits scenarios where a system has just started and needs a little time to warm up.
Nginx
For rate limiting at the Nginx access layer, two Nginx modules can be used:
The connection-limiting module ngx_http_limit_conn_module
The request-limiting module ngx_http_limit_req_module, implemented with the leaky bucket algorithm
1. ngx_http_limit_conn_module
We often run into situations such as abnormal traffic and server overload.
High-volume malicious access wastes bandwidth, puts pressure on the server, and hurts the business, so it is common to limit the number of connections and the concurrency from a single IP. The ngx_http_limit_conn_module module implements this requirement.
The module can limit the number of connections per defined key, for example the number of connections from a single source IP.
Not all connections are counted by the module; only connections whose requests are being processed (those whose request headers have been fully read) are counted.
We can add the following configuration to the http {} block of nginx.conf to apply the limit:
# limit the number of concurrent connections per client IP, using a zone named "one"
limit_conn_zone $binary_remote_addr zone=one:10m;
# configure the log level used once the limit is hit; the default level is error
limit_conn_log_level error;
# configure the status code returned once the limit is hit; 503 is returned by default
limit_conn_status 503;
Then add the following code to server {}:
# limit the number of concurrent connections to 1
limit_conn one 1;
Then we simulate concurrent requests with an ab test:
ab -n 5 -c 5 http://10.23.22.239/index.html
The results clearly show that concurrency is limited: requests beyond the threshold receive a 503.
Besides limiting concurrency per client IP, you can configure a concurrency limit per domain name in the same way:
# http {} section configuration
limit_conn_zone $server_name zone=perserver:10m;
# server {} section configuration
limit_conn perserver 1;
2. ngx_http_limit_req_module
Above we used the ngx_http_limit_conn_module module to limit the number of connections. What about limiting the number of requests?
That is done with the ngx_http_limit_req_module module, which limits the request-processing frequency per defined key.
In particular, it can limit the request rate from a single IP address. It limits using the leaky bucket algorithm: a fixed number of requests is processed per second, and excess requests are delayed.
If the request rate exceeds the value configured for the zone, request processing is delayed or requests are discarded, so all requests are handled at the defined rate.
Configure in http {}
# the zone is named "one" and is 10m in size; the average request rate may not exceed 1 request per second
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
Configure in server {}
# set the per-IP bucket size to 5
limit_req zone=one burst=5;
This setup limits request processing to 1 per second per IP, while the server buffers up to 5 requests per IP; requests beyond those 5 are discarded.
Use an ab test to simulate a client making 10 concurrent requests:
ab -n 10 -c 10 http://10.23.22.239/index.html
With burst set to 5, the results show that of the 10 requests the first is processed immediately, requests 2 through 6 are stored in the bucket to be processed with a delay (nodelay is not set), and since the bucket is then full, the remaining four requests are discarded.
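For comparison, adding the nodelay parameter (a hedged variant of the configuration above) makes Nginx serve the buffered burst requests immediately instead of spacing them out, while anything beyond the burst size is still rejected:
# burst requests are processed at once rather than delayed; excess requests still get a 503
limit_req zone=one burst=5 nodelay;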
This concludes the study of rate limiting in high-concurrency systems. Pairing the theory with practice is the best way to learn, so go and try it!