Current limiting Technology of Linux High concurrency system 04/16 Update SLTechnology News&Howtos

Current limiting Technology of Linux High concurrency system

2025-04-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/02 Report--

There are three sharp tools to protect the system when developing high concurrency systems: cache, downgrade and current limit. The purpose of the cache is to improve the access speed of the system and increase the capacity that the system can handle, which can be described as a silver bullet against high concurrent traffic, while the downgrade is that when there is a problem with the service or affects the performance of the core process, it needs to be temporarily shielded and opened after the peak or problem is solved. However, some scenarios cannot be solved by caching and demotion, such as scarce resources (second kill, rush purchase), write services (such as comments, placing orders), and frequent complex queries (the last few pages of comments). Therefore, there needs to be a means to limit the amount of concurrency / requests in these scenarios, that is, flow restriction.

The purpose of current restriction is to protect the system by imposing speed limits on concurrent visits / requests or requests within a time window. Once the speed limit is reached, you can deny service (direct to an error page or tell you that the resource is gone), queue or wait (such as seconds kill, comment, place an order), and downgrade (return pocket data or default data, such as product details page inventory is available by default).

Common current limits for developing high concurrency systems are: limit the total number of concurrency (such as database connection pool, thread pool), limit the number of instantaneous concurrency (such as nginx's limit_conn module, which is used to limit the number of instantaneous concurrent connections), limit the average rate in the time window (such as Guava's RateLimiter, nginx's limit_req module, limit the average rate per second); others, such as limiting the remote interface call rate, limiting the MQ consumption rate. In addition, the flow can be limited according to the number of network connections, network traffic, CPU or memory load, etc.

The silver bullet of caching first, and then limited flow to deal with 618 and Singles Day high concurrent traffic, can be said to be like a tiger in dealing with high concurrency problems. You don't have to worry about instant traffic causing the system to die or avalanche, resulting in damage to service rather than non-service. Current restriction needs to be well evaluated and must not be misused, otherwise there will be some strange problems in normal traffic that will lead to user complaints.

Don't worry too much about the algorithm in practical application, because the implementation of some current-limiting algorithms is the same, but the description is different; which current-limiting technology to use should be chosen according to the actual scene, and don't blindly find the best mode. The white cat and black cat can solve the problem is a good cat.

Because many people have come to ask how to limit the current in the actual work, this paper will introduce various current limiting methods in detail. Then we learn the lower current limit technology in detail from current limiting algorithm, application level current limiting, distributed current limiting and access layer current limiting.

Current limiting algorithm

The common current-limiting algorithms are token bucket and leaky bucket. The counter can also be implemented by rough current limit.

Token bucket algorithm

The token bucket algorithm is a bucket that stores fixed capacity tokens and adds tokens to the bucket at a fixed rate. The token bucket algorithm is described as follows:

Assuming that 2r/s is limited, tokens are added to the bucket at a fixed rate of 500ms

A maximum of b tokens are stored in the bucket. When the bucket is full, the newly added tokens are discarded or rejected.

When an n-byte packet arrives, n tokens are removed from the bucket and the packet is sent to the network

If there are less than n tokens in the bucket, the tokens will not be deleted, and the packet will be restricted (either discarded or buffered).

Leaky bucket algorithm

When a leaky bucket is used as a measurement tool (The Leaky Bucket Algorithm as a Meter), it can be used for flow × × (Traffic Shaping) and flow control (TrafficPolicing). The leaky bucket algorithm is described as follows:

A leaky bucket with a fixed capacity that flows out of droplets at a constant rate

If the bucket is empty, there is no need for droplets.

It can flow into the leaky bucket at any rate.

If the inflow droplets exceed the capacity of the bucket, the inflow droplets overflow (discarded), while the capacity of the leaky bucket remains the same.

Token bucket versus leaky bucket:

The token bucket adds tokens to the bucket at a fixed rate. Whether the request is processed depends on whether the tokens in the bucket are sufficient. When the number of tokens is reduced to 00:00, the new request is rejected.

The leaky bucket flows out the request at a constant fixed rate, and the inflow request rate is arbitrary. When the number of incoming requests accumulates to the leaky bucket capacity, the new inflow request is rejected.

The token bucket limits the average inflow rate (allows burst requests, can be processed as long as there are tokens, supports getting 3 tokens and 4 tokens at a time), and allows a certain degree of burst traffic

The leaky bucket limits the constant outflow rate (that is, the outflow rate is a fixed constant value, for example, the outflow rate is 1, not 1 at one time, and 2 at the next time), thus smoothing the burst inflow rate.

Token buckets allow a certain degree of burst, while leaky buckets are mainly designed to smooth the inflow rate

The implementation of the two algorithms can be the same, but the direction is opposite, and the current limiting effect is the same for the same parameters.

In addition, sometimes we use counters to limit current, which is mainly used to limit the total number of concurrency, such as database connection pool, thread pool, and the number of concurrency of seconds; as long as the threshold set by the total number of global requests or the total number of requests in a certain period of time is limited, it is a simple and rough total number of current limits, rather than the average rate limit.

This is the end of the introduction of the basic algorithm, let's first take a look at the application-level current limit.

Application level current limit

Current limit total concurrency / connections / requests

For an application system, there must be a limit of concurrency / requests, that is, there is always a TPS/QPS threshold. If the threshold is exceeded, the system will not respond to user requests or respond very slowly, so we had better carry out overload protection to prevent a large number of requests from pouring into the system.

If you have used Tomcat, one of its Connector configurations has the following parameters:

AcceptCount: if the threads of the Tomcat are busy responding, the new connection will be queued, and if the queue size is exceeded, the connection will be rejected

MaxConnections: instantaneous maximum number of connections, and those exceeding it will wait in queue.

MaxThreads:Tomcat can start the maximum number of threads used to process requests and may freeze if the number of requests has been much greater than the maximum number of threads.

Please refer to the official documentation for detailed configuration. In addition, Mysql (such as max_connections) and Redis (such as tcp-backlog) will have similar configurations to limit the number of connections.

The total number of limited resources

If some resources are scarce (such as database connections, threads) and may be used by multiple systems, then you need to limit the application; pooling techniques can be used to limit the total number of resources: connection pools, thread pools. For example, if the database connection assigned to each application is 100, you can use up to 100 resources, more than you can wait or throw an exception.

Limit the total number of concurrency / requests for an interface

If there may be sudden access to the interface, but you are worried about the crash caused by too much traffic, such as panic buying business, you need to limit the total number of concurrency / requests for this API. Because the granularity is relatively fine, you can set the corresponding threshold for each interface. You can use AtomicLong in Java to limit current:

{(atomic.incrementAndGet () > current limit) {}} {atomic.decrementAndGet ();}

It is suitable to limit the flow of services that are lossless or require overload protection, such as panic buying business, if the size is exceeded, either let the user queue up, or tell the user that he is out of stock, which is acceptable to the user. Some open platforms also limit the number of trial requests for users to call an interface, which can also be implemented in this way of counter. This method is also a simple and rough current limit, there is no smoothing, and it needs to be chosen according to the actual situation.

Limit the number of time window requests for an interface

That is, the number of requests in a time window, such as the number of requests / calls per second / minute / day of an interface / service. For example, some basic services will be called by many other systems, for example, the commodity details page service will call the basic commodity services, but we are afraid that the basic services will be hung up because of a large number of updates. At this time, we need to limit the speed of calls per second / minute. One implementation method is as follows:

LoadingCache counter = CacheBuilder. () .construcreAfterWrite (, TimeUnit.) .build (CacheLoader () {AtomicLong load (Long seconds) Exception {AtomicLong ();}}); limit =; () {currentSeconds = System. () / (counter.get (currentSeconds) .incrementAndGet () > limit) {System..println (+ currentSeconds);}}

We use Guava's Cache to store counters, and the expiration time is set to 2 seconds (ensure that counters within 1 second are available), and then we get the current timestamp and take the number of seconds as KEY for counting and current limiting. This method is also simple and rough, and the scenario I just mentioned is enough.

Smooth limit the number of requests for an interface

None of the previous current limiting methods can deal with sudden requests well, that is, instant requests may be allowed, which may lead to some problems. Therefore, in some scenarios, sudden requests need to be processed at an average rate (for example, 5r/s, one request is processed every 200ms, smoothing the rate). At this time, there are two algorithms that satisfy our scenario: token bucket and leaky bucket algorithm. The Guava framework provides the implementation of token bucket algorithm, which can be used directly.

Guava RateLimiter provides the implementation of token bucket algorithm: smooth burst current limiting (SmoothBursty) and smooth preheating current limiting (SmoothWarmingUp).

SmoothBursty

RateLimiter limiter = RateLimiter. (); System..println (limiter.acquire ()); System..println (limiter.acquire ())

You will get output similar to the following:

0.0

0.198239

0.196083

0.200609

0.199599

0.19961

1. RateLimiter.create (5) means that the bucket capacity is 5 and 5 tokens are added every second, that is, one token is added every 200ms.

2. Limiter.acquire () means to consume a token. If there are enough tokens in the current bucket, it succeeds (the return value is 0). If there are no tokens in the bucket, pause for a period of time. For example, if the interval between issuing tokens is 200ms, wait 200ms before consuming tokens (as the test case above returns 0.198239, it takes almost 200ms for tokens to be available) This implementation averages the burst request rate to a fixed request rate.

Let's look at another example of a sudden:

RateLimiter limiter = RateLimiter.create (5)

System.out.println (limiter.acquire (5))

System.out.println (limiter.acquire (1))

You will get output similar to the following:

0.0

0.98745

0.183553

0.199909

Limiter.acquire (5) means that the capacity of the bucket is 5 and 5 tokens are added per second. The token bucket algorithm allows a certain degree of burst, so you can consume 5 tokens at a time, but the next limiter.acquire (1) will wait about 1 second in the bucket before you can have a token, and the next request is also a fixed rate.

RateLimiter limiter = RateLimiter.create (5)

System.out.println (limiter.acquire (10))

System.out.println (limiter.acquire (1))

You will get output similar to the following:

0.0

1.997428

0.192273

0.200616

Similar to the above example, 10 requests burst in the first second, and the token bucket algorithm also allows this burst (allowing the consumption of future tokens), but the next limiter.acquire (1) will wait about 2 seconds in the bucket before there can be a token, and the next request is also a fixed rate.

Next, let's take a look at another sudden example:

RateLimiter limiter = RateLimiter.create (2)

System.out.println (limiter.acquire ())

Thread.sleep (2000L)

System.out.println (limiter.acquire ())

You will get output similar to the following:

0.0

0.499876

0.495799

1. Create a bucket with a capacity of 2 and add 2 tokens per second

2. First, call limiter.acquire () to consume a token, and the token bucket can be satisfied (the return value is 0)

3. Then the thread pauses for 2 seconds, the next two limiter.acquire () can consume tokens, the third limiter.acquire () also consumes tokens, and by the fourth it needs to wait 500ms.

You can see that the bucket capacity we set is 2 (that is, the allowable burst). This is because there is a parameter in SmoothBursty: the default value of maximum number of seconds of burst (maxBurstSeconds) is 1s, and burst / bucket capacity = rate * maxBurstSeconds, so the bucket capacity / burst in this example is 2. In this example, the first two are consumed the accumulated burst, and the third is calculated normally. The token bucket algorithm allows tokens that have not been consumed for a period of time to be temporarily stored in the token bucket for future use, and allows this burst of future requests.

SmoothBursty calculates the time of the next token addition based on the average rate and the time of the last token addition, and requires a bucket to hold tokens that are not in use for a period of time (that is, the number of tokens that can burst). In addition, RateLimiter provides a tryAcquire method for non-blocking or timeout token consumption.

Because SmoothBursty allows a certain degree of burst, some people worry that if such a burst is allowed, assuming that a large amount of traffic is suddenly coming, then the system may not be able to withstand this burst. Therefore, a current limiting tool with smooth speed is needed, so that the system slowly tends to the average fixed rate after cold start (that is, the rate is smaller at the beginning, and then tends to the fixed rate we set). Guava also provides SmoothWarmingUp to implement this requirement, which can be thought of as a leaky bucket algorithm, but it is different in some special scenarios.

SmoothWarmingUp creation method: RateLimiter.create (doublepermitsPerSecond, long warmupPeriod, TimeUnit unit)

PermitsPerSecond represents the number of new tokens per second, and warmupPeriod represents the interval between the transition from the cold start rate to the average rate.

Examples are as follows:

RateLimiter limiter = RateLimiter.create (5, 1000, TimeUnit.MILLISECONDS)

For (int I = 1; I

< 5;i++) { System.out.println(limiter.acquire()); } Thread.sleep(1000L); for(int i = 1; i < 5;i++) { System.out.println(limiter.acquire()); } 将得到类似如下的输出： 0.0 0.51767 0.357814 0.21××× 0.199984 0.0 0.360826 0.220166 0.199723 0.199555 速率是梯形上升速率的，也就是说冷启动时会以一个比较大的速率慢慢到平均速率；然后趋于平均速率（梯形下降到平均速率）。可以通过调节warmupPeriod参数实现一开始就是平滑固定速率。到此应用级限流的一些方法就介绍完了。假设将应用部署到多台机器，应用级限流方式只是单应用内的请求限流，不能进行全局限流。因此我们需要分布式限流和接入层限流来解决这个问题。分布式限流分布式限流最关键的是要将限流服务做成原子化，而解决方案可以使使用redis+lua或者nginx+lua技术进行实现，通过这两种技术可以实现的高并发和高性能。首先我们来使用redis+lua实现时间窗内某个接口的请求数限流，实现了该功能后可以改造为限流总并发/请求数和限制总资源数。Lua本身就是一种编程语言，也可以使用它实现复杂的令牌桶或漏桶算法。 redis+lua实现中的lua脚本： local key = KEYS[1] --限流KEY（一秒一个） local limit = tonumber(ARGV[1]) --限流大小 local current = tonumber(redis.call("INCRBY", key, "1")) --请求数+1 if current >

Limit then-if the current limit is exceeded

Return 0

Elseif current = = 1 then-only the first access requires setting an expiration time of 2 seconds

Redis.call ("expire", key, "2")

End

Return 1

The above operation is thread-safe because it is in a lua script and because Redis is a single-threaded model. One disadvantage of the above method is that it will increase gradually when the current limit is reached, which can be implemented in the following ways:

Local key = KEYS [1]-current limiting KEY (one per second)

Local limit = tonumber (ARGV [1])-current limit size

Local current = tonumber (redis.call ('get', key) or "0")

If current + 1 > limit then-if the current limit is exceeded

Return 0

Else-- number of requests + 1, and set expiration in 2 seconds

Redis.call ("INCRBY", key, "1")

Redis.call ("expire", key, "2")

Return 1

End

The following is the code in Java to determine whether a current limit is required:

Public static boolean acquire () throws Exception {

String luaScript = Files.toString (new File ("limit.lua"), Charset.defaultCharset ())

Jedis jedis = new Jedis ("192.168.147.52", 6379)

String key = "ip:" + System.currentTimeMillis () / 1000; / / the number of seconds of the current timestamp here

Stringlimit = "3"; / / current limit size

Return (Long) jedis.eval (luaScript,Lists.newArrayList (key), Lists.newArrayList (limit)) = = 1

}

Because of the limitation of Redis (write operations in Lua cannot use read operations with random nature, such as TIME), you cannot use TIME to obtain timestamps in Redis Lua, so you have to obtain timestamps from applications and then input them. In some extreme cases (where the machine clock is not accurate), there will be some minor problems with current restrictions.

Lua scripts implemented using Nginx+Lua:

Local locks = require "resty.lock"

Local function acquire ()

Local lock = locks:new ("locks")

Local elapsed, err = lock:lock ("limit_key")-mutex

Local limit_counter = ngx.shared.limit_counter-- counter

Local key = "ip:".. os.time ()

Local limit = 5-current limit size

Local current = limit_counter:get (key)

If current ~ = nil and current + 1 > limit then-if the current limit is exceeded

Lock:unlock ()

Return 0

End

If current = = nil then

Limit_counter:set (key, 1,1)-for the first time, you need to set the expiration time. Set the value of key to 1 and the expiration time to 1 second.

Else

Limit_counter:incr (key, 1)-add 1 for the second time

End

Lock:unlock ()

Return 1

End

Ngx.print (acquire ())

In the implementation, we need to use the lua-resty-lock mutex module to solve the atomicity problem (please consider the timeout problem of acquiring the lock when using it in the actual project), and use the ngx.shared.DICT shared dictionary to implement the counter. Return 0 if current limit is needed, otherwise return 1. When using it, you need to define two shared dictionaries (used to store lock and counter data respectively):

Java code

Http {

……

Lua_shared_dict locks 10m

Lua_shared_dict limit_counter 10m

}

Some people will wonder whether redis or nginx can withstand if the application concurrency is very large. However, this question should be considered from many aspects: whether your traffic is really that large, whether you can slice the distributed current limit through consistent hashing, and whether it can be downgraded to application-level current limit when the concurrency is too large; there are so many countermeasures that you can adjust according to the actual situation. For example, when JD.com uses Redis+Lua to limit rush purchase traffic, there is no problem with general traffic.

For distributed current limit, the current scenario is the service flow limit, not the traffic entrance flow limit; the traffic entrance flow limit should be completed in the access layer, and the author generally uses Nginx in the access layer.

Access layer current limit

The access layer usually refers to the entrance of request traffic. The main purposes of this layer are: load balancing, illegal request filtering, request aggregation, caching, degradation, current restriction, A _ OpenResty B testing, quality of service monitoring and so on. You can refer to the author's "using Nginx+Lua (OpenResty) to develop high-performance Web applications".

For Nginx access layer current limit can be used Nginx has two modules: connection number current limit module ngx_http_limit_conn_module and request current limit module ngx_http_limit_req_module implemented by leaky bucket algorithm. The Lua current limiting module lua-resty-limit-traffic provided by OpenResty can also be used for more complex current limiting scenarios.

Limit_conn is used to limit the total number of network connections corresponding to a KEY, which can be restricted according to dimensions such as IP and domain name. Limit_req is used to limit the average rate of requests corresponding to a KEY and can be used in two ways: smooth mode (delay) and allowed burst mode (nodelay).

Ngx_http_limit_conn_module

Limit_conn restricts the total number of network connections corresponding to a KEY. You can limit the total number of connections in the IP dimension by IP, or the total number of connections in a domain name by service domain name. But keep in mind that not every request connection is counted by the counter, only those request connections that are processed by Nginx and have read the entire request header are counted by the counter.

Example configuration:

Http {

Limit_conn_zone$binary_remote_addr zone=addr:10m

Limit_conn_log_level error

Limit_conn_status 503

...

Server {

...

Location / limit {

Limit_conn addr 1

}

Limit_conn: to configure the shared memory area where KEY and counters are stored and the maximum number of connections for the specified KEY. The maximum number of connections specified here is 1, which means that Nginx can process at most one connection concurrently.

Limit_conn_zone: used to configure the current-limiting KEY and the size of the shared memory area in which the corresponding information of the KEY is stored. The KEY here is "$binary_remote_addr", which represents the IP address, or you can use $server_name as the KEY to limit the maximum number of connections at the domain name level.

Limit_conn_status: the status code returned after the current limit is configured. 503 is returned by default.

Limit_conn_log_level: configure the log level after the flow is restricted. Default is error level.

The main execution process of limit_conn is as follows:

1. After entering the request, first determine whether the number of connections of the corresponding KEY in the current limit_conn_zone exceeds the configured maximum number of connections.

2.1.If the configured maximum size is exceeded, the current is limited and the error status code defined by limit_conn_status is returned.

2.2.Otherwise, add 1 to the number of connections to the corresponding KEY and register the callback function for the completed request processing

3. Carry out request processing

4. At the end of the request phase, the registered callback function is called to subtract 1 from the number of connections to the corresponding KEY.

Limt_conn can limit the total number of concurrency / requests of a KEY, and KEY can change as needed.

Configuration example according to IP limit the number of concurrent connections:

First, define the current-limiting area of the IP dimension:

Limit_conn_zone $binary_remote_addrzone=perip:10m

Then add the current-limiting logic to the location to be limited:

Location / limit {

Limit_conn perip 2

Echo "123"

}

That is, the maximum number of concurrent connections allowed per IP is 2.

Using the AB test tool to test, the number of concurrency is 5, and the total number of requests is 5:

Ab-n 5-c 5 http://localhost/limit

You will get the following access.log output:

[08/Jun/2016:20:10:51+0800] [1465373451.802] 200

[08/Jun/2016:20:10:51+0800] [1465373451.803] 200

[08/Jun/2016:20:10:51 + 0800] [1465373451.803] 503

Here we set the access log format to log_format main'[$time_local] [$msec] $status';, respectively, "date, second / millisecond value response status code".

If the current is restricted, you will see something similar to the following in error.log:

20:10:51 on 2016-06-08 [error] 5662 perip: * 5limiting connections by zone "perip", client: 127.0.0.1, server: _, request: "GET / limit HTTP/1.0", host: "localhost"

Configuration example by limiting the number of concurrent connections by domain name:

First, define the flow-limited area of the domain name dimension:

Limit_conn_zone $server_name zone=perserver:10m

Then add the current-limiting logic to the location to be limited:

Location / limit {

Limit_conn perserver 2

Echo "123"

}

That is, the maximum number of concurrent request connections per domain name is 2; this configuration can achieve the maximum number of connections to the server.

Ngx_http_limit_req_module

Limit_req is an implementation of leaky bucket algorithm, which is used to limit the flow of requests corresponding to a specified KEY, such as limiting the request rate according to the IP dimension.

Example configuration:

Http {

Limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s

Limit_conn_log_level error

Limit_conn_status 503

...

Server {

...

Location / limit {

Limit_req zone=one burst=5 nodelay

}

Limit_req: configure current limiting area, bucket capacity (burst capacity, default 0), whether delay mode (default delay)

Limit_req_zone: configure the current-limiting KEY, the size of the shared memory area in which the corresponding information of the KEY is stored, and the fixed request rate; the KEY specified here is "$binary_remote_addr" to indicate the IP address; the fixed request rate is configured with the rate parameter, which supports 10r/s and 60r/m, that is, 10 requests per second and 60 requests per minute, but eventually translates to a fixed request rate per second (10r/s processes one request every 100ms) 60r/m, which processes a request every 1000 milliseconds).

Limit_conn_status: the status code returned after the current limit is configured. 503 is returned by default.

Limit_conn_log_level: configure the log level after the flow is restricted. Default is error level.

The main execution process of limit_req is as follows:

1. After entering the request, first determine whether the last request time is relative to the current time (the first time is 0). If you need to limit the current time, perform step 2, otherwise perform step 3.

2.1. if the bucket capacity (burst) is not configured, the bucket capacity is 0; the request is processed at a fixed rate; if the request is limited, the corresponding error code is returned directly (default 503)

2.2.2.If bucket capacity (burst > 0) is configured and latency mode (nodelay is not configured); if the bucket is full, new requests are limited; if not, requests are processed at a fixed average rate (requests are processed at a fixed rate and delayed as needed, delaying the use of sleep implementation)

2.3.If bucket capacity (burst > 0) is configured and non-latency mode is configured (nodelay is configured), requests will not be processed at a fixed rate, but burst processing will be allowed; if the bucket is full, the request will be limited and the corresponding error code will be returned directly

3. If the current is not limited, the request will be processed normally.

4. Nginx will select some (3 nodes) current-limited KEY at the appropriate time for expiration processing and memory recovery.

Scenario 2.1 Test

First, define the current-limiting area of the IP dimension:

Limit_req_zone $binary_remote_addrzone=test:10m rate=500r/s

The limit is 500 requests per second, with a fixed average rate of 2 milliseconds per request.

Then add the current-limiting logic to the location to be limited:

Location / limit {

Limit_req zone=test

Echo "123"

}

That is, the bucket capacity is 0 (burst defaults to 0) and latency mode.

Test using the AB test tool, the number of concurrency is 2, and the total number of requests is 10:

Ab-n 10-c 2 http://localhost/limit

You will get the following access.log output:

[08/Jun/2016:20:25:56+0800] [1465381556.410] 200

[08/Jun/2016:20:25:56 + 0800] [1465381556.410] 503

[08/Jun/2016:20:25:56 + 0800] [1465381556.411] 503

[08/Jun/2016:20:25:56+0800] [1465381556.411] 200

[08/Jun/2016:20:25:56 + 0800] [1465381556.412] 503

Although 500requests per second are allowed, because the bucket capacity is 0, the incoming requests are either processed or limited, and cannot be delayed; in addition, the average rate is about 2 milliseconds, such as 1465381556.410 and 1465381556.411; some friends will say that the fixed average rate is not 1 millisecond, in fact, this is because the implementation algorithm is not so accurate.

If the current is restricted in error.log, you will see the following:

20:25:56 on 2016-06-08 [error] 6130pm: * 1962limiting requests, excess: 1.000 by zone "test", client: 127.0.0.1 request: "GET / limit HTTP/1.0", host: "localhost"

If delayed, you will see the following in error.log (log level is INFO level):

09:05:23 on 2016-06-10 [warn] 9766: 0: * 97021delaying request, excess: 0.368, by zone "test", client: 127.0.0.1 request server: _, request: "GET / limit HTTP/1.0", host: "localhost"

Scenario 2.2 Test

First, define the current-limiting area of the IP dimension:

Limit_req_zone $binary_remote_addr zone=test:10m rate=2r/s

To facilitate testing, the rate is set to 2 requests per second, that is, the fixed average rate is 500 milliseconds per request.

Then add the current-limiting logic to the location to be limited:

Location / limit {

Limit_req zone=test burst=3

Echo "123"

}

The fixed average rate is 500 milliseconds per request, and the capacity is 3. If the bucket is full of new requests, the flow is limited, otherwise you can queue up in the bucket and wait (implement delay mode).

To see the effect of current restriction, we wrote a req.sh script:

Ab-c 6-n 6 http://localhost/limit

Sleep 0.3

Ab-c 6-n 6 http://localhost/limit

First, perform 6 concurrent requests for 6 URL, then hibernate for 300ms, and then perform 6 concurrent requests for 6 URL; intermediate hibernation in order to see the effect beyond 2 seconds. If you do not see the following effect, you can adjust the sleep time.

You will get the following access.log output:

[09/Jun/2016:08:46:43+0800] [1465433203.959] 200

[09/Jun/2016:08:46:43 + 0800] [1465433203.959] 503

[09/Jun/2016:08:46:43 + 0800] [1465433203.960] 503

[09/Jun/2016:08:46:44+0800] [1465433204.450] 200

[09/Jun/2016:08:46:44+0800] [1465433204.950] 200

[09/Jun/2016:08:46:45 + 0800] [1465433205.453] 200

[09/Jun/2016:08:46:45 + 0800] [1465433205.766] 503

[09/Jun/2016:08:46:45 + 0800] [1465433205.767] 503

[09/Jun/2016:08:46:45+0800] [1465433205.950] 200

[09/Jun/2016:08:46:46+0800] [1465433206.451] 200

[09/Jun/2016:08:46:46+0800] [1465433206.952] 200

The bucket capacity is 3, that is, a maximum of 3 requests flow into the bucket in the time window, and the requests are processed at a fixed rate of 2r/s (that is, one request is processed every 500ms); the bucket computing time window (1.5s) = rate (2r/s) / bucket capacity (3), that is, a maximum of 3 requests are temporarily stored in the bucket within this time window. Therefore, we need to push forward the current time by 1.5 seconds and 1 second to calculate the total number of requests in the time window; in addition, because the default is delay mode, requests in the time window are temporarily stored in a bucket and processed at a fixed average rate:

The first round: four requests were processed successfully, and a maximum of 3 requests should be processed according to the capacity of the leaky bucket. This is because there is no reference value in the first calculation, so the subsequent calculation can have a reference value only after the first calculation. Therefore, the first success can be ignored; the impact of this problem is small and negligible; and requests are processed at a fixed rate of 500 milliseconds.

The second round: because the first round of requests is sudden, almost all at 1465433203.959 points in time, just because the leaky bucket smoothes the rate into a fixed average rate (one request per 500ms); the second round of calculation time should be based on 1465433203.959 The second round of sudden requests is almost all at 1465433205.766 time point, so the time window for calculating bucket capacity should be based on 1465433203.959 and 1465433205.766. The result is 1465433205.766 when the leaky bucket is empty, and three requests can flow into the bucket, and other requests are rejected. Because the last processing time of the first round is 1465433205.453, the first request of the second round is delayed to 1465433205.950. It should also be noted that the fixed average rate is only about the configured rate, there is a problem of calculation accuracy, there will be some deviation.

If the bucket capacity is changed to 1 (burst=1), execute the req.sh script to see the following output:

09/Jun/2016:09:04:30+0800] [1465434270.362] 200

[09/Jun/2016:09:04:30 + 0800] [1465434270.371] 503

[09/Jun/2016:09:04:30 + 0800] [1465434270.372] 503

[09/Jun/2016:09:04:30+0800] [1465434270.864] 200

[09/Jun/2016:09:04:31 + 0800] [1465434271.178] 503

[09/Jun/2016:09:04:31 + 0800] [1465434271.179] 503

[09/Jun/2016:09:04:31+0800] [1465434271.366] 200

The bucket capacity is 1 and requests are processed at a fixed average rate of one request every 1000 milliseconds.

Scenario 2.3 Test

First, define the current-limiting area of the IP dimension:

Limit_req_zone $binary_remote_addr zone=test:10m rate=2r/s

To facilitate the test configuration of 2 requests per second, the fixed average rate is 500 milliseconds per request.

Then add the current-limiting logic to the location to be limited:

Location / limit {

Limit_req zone=test burst=3 nodelay

Echo "123"

}

The bucket capacity is 3, and if the bucket is full and rejects new requests directly, and up to two requests per second, the bucket processes requests in nodelay mode at a fixed rate of 500ms.

To see the effect of current restriction, we wrote a req.sh script:

Ab-c 6-n 6 http://localhost/limit

Sleep 1

Ab-c 6-n 6 http://localhost/limit

Sleep 0.3

Ab-c 6-n 6 http://localhost/limit

Sleep 0.3

Ab-c 6-n 6 http://localhost/limit

Sleep 0.3

Ab-c 6-n 6 http://localhost/limit

Sleep 2

Ab-c 6-n 6 http://localhost/limit

You will get access.log output similar to the following:

[09/Jun/2016:14:30:11+0800] [1465453811.754] 200

[09/Jun/2016:14:30:11+0800] [1465453811.755] 200

[09/Jun/2016:14:30:11+0800] [1465453811.759] 200

[09/Jun/2016:14:30:11 + 0800] [1465453811.759] 503

[09/Jun/2016:14:30:12+0800] [1465453812.776] 200

[09/Jun/2016:14:30:12 + 0800] [1465453812.776] 503

[09/Jun/2016:14:30:12 + 0800] [1465453812.777] 503

[09/Jun/2016:14:30:13 + 0800] [1465453813.095] 503

[09/Jun/2016:14:30:13 + 0800] [1465453813.097] 503

[09/Jun/2016:14:30:13 + 0800] [1465453813.098] 503

[09/Jun/2016:14:30:13+0800] [1465453813.425] 200

[09/Jun/2016:14:30:13 + 0800] [1465453813.425] 503

[09/Jun/2016:14:30:13 + 0800] [1465453813.426] 503

[09/Jun/2016:14:30:13+0800] [1465453813.754] 200

[09/Jun/2016:14:30:13 + 0800] [1465453813.755] 503

[09/Jun/2016:14:30:13 + 0800] [1465453813.756] 503

[09/Jun/2016:14:30:15+0800] [1465453815.278] 200

[09/Jun/2016:14:30:15 + 0800] [1465453815.278] 503

[09/Jun/2016:14:30:15 + 0800] [1465453815.279] 503

[09/Jun/2016:14:30:17+0800] [1465453817.300] 200

[09/Jun/2016:14:30:17+0800] [1465453817.301] 200

[09/Jun/2016:14:30:17 + 0800] [1465453817.301] 503

The bucket capacity is 3 (that is, a maximum of 3 requests flow into the bucket within the time window, and the requests are processed at a fixed rate of 2r/s (that is, one request is processed every 500ms); the bucket computing time window (1.5s) = rate (2r/s) / bucket capacity (3), that is, a maximum of 3 requests are temporarily stored in the bucket within this time window. Therefore, we need to forward the current time by 1.5 seconds and 1 second to calculate the total number of requests in the time window; in addition, because nodelay is configured in non-delay mode, sudden requests in the time window are allowed; in addition, two problems can be seen from this example:

Round 1 and 7: 4 requests were processed successfully; this is due to a problem with the calculation algorithm. In this example, if there are no requests within 2 seconds, and then many requests come suddenly, the result of the first calculation will be incorrect; the impact of this problem is small and can be ignored.

Round 5: there are 3 requests in 1.0 seconds. This is also due to the calculation accuracy, that is, the algorithm implemented by limit_req is not very accurate. If it is considered as relative to 2.75, there is only one request in 1.0 seconds, so one request is still allowed.

If there is an error in the flow limit, you can configure the error page:

Proxy_intercept_errors on

Recursive_error_pages on

Error_page 503 / / www.jd.com/error.aspx

If the memory defined by limit_conn_zone/limit_req_zone is insufficient, subsequent requests will be limited all the time, so you need to set the corresponding memory size according to the requirements.

The current limit here is a single Nginx. If we have multiple nginx in the access layer, the problem here is the same as that at the application level. How to deal with it? One solution: establishing a load balancer layer will hash the request to the access layer Nginx according to the current limit KEY consistent hashing algorithm, so that the same KEY will be sent to the same access layer Nginx; another solution is to use Nginx+Lua (OpenResty) to call the distributed current limit logic implementation.

Lua-resty-limit-traffic

The two modules introduced earlier are relatively simple to use, just specify KEY, specify current limiting rate, etc. If we want to change the dynamic characteristics such as KEY, change rate, bucket size and so on according to the actual situation, it is difficult to use the standard module, so we need a kind of programmability to solve our problem. OpenResty provides lua current limiting module lua-resty-limit-traffic, through which dynamic current limiting processing can be carried out according to more complex business logic. It provides limit.conn and limit.req implementations, and the algorithm is the same as nginx limit_conn and limit_req.

Here we implement [scenario 2.2 testing] in ngx_http_limit_req_module, don't forget to download the lua-resty-limit-traffic module and add it to OpenResty's lualib.

Configure a shared dictionary to store current restrictions:

Lua_shared_dict limit_req_store 100m

The following is the current-limiting code limit_req.lua that implements [scenario 2.2 testing]:

Local limit_req = require "resty.limit.req"

Local rate = 2-fixed average rate 2r/s

Local burst = 3-bucket capacity

Local error_status = 503

Local nodelay = false-whether processing is required without delay

Local lim, err = limit_req.new ("limit_req_store", rate, burst)

If not lim then-No shared dictionary defined

Ngx.exit (error_status)

End

Local key = ngx.var.binary_remote_addr-- current limit of IP dimension

-- inflow request. If the request needs to be delayed, delay > 0

Local delay, err = lim:incoming (key, true)

If not delay and err = "rejected" then-- out of bucket size

Ngx.exit (error_status)

End

If delay > 0 then-decide whether to delay processing or not as needed

If nodelay then

-- dealing with it in an emergency directly.

Else

Ngx.sleep (delay)-delay processing

End

That is, the current-limiting logic is accessed in the nginx access phase, and if it is not limited, it will continue the subsequent process; if it needs to be restricted, either sleep will continue the subsequent process for a period of time, or return the corresponding status code to reject the request.

In the distributed current limiting, we use a simple Nginx+Lua for distributed current limiting, and with this module, we can also use this module to achieve distributed current limiting.

In addition, when using Nginx+Lua, you can also obtain ngx.var.connections_active for overload protection, that is, current limit protection if the number of current active connections exceeds the threshold.

If tonumber (ngx.var.connections_active) > = tonumber (limit) then

/ / current restriction

End

Nginx also provides limit_rate to limit the speed of traffic, such as limit_rate 50k, which means the download speed is limited to 50k.

At this point, the author has introduced the current-limiting usage involved in my work. Some of these algorithms allow bursts, some are smooth, and some are simple and rough. Token bucket algorithm and leaky bucket algorithm are similar in implementation, but they are expressed in a different direction, so there is no need to deliberately distinguish them for business. Therefore, it is necessary to decide how to limit current according to the actual scenario, and the best algorithm may not be the most applicable.

references

Https://en.wikipedia.org/wiki/Token_bucket

Https://en.wikipedia.org/wiki/Leaky_bucket

Http://redis.io/commands/incr

Http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

Http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html

Https://github.com/openresty/lua-resty-limit-traffic

Http://nginx.org/en/docs/http/ngx_http_core_module.html#limit_rate

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.