How to implement distributed rate limiting with Kubernetes

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to implement distributed rate limiting in Kubernetes. The approach described here is simple, fast, and practical; interested readers may want to follow along and try it.

I. Concepts

Rate limiting restricts the requests made to an application service. For example, requests to an interface may be capped at 100 per second; requests over the limit fail fast or are discarded.

1.1 Usage scenarios

Rate limiting can handle:

Sudden traffic spikes from hot services

Sudden request spikes caused by caller bugs

Malicious attack requests

1.2 Dimensions

For rate-limiting scenarios, two dimensions generally need to be considered:

Time: the limit applies over a time range or at a time point, the familiar "time window", such as a per-second or per-minute window.

Resources: the limit is based on available resources, such as a maximum number of requests or a maximum number of available connections.

Rate limiting combines the two: it restricts access to a resource within a time window, for example allowing at most 100 requests per second.
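To make the two dimensions concrete, here is a minimal fixed-window limiter sketch in Java (illustrative only, not the implementation used later in this article): it admits at most `limit` requests per time window and rejects the rest.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal fixed-window rate limiter: at most `limit` requests per window.
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private final AtomicInteger count = new AtomicInteger();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) { // new window: reset the counter
            windowStart = now;
            count.set(0);
        }
        return count.incrementAndGet() <= limit;
    }
}
```

Note that a fixed window allows bursts at window boundaries; token-bucket limiters such as Guava's RateLimiter, discussed below, smooth traffic more evenly.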

1.3 Distributed rate limiting

Compared with single-machine rate limiting, distributed rate limiting spreads the limit across nodes. For example, if a service is limited to 100 qps and has 10 nodes, each node is allowed 10 qps on average; requests beyond that share are throttled.
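The allocation rule above can be sketched as a small helper (the names are hypothetical, and the floor of 1 is an assumption so that integer division never yields a zero limit):

```java
// Each node's share of a global limit, divided evenly across replicas.
class NodeShare {
    static int perNodeQps(int totalQps, int replicas) {
        return Math.max(1, totalQps / replicas); // floor of 1 is an assumption
    }
}
```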

II. Common distributed rate-limiting schemes

Client-side rate limiting with Guava. Guava is a client-side library whose concurrency module provides several rate-limiting support classes, led by RateLimiter. It can only limit the current service's own traffic, so by itself it is not a distributed rate-limiting solution.

Gateway-layer rate limiting. The service gateway, as the first barrier in the whole distributed chain, receives all user requests, so limiting at the gateway layer achieves an overall limit. Mainstream options include Nginx on the software side, gateway components such as Spring Cloud Gateway and Zuul, and F5 on the hardware side.

Middleware rate limiting. Rate-limit state is stored in a middleware (such as a Redis cache); in a distributed environment, each component reads the current traffic statistics from it and decides whether to reject or admit a request.
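A minimal sketch of this middleware pattern, with an in-memory map standing in for the shared store (a real deployment would use atomic Redis operations such as INCR plus EXPIRE on a shared instance; the map and class names here are only for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;

// Central-counter limiter: every node increments a shared counter per window
// and admits the request only while the counter is within the limit.
class CentralCounterLimiter {
    private final ConcurrentHashMap<String, Integer> store = new ConcurrentHashMap<>();
    private final int limit;

    CentralCounterLimiter(int limit) { this.limit = limit; }

    // windowKey would normally encode the caller and window, e.g. "ip:1700000000".
    boolean tryAcquire(String windowKey) {
        int count = store.merge(windowKey, 1, Integer::sum); // atomic, like Redis INCR
        return count <= limit;
    }
}
```

The cost of this accuracy is one network round trip to the store per request, which is exactly the overhead measured in the stress test later in this article.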

Rate-limiting components. Some open-source components also provide rate limiting. Sentinel, an open-source component from Alibaba included in the Spring Cloud Alibaba library, is a good choice; Hystrix also offers rate limiting.

Guava's RateLimiter is well designed and implemented, but it only supports a single machine. A single-machine gateway does not meet high-availability requirements, and a distributed gateway still relies on middleware for limiting, where network round trips to Redis add a measurable cost. The same applies to Alibaba's Sentinel: its underlying store is Redis or ZooKeeper, and every request requires a call to the Redis or ZooKeeper API. So in a cloud-native scenario, is there a better way?

For services that pursue high performance and do not need circuit breaking or degradation, network IO should be minimized. Can we take a total limit, divide it among the nodes, and enforce each node's share locally? For example, if an IP is limited to 100 qps and the service has 10 nodes, each node enforces 10 qps on average with Guava's RateLimiter; even when the number of nodes changes dynamically, the per-node qps can be adjusted accordingly.

III. Distributed rate limiting based on Kubernetes

In a Spring Boot application, define a filter that extracts a key (IP, userId, etc.) from the request and obtains a RateLimiter for that key. The RateLimiter is created from the total limit defined in the database and the current replica count, and rateLimiter.tryAcquire() finally decides whether the request passes.

3.1 Number of replicas in Kubernetes

In practice, a data-reporting service usually cannot predict when and how much clients will report. Such high-throughput services generally use HPA for dynamic scaling, so the service needs to fetch its replica count at regular intervals.

func CountDeploymentSize(namespace string, deploymentName string) *int32 {
    deployment, err := client.AppsV1().Deployments(namespace).Get(context.TODO(), deploymentName, metav1.GetOptions{})
    if err != nil {
        return nil
    }
    return deployment.Spec.Replicas
}

Alternatively, the endpoint GET host/namespaces/test/deployments/k8s-rest-api can be called directly.

3.2 Creating the rateLimiter

Define a LoadingCache in RateLimiterService, where the key can be an IP, userId, etc. Under multithreading, refreshAfterWrite blocks only the thread that reloads the entry while other threads keep returning the old value, making the best use of the cache.

private final LoadingCache<String, RateLimiter> loadingCache = Caffeine.newBuilder()
        .maximumSize(10000)
        .refreshAfterWrite(20, TimeUnit.MINUTES)
        .build(this::createRateLimit);

// Define a default minimum QPS
private static final Integer minQpsLimit = 3000;

Then create the rateLimiter: fetch the total limit totalLimit and the replica count replicas, and apply your own logic; the per-node qps can be limited based on totalLimit and replicas.

public RateLimiter createRateLimit(String key) {
    log.info("createRateLimit,key:{}", key);
    int totalLimit = ...; // fetch the total limit, e.g. defined in the database
    Integer replicas = kubernetesService.getDeploymentReplicas();
    RateLimiter rateLimiter;
    if (totalLimit > 0 && replicas == null) {
        rateLimiter = RateLimiter.create(totalLimit);
    } else if (totalLimit > 0) {
        int nodeQpsLimit = totalLimit / replicas;
        rateLimiter = RateLimiter.create(nodeQpsLimit > minQpsLimit ? nodeQpsLimit : minQpsLimit);
    } else {
        rateLimiter = RateLimiter.create(minQpsLimit);
    }
    log.info("create rateLimiter success,key:{},rateLimiter:{}", key, rateLimiter);
    return rateLimiter;
}

3.3 Obtaining the rateLimiter

Obtain the RateLimiter by key. If there are special requirements, handle the case where the key does not exist.

public RateLimiter getRateLimiter(String key) {
    return loadingCache.get(key);
}

3.4 Filter

The last step uses the rateLimiter to limit requests: if rateLimiter.tryAcquire() returns true, filterChain.doFilter(request, response) proceeds; if it returns false, HttpStatus.TOO_MANY_REQUESTS is returned.

public class RateLimiterFilter implements Filter {
    @Resource
    private RateLimiterService rateLimiterService;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain)
            throws IOException, ServletException {
        HttpServletRequest httpServletRequest = (HttpServletRequest) request;
        HttpServletResponse httpServletResponse = (HttpServletResponse) response;
        String key = httpServletRequest.getHeader("key");
        RateLimiter rateLimiter = rateLimiterService.getRateLimiter(key);
        if (rateLimiter != null) {
            if (rateLimiter.tryAcquire()) {
                filterChain.doFilter(request, response);
            } else {
                httpServletResponse.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            }
        } else {
            filterChain.doFilter(request, response);
        }
    }
}

IV. Performance stress test

To compare the performance gap, the following tests were run on a single local machine, with the total limit set to 30,000.

No rate limiting

Rate limiting with Redis

Here, a ping to Redis takes about 6-7 ms. Since every request must access Redis, each request correspondingly gains roughly 6-7 ms of latency, an obvious performance degradation.

Self-developed rate limiting

Performance is almost the same as the unlimited scenario; Guava's RateLimiter is indeed excellent.

V. Other problems

5.1 How can the accuracy of the qps limit be guaranteed?

In K8s the service scales dynamically, and each node's share changes accordingly. If a limit of 100 qps is declared and the business truly requires 100% accuracy, the only option is to shorten the expiry of the LoadingCache so that each node's qps is updated in near real time. The load on K8s itself must also be considered, since the replica count is fetched each time, so caching is needed there as well.
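The caching idea mentioned here can be sketched as a small TTL cache around the replica-count lookup, so the K8s API is not hit on every request (all names are hypothetical; the supplier would wrap the actual API call):

```java
import java.util.function.IntSupplier;

// Caches the replica count and refreshes it at most once per `ttlMillis`.
class ReplicaCountCache {
    private final long ttlMillis;
    private final IntSupplier fetcher; // e.g. wraps the K8s Deployments API call
    private volatile int cached;
    private volatile long fetchedAt;

    ReplicaCountCache(long ttlMillis, IntSupplier fetcher) {
        this.ttlMillis = ttlMillis;
        this.fetcher = fetcher;
        this.cached = fetcher.getAsInt();
        this.fetchedAt = System.currentTimeMillis();
    }

    synchronized int get() {
        long now = System.currentTimeMillis();
        if (now - fetchedAt >= ttlMillis) { // stale: refetch from the API
            cached = fetcher.getAsInt();
            fetchedAt = now;
        }
        return cached;
    }
}
```

Shortening ttlMillis improves accuracy of the per-node share at the cost of more API calls, which is exactly the trade-off described above.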

5.2 The service scales dynamically from 1 node to 4; the node count reads 4, but some pods have not actually started. Will one node be under too much pressure?

In theory, yes. The initial number of replicas needs to be considered: scaling does not happen instantaneously, jumping from 1 to 4 to dozens. In general, a production environment should never have only one node, and if scaling is expected, multiple replicas should already be prepared.

5.3 With multiple replicas, how is a uniform distribution of requests ensured?

This depends on K8s's service load-balancing strategy; in our experience traffic does fall evenly across the nodes. Note also that the whole rate-limiting scheme is built on K8s: if K8s itself has problems, every service in the cluster may be affected.

At this point, you should have a deeper understanding of how to implement distributed rate limiting in Kubernetes. You may as well try it in practice.
