How to Handle Bursty Traffic with EKS
This article introduces how EKS can be used to handle bursty traffic. Many people run into this kind of situation in real projects, so let me walk you through how to deal with it. I hope you read carefully and take something away from it!
Preface
Hybrid cloud is a deployment model. On the one hand, enterprises may choose hybrid cloud to improve asset utilization, control cost, and reduce risk and vendor lock-in; on the other hand, hybrid deployment lets them draw on the comparative strengths of different cloud providers so that their capabilities complement each other. Containers and hybrid cloud are a natural fit: standardized container packaging greatly reduces the coupling between the application runtime environment and heterogeneous hybrid-cloud infrastructure, so enterprises can more easily achieve agile development and continuous delivery across multiple clouds and standardize application management across regions. The TKE container team provides a range of product capabilities for hybrid-cloud scenarios. This article introduces one of them, aimed at bursty-traffic scenarios: bursting third-party clusters to EKS.
Low-cost expansion
IDC resources are limited. When a traffic burst arrives, the computing resources in the IDC may not be enough to handle it, so using public cloud resources to absorb the temporary traffic is a good choice. A common deployment architecture is to create a new cluster on the public cloud, deploy part of the workload there, and route traffic to the different clusters through DNS rules or load-balancer policies:
In this model the business deployment architecture changes, so it needs to be fully evaluated before adoption:
Which business workloads need to be deployed in the cloud, in whole or in part;
Whether the services deployed on the cloud have environment dependencies, such as IDC intranet DNS, DB, public services, etc.;
How to display business logs and monitoring data uniformly on and off the cloud;
On-cloud and off-cloud traffic scheduling rules;
How CD tools fit into multi-cluster business deployments;
Such a transformation is a worthwhile investment for services that need to serve multiple regions over the long term, but for bursty-traffic scenarios the cost is relatively high. For this scenario we therefore introduced the ability to use public cloud resources from a single cluster to absorb bursts of business traffic: bursting third-party clusters to EKS. EKS is Tencent Cloud Elastic Kubernetes Service; it can create and destroy large numbers of Pods within seconds, and users only declare their Pod resource requirements without having to maintain cluster node availability, which makes it well suited to elastic scenarios. A cluster gains the ability to quickly burst to EKS simply by installing the relevant plug-in package.
This approach scales faster than adding VM nodes on the cloud directly, and we provide two scheduling mechanisms to satisfy different scheduling-priority requirements:
Global switch: at the cluster level, when cluster resources are insufficient, any workload that needs to create a new Pod can create that replica on Tencent Cloud EKS;
Local switch: at the workload level, users can specify that after N replicas of a single workload have been placed in the local cluster, the remaining replicas are created on Tencent Cloud EKS;
To ensure that every workload keeps enough replicas in the local IDC, when the traffic burst has passed and scale-down is triggered, the replicas on Tencent Cloud EKS are scaled down first (this requires the TKE distro cluster; a detailed introduction to the TKE distro will follow in a later article in this series).
In this mode the business deployment architecture does not change, and cloud resources can be used elastically from within a single cluster, avoiding a series of derived problems such as business architecture transformation, CD pipeline changes, multi-cluster management, and unifying monitoring and logging systems. In addition, cloud resources are used on demand and billed on demand, which greatly reduces usage cost. However, to guarantee workload security and stability, we require the user's IDC to be connected to the Tencent Cloud public-cloud VPC over a dedicated line, and users also need to evaluate applicability in terms of storage dependencies, latency tolerance, and other aspects.
EKS Pods can communicate with the local cluster's Pods and nodes in underlay network mode (routes for the local Pod CIDR need to be added to the Tencent Cloud VPC; refer to the route configuration documentation). Bursting third-party clusters to EKS has been open-sourced in TKEStack; for details and examples, see the usage documentation.
Practical demonstration steps
Get the tke-resilience helm chart:
git clone https://github.com/tkestack/charts.git
Configure VPC information:
Edit charts/incubator/tke-resilience/values.yaml and fill in the following information:
cloud:
  appID: "{Tencent Cloud account APPID}"
  ownerUIN: "{Tencent Cloud account ID}"
  secretID: "{Tencent Cloud account secretID}"
  secretKey: "{Tencent Cloud account secretKey}"
  vpcID: "{ID of the VPC where EKS Pods are placed}"
  regionShort: {abbreviation of the region where EKS Pods are placed}
  regionLong: {full name of the region where EKS Pods are placed}
  subnets:
  - id: "{ID of the subnet where EKS Pods are placed}"
    zone: "{availability zone where EKS Pods are placed}"
eklet:
  podUsedApiserver: {API server address of the current cluster}
Install the tke-resilience helm chart:
helm install tke-resilience --namespace kube-system ./charts/incubator/tke-resilience/
Confirm that the chart's Pods are running properly.
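A quick way to check (the component names eklet and tke-scheduler are the ones referenced later in this walkthrough; the exact Pod names depend on the chart version):
kubectl get pods -n kube-system | grep -E 'eklet|tke-scheduler'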
Create the demo application nginx: ngx1 (a sketch of this step is shown below).
Effect demo: global scheduling
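A minimal sketch of creating the demo Deployment, assuming a plain public nginx image (the article does not show the exact command):
kubectl create deployment ngx1 --image=nginx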
Since this feature is enabled by default, we first set AUTO_SCALE_EKS in kube-system to false. By default ngx1 has 1 replica; now scale ngx1 to 50 replicas.
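For example (the global switch itself lives in the plugin's configuration in kube-system; its exact object name is not shown here):
kubectl scale deployment ngx1 --replicas=50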
You can see that a large number of Pods are stuck in the Pending state because resources are insufficient. After setting AUTO_SCALE_EKS in kube-system back to true and waiting a short while, observe the Pod states again: the Pods that were Pending have been scheduled onto the EKS virtual node eklet-subnet-167kzflm.
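To observe where the replicas landed (node names will differ per environment):
kubectl get pods -o wide | grep ngx1
kubectl get nodes   # the EKS virtual node shows up as an eklet-subnet-* node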
Effect demo: specified scheduling
Now scale ngx1 back down to 1 replica, then edit the ngx1 YAML and turn on the local switch:
spec:
  template:
    metadata:
      annotations:
        # Turn on the local switch
        AUTO_SCALE_EKS: "true"
        # Number of replicas to keep in the local cluster
        LOCAL_REPLICAS: "2"
    spec:
      # Use the tke scheduler
      schedulerName: tke-scheduler
Scale ngx1 to 3 replicas. Even though the local cluster is not short of resources, you can see that once the 2 local replicas are in place, the third replica is scheduled to EKS.
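For example:
kubectl scale deployment ngx1 --replicas=3
kubectl get pods -o wide   # the third replica should land on the eklet virtual node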
Uninstall the tke-resilience plugin:
helm uninstall tke-resilience -n kube-system
In addition, TKEStack has integrated tke-resilience; users can install it from the TKEStack application market.
Application scenarios
Bursting to the cloud
E-commerce promotions, live streaming, and similar scenarios need to scale out a large number of temporary workloads in a short time. The resource demand lasts only briefly, so reserving a large amount of capacity day to day just for this short-term demand inevitably wastes resources, and it is hard to estimate the demand accurately as each campaign changes. With this feature you no longer need to focus on resource preparation: simply rely on Kubernetes autoscaling to quickly create a large number of workload replicas for the business, and after the traffic peak passes the Pods on the cloud are destroyed first, so no resources are wasted.
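A hedged autoscaling sketch (the CPU target and replica bounds below are assumed values, not from the article); replicas that exceed local capacity are then burst to EKS by the mechanisms described above:
kubectl autoscale deployment ngx1 --min=2 --max=100 --cpu-percent=70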
Offline computing
In big-data and AI scenarios, computing tasks also place highly elastic demands on compute power. To finish a computation quickly, a large amount of compute is needed for a short time; after the computation completes, that compute sits at low load, so utilization fluctuates heavily and resources are wasted. Moreover, because GPU resources are scarce, hoarding large numbers of GPU devices is not only very costly but also brings a variety of resource-management problems, such as improving utilization, adapting to new cards, retiring old cards, and heterogeneous computing, whereas the rich selection of GPU card types on the cloud gives users far more choice.
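A minimal sketch of such a batch task, assuming a CUDA base image and a single-GPU request (hypothetical names and values; whether the Pod lands on the EKS virtual node depends on the switches described above):
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-batch-demo
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: compute
        image: nvidia/cuda:11.8.0-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
EOF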
"EKS how to deal with sudden traffic" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!