
Best practices on how to build TKE clusters


This article walks through best practices for building a TKE cluster in some detail; I hope interested readers find it a useful reference.

Kubernetes version

Kubernetes versions iterate quickly: new versions usually bring many bug fixes and new features, while old versions are gradually phased out. It is recommended to select the latest version supported by TKE when creating a cluster; the Master and nodes can also be upgraded as newer versions are released.

Network mode: GlobalRouter vs VPC-CNI

GlobalRouter schema architecture:

Container networking is implemented with CNI and a bridge; container routing is handled directly by the VPC underlay.

Containers and nodes are on the same network plane. The container network segment does not overlap with the VPC segments, so container addresses are plentiful.

VPC-CNI schema architecture:

Container networking is implemented with CNI and VPC elastic network interfaces (ENIs); container traffic is routed through the ENI. Performance is roughly 10% better than GlobalRouter.

Containers and nodes are on the same network plane, and the container segments are drawn from the VPC network segments.

Fixed Pod IPs are supported.

Comparison of network models:

Three modes of use are supported:

Specify GlobalRouter mode when creating the cluster.

Specify VPC-CNI mode when creating the cluster; all Pods are then created in VPC-CNI mode.

Specify GlobalRouter mode when creating the cluster, then enable VPC-CNI support when some workloads need it, that is, mix the two modes.

Suggestions for type selection:

In most cases choose GlobalRouter: the container network segment offers plenty of addresses and scales well, which suits large-scale workloads.

If some workloads will need VPC-CNI mode later, you can enable VPC-CNI support in the GlobalRouter cluster, that is, mix GlobalRouter with VPC-CNI and use VPC-CNI mode only for those workloads (see the sketch below).

If you fully understand and accept the limitations of VPC-CNI and need every Pod in the cluster to use VPC-CNI mode, select the VPC-CNI network plugin when creating the cluster.

See the official document "How to Choose a Container Service Network Mode": https://cloud.tencent.com/document/product/457/41636
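To make the mixed mode concrete, here is a minimal sketch of a Pod requesting a VPC-CNI (shared ENI) IP in a GlobalRouter cluster that has VPC-CNI support enabled. The tke.cloud.tencent.com/networks annotation and the tke.cloud.tencent.com/eni-ip extended resource are my recollection of the names in Tencent Cloud's documentation; verify them against the document linked above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vpc-cni-demo               # hypothetical Pod name
  annotations:
    # Ask TKE to attach this Pod to the VPC-CNI (shared ENI) network
    tke.cloud.tencent.com/networks: "tke-route-eni"
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        tke.cloud.tencent.com/eni-ip: "1"   # reserve one ENI IP for the Pod
      limits:
        tke.cloud.tencent.com/eni-ip: "1"
```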

Runtime: Docker vs Containerd

Docker as the runtime architecture:

Kubelet's built-in dockershim module adapts the CRI interface to Docker: kubelet calls its own dockershim (over a socket file), dockershim calls the dockerd interface (the Docker HTTP API), and dockerd in turn calls docker-containerd (over gRPC) to create and destroy containers.

Why is the call chain so long? At first Kubernetes supported only Docker; later it introduced CRI to abstract the runtime and support multiple runtimes. Docker, which competed with Kubernetes in some areas and was unwilling to play a subordinate role, did not implement the CRI interface in dockerd, so kubelet implemented CRI on dockerd's behalf to keep Docker supported. Docker's own internals are also split into modular components, and with the extra CRI adaptation layer the call chain is inevitably long.

Containerd as the runtime architecture:

Since containerd 1.1, the CRI plugin is built in, so containerd itself exposes the CRI interface.

Compared with the Docker option, the call chain drops dockershim and dockerd.

Runtime comparison:

Because it bypasses dockerd, the containerd option has a shorter call chain and fewer components, uses fewer node resources, and avoids some of dockerd's own bugs; containerd still has some bugs of its own, though (some have been fixed and are being rolled out gradually).

The Docker option has a longer history and is relatively more mature; it supports the Docker API, is feature-rich, and matches most people's habits.

Suggestions for type selection:

The Docker option is more mature than containerd; if your stability requirements are high, Docker is recommended.

Only docker can be used in the following scenarios:

Docker in Docker (usually in CI scenarios; see the sketch after this list)

Use the docker command on the node

Calling the Docker API

If none of the above scenarios apply, containerd is recommended.
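For the Docker-in-Docker case above, a minimal sketch of what such a Pod might look like, using the upstream docker:dind image (privileged mode is required for dockerd to run inside the container):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dind-build               # hypothetical Pod name
spec:
  containers:
  - name: dind
    image: docker:dind           # official Docker-in-Docker image
    securityContext:
      privileged: true           # dockerd needs privileged mode inside the container
    env:
    - name: DOCKER_TLS_CERTDIR
      value: ""                  # disable TLS to keep this sketch simple
```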

Service forwarding mode: iptables vs ipvs

Let's first take a look at the forwarding principle of Service:

The kube-proxy component on each node watches the apiserver for Service and Endpoint objects, converts them into iptables or ipvs rules according to the forwarding mode, and writes those rules to the node.

When a client inside the cluster accesses a Service (ClusterIP), the iptables/ipvs rules load-balance the traffic to the Service's backend Pods, as in the example below.
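As an illustration, the Service below is all kube-proxy needs: it watches the Service and its Endpoints and writes iptables/ipvs rules that spread ClusterIP traffic across the Pods matching the selector (names here are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical Service name
spec:
  selector:
    app: web           # Endpoints are the Pods carrying this label
  ports:
  - port: 80           # ClusterIP port that in-cluster clients access
    targetPort: 8080   # container port on the backend Pods
```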

Comparison of forwarding modes:

The ipvs mode performs better, but it has some known, unresolved bugs.

The iptables mode is more mature and stable.

Suggestions for type selection:

If stability is paramount and you have fewer than 2000 Services, choose iptables.

Prefer ipvs in other scenarios; a configuration sketch follows.
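On TKE the forwarding mode is normally chosen when the cluster is created, but for reference, this is how the choice is expressed in the upstream kube-proxy configuration format (a sketch, not a TKE-specific file):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # or "iptables", as discussed above
ipvs:
  scheduler: "rr"   # round-robin; ipvs also offers lc, sh, and others
```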

Cluster type: managed cluster vs independent cluster

Managed clusters:

Master components are not visible to users and are hosted by Tencent Cloud

New features are often rolled out to managed clusters first.

The computing resources of Master will automatically expand according to the size of the cluster.

Users do not need to pay for Master

Independent clusters:

Users have full control over the Master components.

Users pay for the Master machines.

Suggestions for type selection:

Managed clusters are generally recommended.

If you need full control over the Master (for example, to customize it for advanced features), use an independent cluster.

Node operating system

TKE mainly supports Ubuntu and CentOS distributions. Images with the "TKE-Optimized" suffix use a kernel customized and optimized by TKE; the rest use the Linux community's official open-source kernel:

Advantages of TKE-Optimized:

Customized on top of kernel 4.14.105, a version with long-term community support

Optimized for container and cloud scenarios

Performance optimizations across the compute, storage, and network subsystems

Timely fixes for kernel defects

Completely open source: https://github.com/Tencent/TencentOS-kernel

Suggestions for type selection:

Recommend "TKE-Optimized" with good stability and technical support.

If you need a later version of the kernel, choose a non-"TKE-Optimized" version of the operating system

Node pool

This feature is currently in a gray release; you can apply to be whitelisted for it. Node pools can be used to manage nodes in bulk:

Node Label and Taint

Node component startup parameters

Node custom startup script

Operating system and runtime (not supported yet)

Product documentation: https://cloud.tencent.com/document/product/457/43719

Applicable scenarios:

Group management of heterogeneous nodes to reduce management costs

Make the cluster better support complex scheduling rules (Label, Taint)

Frequent scaling of nodes in and out, with lower operating costs

Routine node maintenance (e.g., version upgrades)

Examples of usage:

Some IO-intensive workloads need high-IO instance types. Create a node pool for them: configure the instance type and set node Labels and Taints uniformly, then give the IO-intensive workloads node affinity that selects the Label so they are scheduled onto the high-IO nodes (the Taint keeps other workloads' Pods off them); a sketch follows these examples.

As business grows quickly over time, the IO-intensive workloads need more compute resources. During peaks, HPA automatically scales out the Pods; when node compute resources run short, the node pool's auto-scaling feature automatically adds nodes to absorb the traffic peak.
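A minimal sketch of the workload side of this example, assuming the node pool sets a hypothetical disk-type=high-io Label and a hypothetical dedicated=high-io:NoSchedule Taint:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-intensive-app           # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: io-intensive-app
  template:
    metadata:
      labels:
        app: io-intensive-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disk-type     # hypothetical Label set on the node pool
                operator: In
                values: ["high-io"]
      tolerations:
      - key: dedicated             # hypothetical Taint set on the node pool
        operator: Equal
        value: high-io
        effect: NoSchedule
      containers:
      - name: app
        image: nginx
```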

Startup script

Component custom parameters

This feature is also currently in a gray release; you can apply to be whitelisted for it.

When creating a cluster, you can customize the startup parameters of the Master components under "Advanced Settings" on the cluster information page.

When adding a node, you can customize kubelet's startup parameters under "Advanced Settings" on the CVM configuration page; the sketch below illustrates the kind of parameters involved.
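The TKE console takes individual kubelet flags, but to show what such parameters express, here is a sketch in the upstream KubeletConfiguration format (field values are arbitrary examples, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 120                    # raise the per-node Pod limit (example value)
evictionHard:
  memory.available: "200Mi"     # evict Pods when free memory falls below this
```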

Node startup configuration

When creating a new cluster, you can add a node startup script under the "Node launch configuration" option on the CVM configuration page.

When adding a node, you can supply a node startup script via the custom data field under "Advanced Settings" on the CVM configuration page (the script can be used to modify component startup parameters, kernel parameters, and so on).

That wraps up these best practices for building a TKE cluster. I hope the content above has been helpful; if you found the article worthwhile, feel free to share it with others.
