2025-01-19 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 06/02 report
In this issue, the editor looks at the high-availability strategies available for the Kubernetes Master. The article analyzes the topic from a practical point of view; I hope you get something out of it.
High availability of Kubernetes is a question that commonly comes up once a preliminary technical assessment is complete and you are planning to migrate a production environment onto a Kubernetes cluster. To reduce business interruption caused by server crashes, production business systems are usually deployed in a highly available fashion. But once the cluster-management layer of Kubernetes is introduced, servers are no longer independent individuals: if the centrally located Kubernetes Master goes down, every Node falls out of control, which can lead to serious incidents.
Generally speaking, this topic has been discussed many times, but there is still no single unified solution. Here we introduce several high-availability strategies for the Kubernetes Master for your reference.
A small goal
High availability is a complex piece of systems engineering. For reasons of space, we will focus first on one small goal: no Kubernetes Master server is a single point of failure, and the crash of any one server does not affect the normal operation of Kubernetes.
The direct benefit of achieving this goal is that all servers can be upgraded in a rolling fashion without affecting the business, which makes it much easier to roll out system-component upgrades and security patches.
To eliminate single points of failure, a high-availability scheme is needed for each of the following components:
Etcd
Kube-apiserver
Kube-controller-manager and kube-scheduler
Kube-dns
The relationships between these components are shown in the cluster architecture diagram below.
Here is a detailed description of the high availability strategies for each component.
High availability of etcd
Etcd is the only stateful service in Kubernetes, and it is also the hardest part to make highly available. Kubernetes chose etcd as its backing data store precisely because of its distributed, no-single-point-of-failure architecture.
Although a single-node etcd works fine, the recommended deployment for Kubernetes is an etcd cluster of 3 or 5 nodes.
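As a sketch, a three-node etcd cluster can be bootstrapped with a per-member configuration file like the one below. The member names, addresses, token, and paths are placeholders for illustration, not taken from any real cluster:

```yaml
# /etc/etcd/etcd.yaml on the first member (etcd0); the other two members
# use the same file with their own name and advertise URLs.
name: etcd0
data-dir: /var/lib/etcd
listen-peer-urls: https://192.168.0.11:2380
listen-client-urls: https://192.168.0.11:2379,https://127.0.0.1:2379
initial-advertise-peer-urls: https://192.168.0.11:2380
advertise-client-urls: https://192.168.0.11:2379
# Static bootstrapping: every member of the cluster is listed explicitly.
initial-cluster: etcd0=https://192.168.0.11:2380,etcd1=https://192.168.0.12:2380,etcd2=https://192.168.0.13:2380
initial-cluster-state: new
initial-cluster-token: k8s-etcd-cluster
```

Note that `initial-cluster` enumerates every member explicitly; this is the "explicit membership" property that the first strategy below builds on.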
The commonly used kubeadm tool launches etcd and all Master components on a single node by default. This is very convenient, but if you intend to use it in production, be aware of the risk posed by that one node going down.
There are basically three ways to achieve high availability of etcd:
The first is to use a separate etcd cluster: 3 or 5 servers that run only etcd, maintained and upgraded independently. You can even use CoreOS's update-engine and locksmith to let these servers upgrade themselves entirely on their own. This etcd cluster then serves as the cornerstone on which the whole Kubernetes cluster is built. The main motivation for this strategy is that etcd cluster membership is configured explicitly, so keeping the etcd nodes stable, and driving their rolling upgrades programmatically, is more convenient and reduces the maintenance burden.
The second is to run etcd as static pods on the Kubernetes Master nodes and cluster the etcd instances across multiple Masters. In this mode the etcd instance on each server is registered in Kubernetes; although these instances cannot be managed directly with kubectl, monitoring and log-collection components work normally, so etcd run this way is easier to operate.
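The static-pod approach amounts to dropping a manifest like the following into each Master's static-pod directory (commonly /etc/kubernetes/manifests). The image tag and host paths here are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true        # etcd members talk to each other on the host network
  containers:
  - name: etcd
    image: quay.io/coreos/etcd:v3.2.0   # illustrative tag
    command:
    - etcd
    - --config-file=/etc/etcd/etcd.yaml
    volumeMounts:
    - name: etcd-data
      mountPath: /var/lib/etcd
    - name: etcd-config
      mountPath: /etc/etcd
  volumes:
  - name: etcd-data
    hostPath:
      path: /var/lib/etcd
  - name: etcd-config
    hostPath:
      path: /etc/etcd
```

Because the kubelet manages static pods directly from disk, the etcd instance survives apiserver outages, which is exactly what you want for the component the apiserver depends on.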
The third is the self-hosted etcd scheme proposed by CoreOS: run etcd, the very component that underpins Kubernetes, on Kubernetes itself, so that Kubernetes manages its own dependency. An etcd cluster in this mode can be operated automatically with etcd-operator, which fits Kubernetes usage habits best.
All three approaches can make etcd highly available, but the choice should be made according to your actual situation. Put simply: pick option 1 if you have a sufficient budget and prefer to be conservative; pick option 3 if you want to get there in one step and are willing to take on some risk; pick option 2 as a compromise. The detailed pros and cons of each option, and the trade-offs involved, will not be expanded on here; readers with questions about this area are welcome to get in touch privately.
High availability of kube-apiserver
The apiserver itself is a stateless service, so making it highly available is relatively easy. The difficulty is how to expose apiservers running on multiple servers to all Node nodes behind a single, unified entry point.
In fact, for stateless services like this we have already accumulated plenty of experience designing high-availability solutions for business systems. One thing to note is that the TLS certificate used by the apiserver must include the address of that external entry point in its subject alternative names; otherwise Nodes will not be able to access the apiserver properly.
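For example, when issuing the apiserver certificate yourself, you can verify that the external address actually appears in the SAN list. The names and IPs below are placeholders for illustration (and `-addext` requires OpenSSL 1.1.1+):

```shell
# Generate a key and self-signed certificate whose subjectAltName includes
# the cluster-internal names plus the external entry address (placeholder).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout apiserver.key -out apiserver.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:kubernetes,DNS:kubernetes.default.svc,IP:10.96.0.1,IP:192.168.0.100"

# Inspect the SANs that ended up in the certificate.
openssl x509 -in apiserver.crt -noout -ext subjectAltName
```

If the load-balancer or virtual IP is missing from this list, Nodes connecting through it will fail TLS verification.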
There are also three basic ideas for apiserver's high availability:
The first is to use an external load balancer, whether a load-balancing service provided by a public cloud or a self-built LVS or HAProxy load balancer in a private cloud. Load balancing is a very mature solution, so it is not covered in depth here; but note that keeping the load balancer itself highly available becomes a new problem to consider when choosing this scheme.
The second is to load-balance at the network layer, for example running ECMP with BGP on the Master nodes, or doing NAT with iptables on the Node nodes. This scheme requires no additional external services, but it does place certain demands on the network configuration.
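As a sketch of the iptables variant, each Node could DNAT a virtual apiserver address to the real Masters. All addresses are hypothetical, and the rules below are in iptables-restore format; using the OUTPUT chain because apiserver traffic originates from processes on the Node itself:

```
*nat
# Spread locally generated traffic to the virtual address 10.96.0.100:6443
# across three Masters: 1/3 to the first, 1/2 of the rest to the second,
# and everything remaining to the third.
-A OUTPUT -d 10.96.0.100/32 -p tcp -m tcp --dport 6443 -m statistic --mode random --probability 0.3333 -j DNAT --to-destination 192.168.0.11:6443
-A OUTPUT -d 10.96.0.100/32 -p tcp -m tcp --dport 6443 -m statistic --mode random --probability 0.5 -j DNAT --to-destination 192.168.0.12:6443
-A OUTPUT -d 10.96.0.100/32 -p tcp -m tcp --dport 6443 -j DNAT --to-destination 192.168.0.13:6443
COMMIT
```

Note that plain DNAT does no health checking: if a Master goes down, roughly a third of new connections will fail until the rules are updated, which is part of the "demands on network configuration" mentioned above.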
The third is to run a reverse proxy on each Node that load-balances across the Masters. This scheme also needs no external components, but as Master nodes are added or removed, dynamically reconfiguring the load balancer on every Node becomes another problem to solve.
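A minimal sketch of the reverse-proxy variant using nginx's stream (TCP) module on each Node; the Master addresses are placeholders, and the kubelet on the Node is then pointed at 127.0.0.1:6443:

```
# nginx.conf fragment on every Node: TCP proxying to the apiservers.
stream {
    upstream kube_apiserver {
        server 192.168.0.11:6443;
        server 192.168.0.12:6443;
        server 192.168.0.13:6443;
    }
    server {
        listen 127.0.0.1:6443;
        proxy_pass kube_apiserver;
    }
}
```

Unlike the raw iptables approach, nginx marks an upstream as failed after connection errors and retries another Master, so a single Master outage is largely transparent to the Node.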
Looking at the current crop of cluster-management tools, all three modes are in use, and there is no clear winner. On a public cloud, the first mode is the recommended starting point; in a private-cloud environment, where maintaining an extra load balancer is itself a burden, the second or third option is worth considering.
High availability of kube-controller-manager and kube-scheduler
These two services are part of the Master node, and their high availability is relatively easy: just run multiple instances. The instances perform leader election by acquiring a lock on an Endpoint object in the apiserver; when the current leader stops working properly, another instance acquires the lock and becomes the new leader.
At present it is common to deploy these two services as static pods on multiple Master nodes. A more aggressive approach is self-hosted mode: deploying them on Kubernetes itself with a DaemonSet or Deployment.
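The election behavior described above is controlled by flags on both components; a sketch of typical invocations, with illustrative timing values (the remaining flags for each binary are elided):

```
# kube-controller-manager and kube-scheduler both take the same
# leader-election flags. Only one elected instance does real work;
# the others stand by until they win the lock.
kube-controller-manager --leader-elect=true \
    --leader-elect-lease-duration=15s \
    --leader-elect-renew-deadline=10s \
    --leader-elect-retry-period=2s ...

kube-scheduler --leader-elect=true ...
```

The lease duration bounds how long the cluster can go without a working controller-manager or scheduler after the leader's node dies.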
High availability of kube-dns
Strictly speaking, kube-dns is not part of the Master: it can run on Node nodes and serves the cluster through a Service. In practice, however, only one kube-dns instance runs by default, so during an upgrade, or when its node goes down, DNS inside the cluster becomes unavailable, and in serious cases this disrupts online services.
To avoid such failures, set the replicas of kube-dns to 2 or more and use anti-affinity to spread them across different Node nodes. This step is easily overlooked; often it is only after a failure that people discover just one kube-dns instance was running.
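A sketch of the relevant part of the kube-dns Deployment spec, assuming the conventional k8s-app: kube-dns pod label:

```yaml
# Deployment fragment: two replicas, forced onto different Nodes via
# pod anti-affinity keyed on the node hostname.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-dns
            topologyKey: kubernetes.io/hostname
```

With `required...` anti-affinity the second replica stays Pending rather than co-locating if only one Node is available; clusters that prefer best-effort spreading can use the `preferred...` form instead.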
The above covers the strategies available for making each Kubernetes Master component highly available; etcd and kube-apiserver are the heart of the whole scheme. Because several viable options exist for each, cluster administrators should choose according to their environment and other constraints.
Situations like this, where there is no absolutely general solution and the cluster builder must choose among several schemes according to the circumstances at hand, come up frequently when building Kubernetes clusters, and they are the most challenging part of the whole process. Choosing a container network solution, another big question in Kubernetes construction, falls into the same category; we may share that topic another time.
During actual construction, once the four components above have been made highly available, it is best to run real shutdown tests to verify the reliability of the scheme, and to keep adjusting and optimizing it based on the results.
In addition, combining the high-availability scheme with automated system upgrades, so that upgrades happen automatically without losing availability, greatly reduces the cluster's day-to-day operations burden and is well worth investing in.
Finally, although this article is mainly about high availability of the Kubernetes Master, it should be pointed out that high availability is not mandatory: the cost of achieving it is not low and must be weighed against the benefit. For the many small clusters whose business systems are not themselves highly available, rashly making the cluster highly available brings limited benefit. In that case, a single Master node combined with good etcd data backups is the rational choice.
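For the single-Master route, a nightly etcd backup can be as simple as a cron entry using etcdctl's v3 snapshot command; the paths and schedule below are placeholders:

```
# /etc/cron.d/etcd-backup: snapshot etcd at 02:00 every day.
# (The % character must be escaped in crontab entries.)
0 2 * * * root ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot save /var/backups/etcd-$(date +\%F).db
```

Remember to also copy the snapshots off the Master node; a backup stored only on the machine you are protecting against is not a backup.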
Those are the Kubernetes Master high-availability strategies shared by the editor. If you have had similar doubts, I hope the analysis above helps you work through them.