What to Do If Rancher Cannot Manage the Cluster
This article explains what to do when Rancher can no longer manage a cluster. Many people run into this problem in day-to-day operations, so the steps below collect a simple, practical way to handle it. Let's work through it together.
Foreword
Most Rancher users prefer to create "custom" clusters through Rancher Server. After such a cluster is created, Rancher Server may become unable to manage it for a variety of reasons, for example the Rancher Server was accidentally deleted, or its backup data cannot be restored. The usual way to deal with this is to stand up a new Rancher Server and import the downstream business cluster into it, but this leaves some "after-effects", for example the business cluster can no longer have nodes added to it through Rancher.
To eliminate this "after-effect", we can use RKE to take over the "custom" cluster created by Rancher Server.
As you may know, a "custom" cluster created through the Rancher Server UI is implemented by RKE on the back end, so RKE (https://docs.rancher.cn/rke/) is able to manage the "custom" clusters that Rancher Server creates.
RKE creates and manages Kubernetes clusters by relying on three files:
cluster.yml: the RKE cluster configuration file
kube_config_cluster.yml: the kubeconfig file, which contains the credentials for full access to the cluster
cluster.rkestate: the Kubernetes cluster state file, which also contains the credentials for full access to the cluster
Therefore, as long as these three files can be recovered from the downstream business cluster, we can continue to manage that cluster with the RKE binary. The following sections describe in detail how to use RKE to take over a "custom" cluster created by Rancher Server and how to add nodes to the cluster with RKE.
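As a minimal sketch (assuming the three files are kept together in one working directory and follow RKE's default naming for a configuration file called cluster.yml), this is how they fit together:

# ls
cluster.yml  cluster.rkestate  kube_config_cluster.yml
# rke up --config cluster.yml
# kubectl --kubeconfig kube_config_cluster.yml get nodes

rke up reads cluster.yml, reuses the certificates recorded in cluster.rkestate, and regenerates kube_config_cluster.yml, which kubectl can then use to access the cluster.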
Demonstration environment
This article has only been tested against Rancher v2.4.x and v2.5.x; other versions may not apply.
For a clearer demonstration, this article starts by creating a "custom" cluster from Rancher Server, then takes over that cluster with RKE, and finally demonstrates RKE's ability to manage the cluster by adding a node through RKE.
Rancher Server (ip-172-31-8-56) is started in the simplest docker run mode, and a "custom" cluster is created through the UI. The cluster contains two nodes, ip-172-31-2-203 and ip-172-31-1-111; the details are as follows:
# kubectl get nodes
NAME              STATUS   ROLES                      AGE     VERSION
ip-172-31-1-111   Ready    worker                     2m2s    v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   3m23s   v1.18.14

Taking over the "custom" cluster with RKE
1. Shut down ip-172-31-8-56 to simulate a Rancher Server failure. At this point the downstream cluster can no longer be managed through Rancher Server.
2. Restore the kube_config_cluster.yml file of the downstream business cluster by running the following command on the controlplane node:
# docker run --rm --net=host \
  -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro \
  --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap \
  -n kube-system full-cluster-state \
  -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' \
  > kubeconfig_admin.yaml
After the kubeconfig_admin.yaml is successfully exported, you can continue to operate the downstream business cluster using kubectl:
# kubectl --kubeconfig kubeconfig_admin.yaml get nodes
NAME              STATUS   ROLES                      AGE   VERSION
ip-172-31-1-111   Ready    worker                     32m   v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   34m   v1.18.14
3. Restore the cluster.rkestate file of the downstream business cluster by running the following command on the controlplane node:
# docker run --rm --net=host \
  -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro \
  --entrypoint bash $(docker inspect $(docker images -q --filter=label=org.label-schema.vcs-url=https://github.com/rancher/hyperkube.git) --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml \
  -n kube-system get configmap full-cluster-state \
  -o json | jq -r .data.\"full-cluster-state\" | jq -r .' \
  > cluster.rkestate
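As a quick sanity check (assuming the exported state file has the usual desiredState/currentState layout of an RKE state file), you can confirm that cluster.rkestate is valid JSON before going further:

# jq 'keys' cluster.rkestate
[
  "currentState",
  "desiredState"
]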
4. Restore the cluster.yml file of the downstream business cluster.
At present there is no good way to recover this file automatically, but cluster.yml can be rebuilt by hand from the cluster.rkestate that has already been restored, because essentially all of the configuration needed for cluster.yml can be found in cluster.rkestate.
Get the node configuration of the cluster from cluster.rkestate:
# cat cluster.rkestate | jq -r .desiredState.rkeConfig.nodes
[
  {
    "nodeName": "c-kfbjs:m-d3e75ad7a0ea",
    "address": "172.31.2.203",
    "port": "22",
    "internalAddress": "172.31.2.203",
    "role": [
      "etcd",
      "controlplane",
      "worker"
    ],
    "hostnameOverride": "ip-172-31-2-203",
    "user": "root",
    "sshKeyPath": "~/.ssh/id_rsa"
  }
]
Write cluster.yml manually according to the node information provided by cluster.rkestate:

# cat cluster.yml
nodes:
- address: 172.31.2.203
  hostname_override: ip-172-31-2-203
  user: ubuntu
  role:
  - controlplane
  - etcd
  - worker
- address: 172.31.1.111
  hostname_override: ip-172-31-1-111
  user: ubuntu
  role:
  - worker
- address: 172.31.5.186
  hostname_override: ip-172-31-5-186
  user: ubuntu
  role:
  - worker
kubernetes_version: v1.18.14-rancher1-1
There are several points to note in the above manually written cluster.yml:
Only the information of the controlplane (ip-172-31-2-203) node can be obtained from the cluster.rkestate file. Since this example cluster also has a worker (ip-172-31-1-111) node, the worker node's information has to be filled in manually.
ip-172-31-5-186 in cluster.yml is a brand-new worker node, which will be used in a later step to demonstrate adding a node with RKE.
The node information obtained from cluster.rkestate uses the root user; change it to the user that RKE actually connects as, according to your environment. In this example it is the ubuntu user.
Be sure to specify the kubernetes_version of the original cluster, otherwise the cluster will be upgraded to the default Kubernetes version of the RKE binary. A quick way to look up the original version is sketched below.
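If you are not sure which version the original cluster was running, one possible way to look it up is sketched here; the kubernetesVersion field name is an assumption about the rkeConfig stored in the state file rather than something shown above, so fall back to the node version reported by kubectl if it is absent:

# cat cluster.rkestate | jq -r .desiredState.rkeConfig.kubernetesVersion
# kubectl --kubeconfig kubeconfig_admin.yaml version --short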
In addition to the manual approach above, cluster.yml can also be restored with the following script; again, the points mentioned above still need to be adjusted afterwards. The advantage of this method is that it recovers cluster.yml more completely. Due to limited space, its output is not shown here:
#!/bin/bash
echo "Building cluster.yml..."

echo "Working on Nodes..."
echo 'nodes:' > cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.nodes | yq r - | grep -v nodeName | \
  sed -e 's_^  address_- address_' >> cluster.yml

echo "" >> cluster.yml
echo "Working on services..."
echo 'services:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.services | yq r - | sed 's/^/  /' >> cluster.yml

echo "" >> cluster.yml
echo "Working on network..."
echo 'network:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.network | yq r - | sed 's/^/  /' >> cluster.yml

echo "" >> cluster.yml
echo "Working on authentication..."
echo 'authentication:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.authentication | yq r - | sed 's/^/  /' >> cluster.yml

echo "" >> cluster.yml
echo "Working on systemImages..."
echo 'system_images:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.systemImages | yq r - | sed 's/^/  /' >> cluster.yml
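As a usage sketch, assuming the script is saved as restore_cluster_yml.sh (a placeholder name) and that jq plus a yq release that still supports the yq r read syntax (yq v2/v3) are installed on the node:

# bash restore_cluster_yml.sh
Building cluster.yml...
Working on Nodes...
Working on services...
Working on network...
Working on authentication...
Working on systemImages...
# yq r cluster.yml nodes

The last command simply reads the nodes section back out of the generated file, which is a cheap way to confirm the YAML parses before handing it to rke up.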
5. Use RKE to add nodes to the original cluster.
So far, the cluster.yml and cluster.rkestate files required by RKE have both been restored. Next, you can use rke up to add the worker (ip-172-31-5-186) node to the cluster.
# rke up
INFO[0000] Running RKE version: v1.2.4
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [172.31.2.203]
INFO[0000] [dialer] Setup tunnel for host [172.31.1.111]
INFO[0000] [dialer] Setup tunnel for host [172.31.5.186]
......
INFO[0090] [addons] no user addons defined
INFO[0090] Finished building Kubernetes cluster successfully
After waiting for the cluster update to complete, get the node information again:
# kubectl --kubeconfig kubeconfig_admin.yaml get nodes
NAME              STATUS   ROLES                      AGE     VERSION
ip-172-31-1-111   Ready    worker                     8m6s    v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   9m27s   v1.18.14
ip-172-31-5-186   Ready    worker                     29s     v1.18.14
You can see that a new worker (ip-172-31-5-186) node has been added, and the cluster version is still v1.18.14.
From now on, you can keep using RKE to manage the custom cluster that was created through Rancher Server, whether that means adding nodes, taking snapshots, or restoring from them. It is almost indistinguishable from a cluster created directly with RKE.
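For example, assuming the restored cluster.yml and cluster.rkestate are in the current directory, etcd snapshots can be saved and restored with the standard RKE commands (the snapshot name here is just a placeholder):

# rke etcd snapshot-save --config cluster.yml --name before-change
# rke etcd snapshot-restore --config cluster.yml --name before-change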
Postscript
Although this article describes how to take over a Rancher "custom" cluster with RKE, the operation is fairly involved, especially writing cluster.yml. A mistake there may cause the whole cluster to be updated incorrectly or to fail, so be sure to test thoroughly before using this approach.
At this point, the study of "what to do if Rancher cannot manage the cluster" is over. Hopefully it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it! If you want to keep learning more on this topic, stay tuned for further practical articles.