Why Can't a Namespace in K8s Be Deleted?

This article explains why a Namespace in a K8s cluster can get stuck and refuse to be deleted. The troubleshooting process is quite practical, so it is shared here as a reference.
A Namespace is itself a resource. Through the cluster's API Server we can create new Namespaces, and Namespaces that are no longer in use need to be cleaned up. The Namespace Controller watches for changes to Namespaces through the API Server and performs predefined actions based on those changes.
Sometimes we run into the following problem: a Namespace's status is marked "Terminating", but it can never be deleted completely.
Start from the cluster entrance
Because the delete operation goes through the cluster's API Server, we start by analyzing the API Server's behavior. Like most cluster components, the API Server supports different levels of log output. To understand its behavior, we raise the log level to the highest setting, and then reproduce the problem by creating and deleting a Namespace named tobedeletedb.
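As a minimal sketch, the stuck state can be reproduced with standard kubectl commands (the namespace name is the one from this case; the output shown is illustrative and will vary by cluster):

# Create a namespace, then delete it; the delete call may hang while the
# namespace stays in Terminating.
kubectl create namespace tobedeletedb
kubectl delete namespace tobedeletedb
# In another terminal, check the status:
kubectl get namespace tobedeletedb
# NAME           STATUS        AGE
# tobedeletedb   Terminating   2m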
Unfortunately, the API Server does not output many logs related to this problem.

The related logs fall into two parts:

One part is the deletion record for the Namespace, which shows that the client tool was kubectl and that the source IP address of the operation was 192.168.0.41, both as expected. The other part shows the Kube Controller Manager repeatedly fetching information about this Namespace.

The Kube Controller Manager implements most of the Controllers in the cluster. Since it was repeatedly fetching information about tobedeletedb, we can reasonably conclude that it is the Namespace Controller that was reading the Namespace.
What is the Controller doing?

Similar to the previous section, we studied this component's behavior by turning the Kube Controller Manager's log level up to the highest setting. In its logs we can see that the Namespace Controller keeps retrying a failing operation: cleaning up the resources "contained" in the tobedeletedb Namespace.
How to delete the resources in the "storage box"?

One thing we need to understand here is that a Namespace, as a "storage box" for resources, is only a logical concept. Unlike a real container, it does not physically hold small objects; the "containment" of a Namespace is in fact a mapping relationship.

This matters because it directly determines how resources inside a Namespace are deleted. If the containment were physical, we would only need to delete the "storage box" and the resources inside would be deleted with it. For a logical relationship, we instead need to list all resources and delete the ones that point to the Namespace being deleted.
API, Group, Version
How do we list all the resources in the cluster? This question leads us to how the cluster API is organized. The API of a K8s cluster is not monolithic; it is organized by group and version. The obvious advantage of this is that APIs in different groups can iterate independently without affecting each other. A common group such as apps has versions v1, v1beta1, and v1beta2. The complete group/version list can be seen with the kubectl api-versions command.
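For example (illustrative output; the exact list depends on the cluster and the extensions installed on it):

kubectl api-versions
# apps/v1
# apps/v1beta1
# apps/v1beta2
# networking.k8s.io/v1beta1
# metrics.k8s.io/v1beta1
# ...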
Every resource we create must belong to a certain API group/version. Taking the following Ingress as an example, we specify that the group/version of the Ingress resource is networking.k8s.io/v1beta1.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: test-ingress
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        backend:
          serviceName: test
          servicePort: 80
To summarize: there are many API groups/versions in a cluster, and each group/version supports specific resource types. When we define resources in yaml, we need to specify the resource type kind and the API group/version apiVersion. So to list all resources, we first need to get the list of API groups/versions.
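A hedged sketch of how this enumeration can be done from the command line (a common kubectl idiom, not the controller's actual code path; the namespace name is the one from this case):

# List every namespaced, listable resource type, then query each type in the
# stuck namespace to see what is left behind.
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n tobedeletedb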
Why can't the Controller delete resources in the Namespace?

Once we understand the concept of API groups/versions, the Kube Controller Manager's log becomes clear: the Namespace Controller is trying to get the API group/version list, and when it reaches metrics.k8s.io/v1beta1, the query fails. The reason for the failure is "the server is currently unable to handle the request".
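The same failure also surfaces on the namespace object itself: on reasonably recent clusters, the namespace's status conditions record why deletion is stuck (illustrative output; the exact condition type and message wording are assumptions that vary by cluster version):

kubectl get namespace tobedeletedb -o yaml
# ...
# status:
#   phase: Terminating
#   conditions:
#   - type: NamespaceDeletionDiscoveryFailure
#     status: "True"
#     message: 'Discovery failed for some groups, 1 failing: unable to retrieve
#       the complete list of server APIs: metrics.k8s.io/v1beta1: the server is
#       currently unable to handle the request'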
Go back to the cluster entrance again
In the previous section, we found that the Kube Controller Manager failed to get the API group/version metrics.k8s.io/v1beta1, and this query request is obviously addressed to the API Server. So we go back to the API Server logs and analyze the records related to metrics.k8s.io/v1beta1. At the same point in time, we see that the API Server reported the same error: "the server is currently unable to handle the request".
There is an obvious contradiction here: the API Server is clearly working properly, so why does it report that the server is unavailable when fetching the metrics.k8s.io/v1beta1 API group/version? To answer this question, we need to understand the API Server's "plug-in" mechanism.
The cluster API Server has a mechanism for extending itself, and developers can use it to implement "plug-ins" for the API Server. The main function of such a "plug-in" is to implement new API groups/versions. The API Server, acting as a proxy, forwards the corresponding API calls to the "plug-in".

Take Metrics Server as an example: it implements the metrics.k8s.io/v1beta1 API group/version, and all calls to this group/version are forwarded to it. The Metrics Server implementation consists mainly of a service and a pod.

The apiservice resource is the mechanism that connects this "plug-in" to the API Server. Its definition includes the API group/version and the name of the service that implements Metrics Server. With this information, the API Server can forward calls to metrics.k8s.io/v1beta1 to Metrics Server.
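A hedged way to inspect this wiring with kubectl (v1beta1.metrics.k8s.io is the conventional APIService name for Metrics Server; the output shown is illustrative):

# The APIService object maps the group/version to the backing service; the
# AVAILABLE column reveals whether the API Server can reach that service.
kubectl get apiservice v1beta1.metrics.k8s.io
# NAME                     SERVICE                      AVAILABLE                      AGE
# v1beta1.metrics.k8s.io   kube-system/metrics-server   False (FailedDiscoveryCheck)   2d
# The full mapping (group, version, backing service) is visible with -o yaml:
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml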
Communication between nodes and Pods
After a simple test, we found that this problem is actually a communication problem between the API Server and the Metrics Server pod. In an Aliyun K8s cluster, the API Server uses the host network, that is, the ECS network, while the Metrics Server uses the Pod network. Communication between the two depends on forwarding by the VPC routing table.
For example, if the API Server runs on Node A with IP address 192.168.0.193, and the Metrics Server's IP is 172.16.1.12, then a network connection from the API Server to the Metrics Server must be forwarded through the route entry in the VPC routing table that points to the node where the Metrics Server pod runs.
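A hedged sketch of the kind of quick test that exposes such a problem (the pod IP is the one from the example above; the port is an assumption, since it depends on the Metrics Server version):

# From the node running the API Server, check basic reachability of the pod IP.
# If the VPC route to the pod's node is missing, this will time out.
ping -c 3 172.16.1.12
# A more direct probe of the aggregated API endpoint served by the pod:
curl -k https://172.16.1.12:443/apis/metrics.k8s.io/v1beta1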
Checking the cluster's VPC routing table, we found that the route entry pointing to the node where the Metrics Server is located was missing, which explains the broken communication between the API Server and the Metrics Server.
Why isn't Route Controller working?
To keep the cluster's VPC routing table correct, Aliyun implements a Route Controller inside the Cloud Controller Manager. This Controller constantly watches the cluster's node status and the VPC routing table status; when it finds a route entry missing, it automatically fills the entry back in.
The situation here is clearly not what is expected: the Route Controller is not working properly. This can be confirmed from the Cloud Controller Manager logs, where we found that when the Route Controller looks up the VPC instance by the cluster's VPC id, it cannot get the instance's information.
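A hedged sketch of how those logs can be pulled (the pod name must be looked up first; on ACK the Cloud Controller Manager typically runs in kube-system, but the exact workload name is an assumption):

# Locate the Cloud Controller Manager pod, then filter its logs for route/VPC errors.
kubectl -n kube-system get pods | grep cloud-controller-manager
kubectl -n kube-system logs <cloud-controller-manager-pod-name> | grep -i -e route -e vpc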
But the cluster is still there and the ECS instances are still there, so the VPC cannot simply be gone; we can confirm this on the VPC console using the VPC id. So the next question is: why can't the Cloud Controller Manager get information about this VPC?
Cluster nodes access cloud resources
Cloud Controller Manager obtains VPC information by calling Aliyun's OpenAPI. This is basically equivalent to fetching information about a VPC instance from inside an ECS instance on the cloud, which requires the ECS instance to have sufficient permissions. The general practice is to grant a RAM role to the ECS instance and bind the corresponding authorization policies to that RAM role.

If cluster components running on a node cannot obtain information about cloud resources, there are basically two possibilities: either the ECS instance is not bound to the correct RAM role, or the RAM role is not bound to the correct authorization policies. Examining the node's RAM role and the policies attached to it, we found that the authorization policy for vpc had been changed: its Effect had been set to Deny.
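A hedged sketch of what the relevant RAM policy statement might look like after the fix (Alibaba Cloud RAM policies use this general JSON shape; the exact Action list is an assumption):

{
  "Version": "1",
  "Statement": [
    {
      "Action": ["vpc:Describe*"],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}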
After we changed the Effect back to Allow, it was not long before all the namespaces stuck in the Terminating state disappeared.
Big picture of the problem
Overall, this problem involves six components of the K8s cluster: the API Server and its extension Metrics Server, the Namespace Controller and the Route Controller, plus the VPC routing table and RAM role authorization.

By analyzing the behavior of the first three components, we located the cluster network problem that prevented the API Server from connecting to the Metrics Server; by troubleshooting the last three components, we found the root cause: the VPC route entry had been deleted and the RAM role's authorization policy had been changed.