What Is the State of a Pod When a Node Is Abnormal in Kubernetes

2025-04-02 Update From: SLTechnology News&Howtos


This article explains what state a Pod is in when its Node becomes abnormal in Kubernetes. The material is a set of simple experiments with conclusions, followed by a look at the relevant controller source code.

Pod status changes when the kubelet process fails

Suppose a pod is running on a node and we stop the kubelet process on that node. Will the pod be killed? Will it be recreated on another node?

Conclusion:

(1) The state of the Node changes to NotReady.
(2) Pod states are unchanged for the first 5 minutes (the default pod eviction timeout), then change as follows: DaemonSet Pods become NodeLost; Deployment, StatefulSet, and Static Pods become NodeLost first and then, almost immediately, Unknown. Deployment Pods are recreated on other nodes, but if the only schedulable node is the one whose kubelet was stopped, the recreated Pods stay Pending. StatefulSet and Static Pods remain in the Unknown state indefinitely.

Pod behavior when the kubelet recovers

What happens to the node and its pods if the kubelet comes back up 10 minutes later?

Conclusion:

(1) The Node status changes back to Ready.
(2) DaemonSet Pods are not recreated; the old Pods return directly to Running.
(3) The old Deployment Pod on the recovered node is deleted (probably because, when the old Pod's status is reported again, the controller finds that the cluster already has enough replicas for the Deployment, so it removes the old Pod).
(4) StatefulSet Pods are recreated.
(5) Static Pods are not restarted, but their recorded run time is reset when the kubelet comes up.

In other words, after the kubelet stops, a StatefulSet pod becomes NodeLost and then Unknown, but it is neither restarted nor recreated; the StatefulSet's pod is only recreated once the kubelet is back up.

Another point: a Static Pod is not actually restarted after the kubelet restarts, yet when you query its status in the cluster, its running time has been reset.

Why StatefulSet Pods are not recreated on a Node exception

StatefulSet Pods are not rebuilt after a Node goes down. Why?

Looking at the node controller, we find that it calls the delete pod API for every pod on the lost node except DaemonSet pods.

However, calling the delete pod API does not immediately remove the pod object from the apiserver/etcd; it only sets the pod's deletionTimestamp, marking the pod for deletion. The component that actually deletes a Pod is the kubelet: the kubelet gracefully terminates the pod and then really deletes the pod object. Only at that point does the statefulset controller notice that a replica is missing and recreate the pod.

But in this scenario the kubelet is down and cannot communicate with the master, so the Pod object can never be removed from etcd. If the Pod object could be deleted successfully, the Pod would be rebuilt on another Node.
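The two-phase flow described above can be sketched in Go. This is a toy model, not the real controller code; the struct, field, and function names are made up for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// Pod models just the fields relevant to the two-phase deletion flow.
type Pod struct {
	Name              string
	DeletionTimestamp *time.Time // set by the delete API call
	InEtcd            bool       // true until the kubelet confirms termination
}

// markForDeletion is what the node controller's delete-pod API call
// effectively does: it only stamps the pod, it does not remove the object.
func markForDeletion(p *Pod) {
	now := time.Now()
	p.DeletionTimestamp = &now
}

// kubeletFinalize models a healthy kubelet grace-terminating the pod and
// then removing the object from the apiserver/etcd. A dead kubelet does nothing.
func kubeletFinalize(p *Pod, kubeletHealthy bool) {
	if kubeletHealthy && p.DeletionTimestamp != nil {
		p.InEtcd = false
	}
}

// statefulSetShouldRecreate: the controller only recreates a replica once
// its old pod object is really gone from etcd.
func statefulSetShouldRecreate(p *Pod) bool {
	return !p.InEtcd
}

func main() {
	down := &Pod{Name: "web-0", InEtcd: true}
	markForDeletion(down)
	kubeletFinalize(down, false) // kubelet is down: object stays in etcd
	fmt.Println("recreate with kubelet down:", statefulSetShouldRecreate(down)) // false

	up := &Pod{Name: "web-0", InEtcd: true}
	markForDeletion(up)
	kubeletFinalize(up, true) // healthy kubelet completes the deletion
	fmt.Println("recreate after finalize:", statefulSetShouldRecreate(up)) // true
}
```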

Also note that before deleting a Pod, the statefulset controller only targets Pods that are isFailed, but here the Pods are in the Unknown state.

The relevant code in the statefulset controller (here a customized StatefulSetPlus controller) looks like this:

```go
// delete and recreate failed pods
if isFailed(replicas[i]) {
	ssc.recorder.Eventf(set, v1.EventTypeWarning, "RecreatingFailedPod",
		"StatefulSetPlus %s/%s is recreating failed Pod %s",
		set.Namespace, set.Name, replicas[i].Name)
	if err := ssc.podControl.DeleteStatefulPlusPod(set, replicas[i]); err != nil {
		return &status, err
	}
	if getPodRevision(replicas[i]) == currentRevision.Name {
		status.CurrentReplicas--
	}
	if getPodRevision(replicas[i]) == updateRevision.Name {
		status.UpdatedReplicas--
	}
	status.Replicas--
	replicas[i] = newVersionedStatefulSetPlusPod(currentSet, updateSet,
		currentRevision.Name, updateRevision.Name, i)
}
```

Optimizing the behavior of StatefulSet Pods

Therefore, to protect stateful (non-quorum) applications against node exceptions, the following behaviors should be added:

Monitor the node's network, kubelet process, and operating system separately, and handle each kind of failure differently.

For example, if the Pod cannot serve traffic because of a network failure, then kubectl delete pod <pod-name> --force --grace-period=0 is required to forcibly delete the pod object from etcd.

After the forced deletion, the statefulset controller automatically recreates the pod on another Node.

A more aggressive approach is to drop the GracePeriodSeconds: when a StatefulSet Pod's GracePeriodSeconds is nil or 0, the object is deleted from etcd directly.

The logic lives in the apiserver's BeforeDelete (the snippet below is truncated where the original was):

```go
// BeforeDelete tests whether the object can be gracefully deleted.
// If graceful is set, the object should be gracefully deleted. If gracefulPending
// is set, the object has already been gracefully deleted (and the provided grace
// period is longer than the time to deletion). An error is returned if the
// condition cannot be checked or the gracePeriodSeconds is invalid. The options
// argument may be updated with default values if graceful is true. Second place
// where we set deletionTimestamp is pkg/registry/generic/registry/store.go.
// This function is responsible for setting deletionTimestamp during gracefulDeletion,
// other one for cascading deletions.
func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.Object, options *metav1.DeleteOptions) (graceful, gracefulPending bool, err error) {
	objectMeta, gvk, kerr := objectMetaAndKind(strategy, obj)
	if kerr != nil {
		return false, false, kerr
	}
	if errs := validation.ValidateDeleteOptions(options); len(errs) > 0 {
		return false, false, errors.NewInvalid(schema.GroupKind{Group: metav1.GroupName, Kind: "DeleteOptions"}, "", errs)
	}
	// Checking the Preconditions here to fail early. They'll be enforced later on when we actually do the deletion, too.
	if options.Preconditions != nil && options.Preconditions.UID != nil && *options.Preconditions.UID != objectMeta.GetUID() {
		return false, false, errors.NewConflict(schema.GroupResource{Group: gvk.Group, Resource: gvk.Kind}, objectMeta.GetName(),
			fmt.Errorf("the UID in the precondition (%s) does not match the UID in record (%s). The object might have been deleted and then recreated",
				*options.Preconditions.UID, objectMeta.GetUID()))
	}
	gracefulStrategy, ok := strategy.(RESTGracefulDeleteStrategy)
	if !ok {
		// If we're not deleting gracefully there's no point in updating Generation, as we won't update
		// the object before deleting it.
		return false, false, nil
	}
	// if the object is already being deleted, no need to update generation.
	if objectMeta.GetDeletionTimestamp() != nil {
		// if we are already being deleted, we may only shorten the deletion grace period
		// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
		// so we force deletion immediately
		// IMPORTANT:
		// The deletion operation happens in two phases.
		// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
		// 2. Delete the object from storage.
		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
		// a resource was previously left in a state that was non-recoverable. We
		// check if the existing stored resource has a grace period as 0 and if so
		// attempt to delete immediately in order to recover from this scenario.
		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
			return false, false, nil
		}
		// ...
	}
	// ... (gracefulStrategy is used further below to check graceful deletion)
}
```

The above covers what the Pod status is when a Node is abnormal in Kubernetes; the specific behavior is best verified in practice in your own cluster.
