This article walks through the source code of the Kubernetes StatefulSet controller: how the controller is constructed, how its sync loop reconciles Pods against the desired state, and how rolling updates and Pod management policies behave.
Inner Structure
The following is a simple diagram of the internal structure of the StatefulSet controller at work.
NewStatefulSetController
Like the other controllers, the StatefulSet controller is started by the ControllerManager at initialization time.
// NewStatefulSetController creates a new statefulset controller.
func NewStatefulSetController(
    podInformer coreinformers.PodInformer,
    setInformer appsinformers.StatefulSetInformer,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    revInformer appsinformers.ControllerRevisionInformer,
    kubeClient clientset.Interface,
) *StatefulSetController {
    ...
    ssc := &StatefulSetController{
        kubeClient: kubeClient,
        control: NewDefaultStatefulSetControl(
            NewRealStatefulPodControl(kubeClient, setInformer.Lister(), podInformer.Lister(), pvcInformer.Lister(), recorder),
            NewRealStatefulSetStatusUpdater(kubeClient, setInformer.Lister()),
            history.NewHistory(kubeClient, revInformer.Lister()),
        ),
        pvcListerSynced: pvcInformer.Informer().HasSynced,
        queue:           workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "statefulset"),
        podControl:      controller.RealPodControl{KubeClient: kubeClient, Recorder: recorder},
        revListerSynced: revInformer.Informer().HasSynced,
    }

    podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        // lookup the statefulset and enqueue
        AddFunc: ssc.addPod,
        // lookup current and old statefulset if labels changed
        UpdateFunc: ssc.updatePod,
        // lookup statefulset accounting for deletion tombstones
        DeleteFunc: ssc.deletePod,
    })
    ssc.podLister = podInformer.Lister()
    ssc.podListerSynced = podInformer.Informer().HasSynced

    setInformer.Informer().AddEventHandlerWithResyncPeriod(
        cache.ResourceEventHandlerFuncs{
            AddFunc: ssc.enqueueStatefulSet,
            UpdateFunc: func(old, cur interface{}) {
                oldPS := old.(*apps.StatefulSet)
                curPS := cur.(*apps.StatefulSet)
                if oldPS.Status.Replicas != curPS.Status.Replicas {
                    glog.V(4).Infof("Observed updated replica count for StatefulSet: %v, %d->%d", curPS.Name, oldPS.Status.Replicas, curPS.Status.Replicas)
                }
                ssc.enqueueStatefulSet(cur)
            },
            DeleteFunc: ssc.enqueueStatefulSet,
        },
        statefulSetResyncPeriod,
    )
    ssc.setLister = setInformer.Lister()
    ssc.setListerSynced = setInformer.Informer().HasSynced

    // TODO: Watch volumes
    return ssc
}
The code follows a familiar pattern: create the eventBroadcaster, then register event handlers with the corresponding object informers:
The StatefulSetController mainly List/Watches Pod and StatefulSet objects.
The Pod informer registers add/update/delete event handlers, and all three of them enqueue the StatefulSet that owns the Pod into the StatefulSet queue.
The StatefulSet informer likewise registers add/update/delete event handlers, which add the StatefulSet itself to the StatefulSet queue.
Note that the StatefulSetController currently registers no event handlers on the PVC informer; PVC handling is left entirely to the PVC controller. When the StatefulSet controller creates or deletes a Pod, it calls the apiserver to create or delete the corresponding PVC.
Likewise there is no separate Revision controller: the corresponding ControllerRevision is created or deleted when the StatefulSet controller reconciles.
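As an aside on the PVCs just mentioned: each replica's PVC is named after the volumeClaimTemplate, the StatefulSet, and the Pod ordinal. A minimal, self-contained sketch of that convention follows; pvcName here is illustrative, while in the source the logic lives in getPersistentVolumeClaimName, which appears again in storageMatches below:

package main

import "fmt"

// pvcName mirrors the naming convention the StatefulSet controller uses when
// creating a PVC for a Pod: <claimTemplateName>-<setName>-<ordinal>.
func pvcName(claimTemplateName, setName string, ordinal int) string {
    return fmt.Sprintf("%s-%s-%d", claimTemplateName, setName, ordinal)
}

func main() {
    // e.g. volumeClaimTemplate "data" on StatefulSet "web" for Pod web-0
    fmt.Println(pvcName("data", "web", 0)) // data-web-0
}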
StatefulSetController sync
Next we enter the StatefulSetController worker (there is only one worker, i.e. only one goroutine). The worker pops a StatefulSet object off the StatefulSet queue and then executes sync to reconcile it.
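The worker loop itself is the standard client-go workqueue pattern; lightly abridged (error logging omitted), it looks like this:

func (ssc *StatefulSetController) worker() {
    for ssc.processNextWorkItem() {
    }
}

func (ssc *StatefulSetController) processNextWorkItem() bool {
    // block until a StatefulSet key is available, or the queue shuts down
    key, quit := ssc.queue.Get()
    if quit {
        return false
    }
    defer ssc.queue.Done(key)
    if err := ssc.sync(key.(string)); err != nil {
        // on error, requeue the key with rate limiting
        ssc.queue.AddRateLimited(key)
    } else {
        ssc.queue.Forget(key)
    }
    return true
}

The sync it dispatches to: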
// sync syncs the given statefulset.
func (ssc *StatefulSetController) sync(key string) error {
    startTime := time.Now()
    defer func() {
        glog.V(4).Infof("Finished syncing statefulset %q (%v)", key, time.Now().Sub(startTime))
    }()

    namespace, name, err := cache.SplitMetaNamespaceKey(key)
    if err != nil {
        return err
    }
    set, err := ssc.setLister.StatefulSets(namespace).Get(name)
    if errors.IsNotFound(err) {
        glog.Infof("StatefulSet has been deleted %v", key)
        return nil
    }
    if err != nil {
        utilruntime.HandleError(fmt.Errorf("unable to retrieve StatefulSet %v from store: %v", key, err))
        return err
    }

    selector, err := metav1.LabelSelectorAsSelector(set.Spec.Selector)
    if err != nil {
        utilruntime.HandleError(fmt.Errorf("error converting StatefulSet %v selector: %v", key, err))
        // This is a non-transient error, so don't retry.
        return nil
    }

    if err := ssc.adoptOrphanRevisions(set); err != nil {
        return err
    }

    pods, err := ssc.getPodsForStatefulSet(set, selector)
    if err != nil {
        return err
    }

    return ssc.syncStatefulSet(set, pods)
}
sync matches all ControllerRevisions against the set's labels, then checks whether any of those Revisions has an empty OwnerReference. If so, there are orphaned Revisions.
Note: as soon as one orphaned history Revision is detected, a patch is triggered against all of these Revisions:
{"metadata": {"ownerReferences": [{"apiVersion": "% s", "kind": "% s", "name": "% s", "uid": "% s", "controller": true, "blockOwnerDeletion": true}], "uid": "% s"}}
Call getPodsForStatefulSet to get the Pods that this StatefulSet should manage.
Get all the Pods in the namespace of the StatefulSet.
Perform the ClaimPods operation: check whether the labels of the set and the Pod match; if they do not, the Pod needs to be released. Then check whether the Pod's name matches the StatefulSet's naming format (see the parsing sketch after this list). Pods that match on both and whose ControllerRef UID is the same need no further processing.
If the selector or the ControllerRef does not match, perform the ReleasePod operation, patching the Pod with: {"metadata": {"ownerReferences": [{"$patch": "delete", "uid": "%s"}], "uid": "%s"}}
For Pods whose labels and name format match but whose controllerRef is empty, execute AdoptPod, patching the Pod with: {"metadata": {"ownerReferences": [{"apiVersion": "%s", "kind": "%s", "name": "%s", "uid": "%s", "controller": true, "blockOwnerDeletion": true}], "uid": "%s"}}
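The name-format check above parses a parent name and an ordinal out of the Pod name. A self-contained sketch of that parsing, modeled on the controller's statefulPodRegex (the real getParentNameAndOrdinal takes a *v1.Pod rather than a string):

package main

import (
    "fmt"
    "regexp"
    "strconv"
)

// statefulPodRegex matches names of the form <parent>-<ordinal>.
var statefulPodRegex = regexp.MustCompile("(.*)-([0-9]+)$")

// getParentNameAndOrdinal extracts the owning StatefulSet's name and the
// Pod's ordinal from the Pod name; the ordinal is -1 if the name doesn't match.
func getParentNameAndOrdinal(podName string) (string, int) {
    subMatches := statefulPodRegex.FindStringSubmatch(podName)
    if len(subMatches) < 3 {
        return "", -1
    }
    ordinal, err := strconv.Atoi(subMatches[2])
    if err != nil {
        return "", -1
    }
    return subMatches[1], ordinal
}

func main() {
    parent, ordinal := getParentNameAndOrdinal("web-2")
    fmt.Println(parent, ordinal) // web 2
}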
UpdateStatefulSet
The implementation of syncStatefulSet simply calls UpdateStatefulSet.
func (ssc *defaultStatefulSetControl) UpdateStatefulSet(set *apps.StatefulSet, pods []*v1.Pod) error {
    // list all revisions and sort them
    revisions, err := ssc.ListRevisions(set)
    if err != nil {
        return err
    }
    history.SortControllerRevisions(revisions)

    // get the current, and update revisions
    currentRevision, updateRevision, collisionCount, err := ssc.getStatefulSetRevisions(set, revisions)
    if err != nil {
        return err
    }

    // perform the main update function and get the status
    status, err := ssc.updateStatefulSet(set, currentRevision, updateRevision, collisionCount, pods)
    if err != nil {
        return err
    }

    // update the set's status
    err = ssc.updateStatefulSetStatus(set, status)
    if err != nil {
        return err
    }

    glog.V(4).Infof("StatefulSet %s/%s pod status replicas=%d ready=%d current=%d updated=%d",
        set.Namespace, set.Name, status.Replicas, status.ReadyReplicas, status.CurrentReplicas, status.UpdatedReplicas)

    glog.V(4).Infof("StatefulSet %s/%s revisions current=%s update=%s",
        set.Namespace, set.Name, status.CurrentRevision, status.UpdateRevision)

    // maintain the set's revision history limit
    return ssc.truncateHistory(set, pods, revisions, currentRevision, updateRevision)
}
The main steps of UpdateStatefulSet are:
ListRevisions gets all the Revisions of the StatefulSet and sorts them by Revision from smallest to largest.
getStatefulSetRevisions gets the currentRevision and updateRevision.
Only when the Partition of the RollingUpdate strategy is not 0 will some Pods be at updateRevision.
In all other cases, all Pods are kept at currentRevision.
updateStatefulSet (the main update function) is the core logic of the StatefulSet controller; it creates, updates, and deletes Pods so that the declarative target state is maintained:
The target state always has Spec.Replicas Pods that are Running and Ready.
If the update policy is RollingUpdate and Partition is 0, ensure that all Pods correspond to Status.CurrentRevision.
If the update policy is RollingUpdate and Partition is not 0, Pods with an ordinal less than Partition stay at Status.CurrentRevision, while Pods with an ordinal greater than or equal to Partition are updated to Status.UpdateRevision (see the sketch after this list).
If the update policy is OnDelete, Pods are updated only when they are deleted; updates are not driven by the Revisions.
truncateHistory keeps the number of history Revisions at no more than Spec.RevisionHistoryLimit.
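To make the Partition rule concrete, here is a small, self-contained sketch; desiredRevision is an illustrative name, not a function from the controller source:

package main

import "fmt"

// desiredRevision mirrors the RollingUpdate partition rule described above:
// ordinals below the partition stay at the current revision, while ordinals
// at or above it move to the update revision.
func desiredRevision(ordinal, partition int, currentRev, updateRev string) string {
    if ordinal < partition {
        return currentRev
    }
    return updateRev
}

func main() {
    // replicas=5, partition=3: web-0..web-2 stay current, web-3/web-4 update
    for ordinal := 0; ordinal < 5; ordinal++ {
        fmt.Printf("web-%d -> %s\n", ordinal, desiredRevision(ordinal, 3, "web-abc123", "web-def456"))
    }
}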
updateStatefulSet
updateStatefulSet is the core of the entire StatefulSetController.
func (ssc *defaultStatefulSetControl) updateStatefulSet(
    set *apps.StatefulSet,
    currentRevision *apps.ControllerRevision,
    updateRevision *apps.ControllerRevision,
    collisionCount int32,
    pods []*v1.Pod) (*apps.StatefulSetStatus, error) {

    // get the current and update revisions of the set.
    currentSet, err := ApplyRevision(set, currentRevision)
    if err != nil {
        return nil, err
    }
    updateSet, err := ApplyRevision(set, updateRevision)
    if err != nil {
        return nil, err
    }

    // set the generation, and revisions in the returned status
    status := apps.StatefulSetStatus{}
    status.ObservedGeneration = new(int64)
    *status.ObservedGeneration = set.Generation
    status.CurrentRevision = currentRevision.Name
    status.UpdateRevision = updateRevision.Name
    status.CollisionCount = new(int32)
    *status.CollisionCount = collisionCount

    replicaCount := int(*set.Spec.Replicas)
    // slice that will contain all Pods such that 0 <= getOrdinal(pod) < set.Spec.Replicas
    replicas := make([]*v1.Pod, replicaCount)
    // slice that will contain all Pods such that set.Spec.Replicas <= getOrdinal(pod)
    condemned := make([]*v1.Pod, 0, len(pods))
    unhealthy := 0
    firstUnhealthyOrdinal := math.MaxInt32
    var firstUnhealthyPod *v1.Pod

    // First we partition pods into two lists: valid replicas and condemned Pods
    for i := range pods {
        status.Replicas++

        // count the number of running and ready replicas
        if isRunningAndReady(pods[i]) {
            status.ReadyReplicas++
        }
        // count the number of current and update replicas
        if isCreated(pods[i]) && !isTerminating(pods[i]) {
            if getPodRevision(pods[i]) == currentRevision.Name {
                status.CurrentReplicas++
            } else if getPodRevision(pods[i]) == updateRevision.Name {
                status.UpdatedReplicas++
            }
        }

        if ord := getOrdinal(pods[i]); 0 <= ord && ord < replicaCount {
            // if the ordinal of the pod is within the range of the current
            // number of replicas, insert it at the index of its ordinal
            replicas[ord] = pods[i]
        } else if ord >= replicaCount {
            // if the ordinal is greater than the number of replicas add it to the condemned list
            condemned = append(condemned, pods[i])
        }
        // if the ordinal could not be parsed (ord < 0), ignore the Pod
    }

    // for any empty indices in the sequence [0,set.Spec.Replicas) create a new Pod at the correct revision
    for ord := 0; ord < replicaCount; ord++ {
        if replicas[ord] == nil {
            replicas[ord] = newVersionedStatefulSetPod(currentSet, updateSet, currentRevision.Name, updateRevision.Name, ord)
        }
    }

    // sort the condemned Pods by their ordinals
    sort.Sort(ascendingOrdinal(condemned))

    // find the first unhealthy Pod
    for i := range replicas {
        if !isHealthy(replicas[i]) {
            unhealthy++
            if ord := getOrdinal(replicas[i]); ord < firstUnhealthyOrdinal {
                firstUnhealthyOrdinal = ord
                firstUnhealthyPod = replicas[i]
            }
        }
    }
    for i := range condemned {
        if !isHealthy(condemned[i]) {
            unhealthy++
            if ord := getOrdinal(condemned[i]); ord < firstUnhealthyOrdinal {
                firstUnhealthyOrdinal = ord
                firstUnhealthyPod = condemned[i]
            }
        }
    }
    if unhealthy > 0 {
        glog.V(4).Infof("StatefulSet %s/%s has %d unhealthy Pods starting with %s", set.Namespace, set.Name, unhealthy, firstUnhealthyPod.Name)
    }

    // If the StatefulSet is being deleted, don't do anything other than updating status.
    if set.DeletionTimestamp != nil {
        return &status, nil
    }

    monotonic := !allowsBurst(set)

    // Examine each replica with respect to its ordinal
    for i := range replicas {
        // delete and recreate failed pods
        if isFailed(replicas[i]) {
            glog.V(4).Infof("StatefulSet %s/%s is recreating failed Pod %s", set.Namespace, set.Name, replicas[i].Name)
            if err := ssc.podControl.DeleteStatefulPod(set, replicas[i]); err != nil {
                return &status, err
            }
            if getPodRevision(replicas[i]) == currentRevision.Name {
                status.CurrentReplicas--
            } else if getPodRevision(replicas[i]) == updateRevision.Name {
                status.UpdatedReplicas--
            }
            status.Replicas--
            replicas[i] = newVersionedStatefulSetPod(currentSet, updateSet, currentRevision.Name, updateRevision.Name, i)
        }
        // If we find a Pod that has not been created we create the Pod
        if !isCreated(replicas[i]) {
            if err := ssc.podControl.CreateStatefulPod(set, replicas[i]); err != nil {
                return &status, err
            }
            status.Replicas++
            if getPodRevision(replicas[i]) == currentRevision.Name {
                status.CurrentReplicas++
            } else if getPodRevision(replicas[i]) == updateRevision.Name {
                status.UpdatedReplicas++
            }
            // if the set does not allow bursting, return immediately
            if monotonic {
                return &status, nil
            }
            // pod created, no more work possible for this round
            continue
        }
        // If we find a Pod that is currently terminating, we must wait until graceful deletion
        // completes before we continue to make progress.
        if isTerminating(replicas[i]) && monotonic {
            glog.V(4).Infof("StatefulSet %s/%s is waiting for Pod %s to Terminate", set.Namespace, set.Name, replicas[i].Name)
            return &status, nil
        }
        // If we have a Pod that has been created but is not running and ready we can not make progress.
        if !isRunningAndReady(replicas[i]) && monotonic {
            glog.V(4).Infof("StatefulSet %s/%s is waiting for Pod %s to be Running and Ready", set.Namespace, set.Name, replicas[i].Name)
            return &status, nil
        }
        // Enforce the identity and storage invariants; if they already hold, there is nothing to do
        if identityMatches(set, replicas[i]) && storageMatches(set, replicas[i]) {
            continue
        }
        // Make a deep copy so we don't mutate the shared cache
        replica := replicas[i].DeepCopy()
        if err := ssc.podControl.UpdateStatefulPod(updateSet, replica); err != nil {
            return &status, err
        }
    }

    // At this point, all of the current Replicas are Running and Ready; we can consider termination.
    // We terminate Pods in monotonically decreasing order of ordinal.
    for target := len(condemned) - 1; target >= 0; target-- {
        // wait for terminating pods to expire
        if isTerminating(condemned[target]) {
            glog.V(4).Infof("StatefulSet %s/%s is waiting for Pod %s to Terminate prior to scale down", set.Namespace, set.Name, condemned[target].Name)
            // block if we are in monotonic mode
            if monotonic {
                return &status, nil
            }
            continue
        }
        // if we are in monotonic mode and the condemned target is not the first unhealthy Pod, block
        if !isRunningAndReady(condemned[target]) && monotonic && condemned[target] != firstUnhealthyPod {
            glog.V(4).Infof("StatefulSet %s/%s is waiting for Pod %s to be Running and Ready prior to scale down", set.Namespace, set.Name, firstUnhealthyPod.Name)
            return &status, nil
        }
        glog.V(4).Infof("StatefulSet %s/%s terminating Pod %s for scale down", set.Namespace, set.Name, condemned[target].Name)

        if err := ssc.podControl.DeleteStatefulPod(set, condemned[target]); err != nil {
            return &status, err
        }
        if getPodRevision(condemned[target]) == currentRevision.Name {
            status.CurrentReplicas--
        } else if getPodRevision(condemned[target]) == updateRevision.Name {
            status.UpdatedReplicas--
        }
        if monotonic {
            return &status, nil
        }
    }

    // for the OnDelete strategy we short circuit. Pods will be updated when they are manually deleted.
    if set.Spec.UpdateStrategy.Type == apps.OnDeleteStatefulSetStrategyType {
        return &status, nil
    }

    // we compute the minimum ordinal of the target sequence for a destructive update based on the strategy.
    updateMin := 0
    if set.Spec.UpdateStrategy.RollingUpdate != nil {
        updateMin = int(*set.Spec.UpdateStrategy.RollingUpdate.Partition)
    }
    // we terminate the Pod with the largest ordinal that does not match the update revision.
    for target := len(replicas) - 1; target >= updateMin; target-- {
        // delete the Pod if it is not already terminating and does not match the update revision.
        if getPodRevision(replicas[target]) != updateRevision.Name && !isTerminating(replicas[target]) {
            glog.V(4).Infof("StatefulSet %s/%s terminating Pod %s for update", set.Namespace, set.Name, replicas[target].Name)
            err := ssc.podControl.DeleteStatefulPod(set, replicas[target])
            status.CurrentReplicas--
            return &status, err
        }
        // wait for unhealthy Pods on update
        if !isHealthy(replicas[target]) {
            glog.V(4).Infof("StatefulSet %s/%s is waiting for Pod %s to update", set.Namespace, set.Name, replicas[target].Name)
            return &status, nil
        }
    }
    return &status, nil
}
Main process:
Get the StatefulSet objects corresponding to currentRevision and updateRevision (via ApplyRevision), and write generation, currentRevision, updateRevision and other fields into the StatefulSet status.
Divide the pods obtained by getPodsForStatefulSet into two slices:
Valid replicas slice: Pods whose ordinal lies in [0, Spec.Replicas).
Condemned pods slice: Pods whose ordinal is greater than or equal to Spec.Replicas; these are slated for deletion.
The rest of the flow is visible in the code above: empty ordinals in the valid range are filled with new Pods at the appropriate revision, failed Pods are deleted and recreated, condemned Pods are terminated from the highest ordinal down, and finally the destructive rolling update deletes out-of-date Pods from the highest ordinal down to the Partition.
Identity Match
In the updateStatefulSet reconcile, identityMatches checks whether a Pod's identity still matches its StatefulSet:

// identityMatches returns true if pod has a valid identity and network identity for a member of set.
func identityMatches(set *apps.StatefulSet, pod *v1.Pod) bool {
    parent, ordinal := getParentNameAndOrdinal(pod)
    return ordinal >= 0 &&
        set.Name == parent &&
        pod.Name == getPodName(set, ordinal) &&
        pod.Namespace == set.Namespace &&
        pod.Labels[apps.StatefulSetPodNameLabel] == pod.Name
}

It verifies that:
The ordinal parsed from the Pod name is valid (>= 0) and the Pod name matches the StatefulSet's naming format.
The namespaces match.
The Pod's statefulset.kubernetes.io/pod-name label really matches the Pod's name.
Storage Match
In the updateStatefulSet reconcile, the storage match is also checked. How exactly does it match?
// storageMatches returns true if pod's Volumes cover the set of PersistentVolumeClaims
func storageMatches(set *apps.StatefulSet, pod *v1.Pod) bool {
    ordinal := getOrdinal(pod)
    if ordinal < 0 {
        return false
    }
    volumes := make(map[string]v1.Volume, len(pod.Spec.Volumes))
    for _, volume := range pod.Spec.Volumes {
        volumes[volume.Name] = volume
    }
    for _, claim := range set.Spec.VolumeClaimTemplates {
        volume, found := volumes[claim.Name]
        if !found ||
            volume.VolumeSource.PersistentVolumeClaim == nil ||
            volume.VolumeSource.PersistentVolumeClaim.ClaimName != getPersistentVolumeClaimName(set, &claim, ordinal) {
            return false
        }
    }
    return true
}

Code Logic Diagram
Based on the above analysis, the following is a relatively complete code logic diagram of the StatefulSetController. (Images larger than 2MB are not supported, so the diagram is not very clear, but everything in it has been covered above.)
Thinking: what if an exception occurs during a rolling update?
The previous blog post's analysis of Kubernetes StatefulSet left a question open: what happens if a Pod fails to update while a StatefulSet is performing a rolling update?
From the rolling update portion of the source code analyzed above, we know:
If UpdateStrategy.Type is RollingUpdate, then based on the Partition configured in RollingUpdate (an unset Partition is 0, meaning all Pods are rolling-updated), updateMin is taken as the lower bound of the update index interval, and the valid replicas are traversed in decreasing ordinal order from the maximum down to updateMin:
If a Pod's revision is not updateRevision and the Pod is not being deleted, that Pod is deleted, status.CurrentReplicas is updated, and the status is returned; this pass ends.
If a Pod is not healthy, the controller waits for it to become healthy, so the status is returned directly here and this pass ends.
With this in mind, the question is easy to answer. The answer is simple:
If the update policy is RollingUpdate, Pods are updated one by one; if, while updating a given ordinal replica, that Pod never reaches the Running and Ready state, the entire rolling update blocks there. Replicas that have not yet been updated are not touched, replicas that were already updated successfully keep the updated version, and there is no automatic rollback mechanism. On the next sync, however, if the Pod is detected as failed (pod.Status.Phase == Failed), the failed Pod is deleted and recreated.
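The failure check mentioned here is the small isFailed helper used by updateStatefulSet above:

// isFailed returns true if pod has a Phase of PodFailed
func isFailed(pod *v1.Pod) bool {
    return pod.Status.Phase == v1.PodFailed
}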
Where does podManagementPolicy: Parallel take effect?
Question: when is podManagementPolicy: "Parallel" actually honored? During scale operations? During rolling updates?
Recall the part of updateStatefulSet analyzed above that traverses the Pods in the valid replicas slice to make sure the Pod at each index is Running and Ready: if it finds that a given ordinal replica should exist but has not been created yet, it triggers the create. If podManagementPolicy is set to Parallel, the controller goes on to create the other missing replicas without waiting for the previously created replica to become Running and Ready.
Likewise for the part that traverses the Pods in the condemned slice in decreasing ordinal order to ensure these Pods are eventually deleted: with podManagementPolicy set to Parallel, if an ordinal replica is found to be terminating, the controller continues deleting the other replicas that should be deleted, without waiting for the previously deleted replica's termination to complete.
Therefore, Parallel is reflected in the following scenarios:
When a StatefulSet is first deployed, Pods are created in parallel.
When a StatefulSet is cascading-deleted, Pods are deleted in parallel.
When scaling up, Pods are created in parallel.
When scaling down, Pods are deleted in parallel.
Rolling updates are not affected by the podManagementPolicy setting: they always proceed one by one, in decreasing ordinal order, and only move on once the previously updated Pod is Running and Ready.
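In the source, this split is captured by the monotonic flag in updateStatefulSet (monotonic := !allowsBurst(set)); the create and delete paths consult it, while the rolling update loop does not:

// allowsBurst is true if the set's PodManagementPolicy permits Pods to be
// created and deleted in parallel rather than one at a time.
func allowsBurst(set *apps.StatefulSet) bool {
    return set.Spec.PodManagementPolicy == apps.ParallelPodManagement
}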
This concludes the analysis of the Kubernetes StatefulSet source code.