This article explains the role of the VolumeBinder in Kubernetes volume scheduling. The approach is simple and practical: we walk step by step through the scheduler and PV controller source code involved.
VolumeBinder in Scheduler
VolumeBinder is a module in the Kubernetes default scheduler.
pkg/scheduler/volumebinder/volume_binder.go:33

// VolumeBinder sets up the volume binding library and manages
// the volume binding operations with a queue.
type VolumeBinder struct {
    Binder    persistentvolume.SchedulerVolumeBinder
    BindQueue *workqueue.Type
}
It maintains BindQueue, a FIFO work queue that stores the Pods waiting for volume binding.
Binder (persistentvolume.SchedulerVolumeBinder) is a sub-module of the PV controller package, handed to the scheduler to handle PV/PVC binding and dynamic provisioning during scheduling.
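To make the BindQueue mechanics concrete, here is a minimal, self-contained sketch (not scheduler source; the pod keys are made up) of the same client-go workqueue pattern: a producer Adds items and a worker drains them with Get/Done, which is exactly how podsToBind is consumed by bindVolumesWorker later in this article.

package main

import (
    "fmt"

    "k8s.io/client-go/util/workqueue"
)

func main() {
    // Same constructor the scheduler uses for its BindQueue.
    q := workqueue.NewNamed("podsToBind")

    // Producer side: the scheduler Adds the assumed pod after AssumePodVolumes.
    q.Add("default/mypod-0")
    q.Add("default/mypod-1")
    q.ShutDown() // no more producers; Get drains remaining items, then reports shutdown

    // Consumer side: the same Get/Done loop that bindVolumesWorker runs.
    for {
        item, shutdown := q.Get()
        if shutdown {
            break
        }
        fmt.Printf("would bind volumes for %v\n", item)
        q.Done(item) // mark the item as processed
    }
}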
SchedulerVolumeBinder
SchedulerVolumeBinder brings volume binding into the scheduling decision: the chosen Node must also satisfy the NodeAffinity of the Pod's PVs, not only the other predicate policies such as resource requests. Concretely, the VolumeBindingMode of the StorageClass decides whether PV binding is delayed (WaitForFirstConsumer). The scheduler predicate then waits, retrying until all of the Pod's PVCs are successfully bound to qualifying PVs, and only then triggers the Bind API to complete the bind of Pod and Node.
pkg/controller/volume/persistentvolume/scheduler_binder.go:58

// SchedulerVolumeBinder is used by the scheduler to handle PVC/PV binding
// and dynamic provisioning. The binding decisions are integrated into the pod scheduling
// workflow so that the PV NodeAffinity is also considered along with the pod's other
// scheduling requirements.
//
// This integrates into the existing default scheduler workflow as follows:
// 1. The scheduler takes a Pod off the scheduler queue and processes it serially:
//    a. Invokes all predicate functions, parallelized across nodes. FindPodVolumes() is invoked here.
//    b. Invokes all priority functions. Future/TBD
//    c. Selects the best node for the Pod.
//    d. Cache the node selection for the Pod. (Assume phase)
//       i.  If PVC binding is required, cache in-memory only:
//           * Updated PV objects for prebinding to the corresponding PVCs.
//           * For the pod, which PVs need API updates.
//           AssumePodVolumes() is invoked here. Then BindPodVolumes() is called asynchronously by the
//           scheduler. After BindPodVolumes() is complete, the Pod is added back to the scheduler queue
//           to be processed again until all PVCs are bound.
//       ii. If PVC binding is not required, cache the Pod->Node binding in the scheduler's pod cache,
//           and asynchronously bind the Pod to the Node. This is handled in the scheduler and not here.
// 2. Once the assume operation is done, the scheduler processes the next Pod in the scheduler queue
//    while the actual binding operation occurs in the background.
type SchedulerVolumeBinder interface {
    // FindPodVolumes checks if all of a Pod's PVCs can be satisfied by the node.
    // If a PVC is bound, it checks if the PV's NodeAffinity matches the Node.
    // Otherwise, it tries to find an available PV to bind to the PVC.
    // It returns true if all of the Pod's PVCs have matching PVs or can be dynamic provisioned,
    // and returns true if bound volumes satisfy the PV NodeAffinity.
    // This function is called by the volume binding scheduler predicate and can be called in parallel
    FindPodVolumes(pod *v1.Pod, node *v1.Node) (unboundVolumesSatisfied, boundVolumesSatisfied bool, err error)

    // AssumePodVolumes will:
    // 1. Take the PV matches for unbound PVCs and update the PV cache assuming
    //    that the PV is prebound to the PVC.
    // 2. Take the PVCs that need provisioning and update the PVC cache with related
    //    annotations set.
    // It returns true if all volumes are fully bound, and returns true if any volume binding/provisioning
    // API operation needs to be done afterwards.
    // This function will modify assumedPod with the node name.
    // This function is called serially.
    AssumePodVolumes(assumedPod *v1.Pod, nodeName string) (allFullyBound bool, bindingRequired bool, err error)

    // BindPodVolumes will:
    // 1. Initiate the volume binding by making the API call to prebind the PV
    //    to its matching PVC.
    // 2. Trigger the volume provisioning by making the API call to set related
    //    annotations on the PVC
    // This function can be called in parallel.
    BindPodVolumes(assumedPod *v1.Pod) error

    // GetBindingsCache returns the cache used (if any) to store volume binding decisions.
    GetBindingsCache() PodBindingCache
}
The SchedulerVolumeBinder interface has the following methods (a sketch of the overall calling order follows this list):
FindPodVolumes: called by the scheduler while executing the VolumeBindingChecker predicate policy, to check whether all of the Pod's PVCs can be satisfied by the Node. If a PVC is already bound, it checks whether the NodeAffinity of the corresponding PV matches the Node. If the PVC is not yet bound, it tries to find a suitable PV in the PV cache that could be bound to that PVC. The return values unboundVolumesSatisfied and boundVolumesSatisfied mean:
unboundVolumesSatisfied: bool, true indicates that every unbound PVC of the Pod found a matching PV or can be dynamically provisioned (local volumes currently only support static provisioning); otherwise false.
boundVolumesSatisfied: bool, true indicates that the NodeAffinity of the already-bound PVs is satisfied by the Node.
AssumePodVolumes: executed after the scheduler finishes its predicate and priority logic. It takes the matching PVs for the Pod's still-unbound PVCs and updates the PV cache to record the prebind of those PVs to the PVCs (the prebound PV is marked with the annotation "pv.kubernetes.io/bound-by-controller"). For PVCs that require dynamic provisioning, it updates those PVCs in the PVC cache with the annotation "volume.alpha.kubernetes.io/selected-node=$nodeName", which likewise acts as a prebind marker. The return values allFullyBound and bindingRequired mean:
allFullyBound: bool, true indicates that all of the Pod's PVCs are already fully bound; otherwise false.
bindingRequired: bool, true indicates that volume binding/provisioning API operations still need to be done; otherwise false.
BindPodVolumes: using the information in podBindingCache, calls the API to prebind the PVs and PVCs; the PV controller then watches these updates and completes the real bind.
GetBindingsCache: returns the PodBindingCache.
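As a rough mental model of how the scheduler drives these methods, here is a compressed, hypothetical sketch against a local mirror of the interface. fakeBinder and schedulePodWithVolumes are made up for illustration; the real flow is spread across the predicate, scheduleOne, and bindVolumesWorker, as shown later in this article.

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// volumeBinder mirrors the binding-related methods of SchedulerVolumeBinder
// (GetBindingsCache omitted for brevity).
type volumeBinder interface {
    FindPodVolumes(pod *v1.Pod, node *v1.Node) (unboundOK, boundOK bool, err error)
    AssumePodVolumes(pod *v1.Pod, nodeName string) (allBound, bindingRequired bool, err error)
    BindPodVolumes(pod *v1.Pod) error
}

// fakeBinder pretends one round of API binding is always needed.
type fakeBinder struct{}

func (fakeBinder) FindPodVolumes(pod *v1.Pod, node *v1.Node) (bool, bool, error) {
    return true, true, nil
}
func (fakeBinder) AssumePodVolumes(pod *v1.Pod, nodeName string) (bool, bool, error) {
    return false, true, nil
}
func (fakeBinder) BindPodVolumes(pod *v1.Pod) error { return nil }

// schedulePodWithVolumes sketches the order in which the scheduler drives the interface.
func schedulePodWithVolumes(b volumeBinder, pod *v1.Pod, nodes []*v1.Node) {
    // 1. Predicate phase: FindPodVolumes filters nodes (CheckVolumeBinding).
    var feasible []*v1.Node
    for _, node := range nodes {
        unboundOK, boundOK, err := b.FindPodVolumes(pod, node)
        if err == nil && unboundOK && boundOK {
            feasible = append(feasible, node)
        }
    }
    if len(feasible) == 0 {
        return // pod goes back to the scheduling queue
    }
    chosen := feasible[0] // priority phase omitted

    // 2. Assume phase: prebind PVs/PVCs in the in-memory caches only.
    allBound, bindingRequired, err := b.AssumePodVolumes(pod, chosen.Name)
    if err != nil || allBound {
        return // on error, or with everything bound, the normal pod->node bind proceeds
    }

    // 3. Bind phase: PV/PVC API updates happen asynchronously; the real
    //    scheduler enqueues the pod on BindQueue for bindVolumesWorker instead.
    if bindingRequired {
        if err := b.BindPodVolumes(pod); err == nil {
            fmt.Printf("volume binding started for %s; pod will be rescheduled\n", pod.Name)
        }
    }
}

func main() {
    pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "mypod"}}
    nodes := []*v1.Node{{ObjectMeta: metav1.ObjectMeta{Name: "node-1"}}}
    schedulePodWithVolumes(fakeBinder{}, pod, nodes)
}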
The initialization of VolumeBinder in Scheduler is done by volumebinder.NewVolumeBinder.
pkg/scheduler/volumebinder/volume_binder.go:39

// NewVolumeBinder sets up the volume binding library and binding queue
func NewVolumeBinder(
    client clientset.Interface,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    pvInformer coreinformers.PersistentVolumeInformer,
    storageClassInformer storageinformers.StorageClassInformer) *VolumeBinder {

    return &VolumeBinder{
        Binder:    persistentvolume.NewVolumeBinder(client, pvcInformer, pvInformer, storageClassInformer),
        BindQueue: workqueue.NewNamed("podsToBind"),
    }
}
volumebinder.NewVolumeBinder is responsible for:
Calling persistentvolume.NewVolumeBinder to initialize the Binder object, which requires the pvInformer, pvcInformer, and storageClassInformer.
Creating the BindQueue named "podsToBind", the FIFO queue that holds the Pods waiting to be bound.
volumebinder.NewVolumeBinder is invoked from the scheduler's NewConfigFactory. The important part is that NewConfigFactory sets up pvcInformer, pvInformer, and storageClassInformer, then passes them to persistentvolume.NewVolumeBinder to create the Binder.
pkg/scheduler/factory/factory.go:145

// NewConfigFactory initializes the default implementation of a Configurator. To encourage eventual privatization of the struct type, we only
// return the interface.
func NewConfigFactory(
    schedulerName string,
    client clientset.Interface,
    nodeInformer coreinformers.NodeInformer,
    podInformer coreinformers.PodInformer,
    pvInformer coreinformers.PersistentVolumeInformer,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    replicationControllerInformer coreinformers.ReplicationControllerInformer,
    replicaSetInformer extensionsinformers.ReplicaSetInformer,
    statefulSetInformer appsinformers.StatefulSetInformer,
    serviceInformer coreinformers.ServiceInformer,
    pdbInformer policyinformers.PodDisruptionBudgetInformer,
    storageClassInformer storageinformers.StorageClassInformer,
    hardPodAffinitySymmetricWeight int32,
    enableEquivalenceClassCache bool,
    disablePreemption bool,
) scheduler.Configurator {
    stopEverything := make(chan struct{})
    schedulerCache := schedulercache.New(30*time.Second, stopEverything)

    // storageClassInformer is only enabled through VolumeScheduling feature gate
    var storageClassLister storagelisters.StorageClassLister
    if storageClassInformer != nil {
        storageClassLister = storageClassInformer.Lister()
    }
    ...
    // On add and delete of PVs, it will affect equivalence cache items
    // related to persistent volume
    pvInformer.Informer().AddEventHandler(
        cache.ResourceEventHandlerFuncs{
            // MaxPDVolumeCountPredicate: since it relies on the counts of PV.
            AddFunc:    c.onPvAdd,
            UpdateFunc: c.onPvUpdate,
            DeleteFunc: c.onPvDelete,
        },
    )
    c.pVLister = pvInformer.Lister()

    // This is for MaxPDVolumeCountPredicate: add/delete PVC will affect counts of PV when it is bound.
    pvcInformer.Informer().AddEventHandler(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    c.onPvcAdd,
            UpdateFunc: c.onPvcUpdate,
            DeleteFunc: c.onPvcDelete,
        },
    )
    c.pVCLister = pvcInformer.Lister()
    ...
    if utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) {
        // Setup volumebinder
        c.volumeBinder = volumebinder.NewVolumeBinder(client, pvcInformer, pvInformer, storageClassInformer)

        storageClassInformer.Informer().AddEventHandler(
            cache.ResourceEventHandlerFuncs{
                AddFunc:    c.onStorageClassAdd,
                DeleteFunc: c.onStorageClassDelete,
            },
        )
    }
    ...
    return c
}
The call to volumebinder.NewVolumeBinder happens only when the VolumeScheduling feature gate is enabled.
VolumeBinder in PV Controller
As mentioned earlier, the scheduler's volumebinder.NewVolumeBinder delegates to persistentvolume.NewVolumeBinder to initialize the Binder, so next we analyze persistentvolume.volumeBinder.
The volumeBinder in the PV controller package is the implementation of the SchedulerVolumeBinder interface described above: it implements FindPodVolumes, AssumePodVolumes, BindPodVolumes, and GetBindingsCache.
pkg/controller/volume/persistentvolume/scheduler_binder.go:96

type volumeBinder struct {
    ctrl *PersistentVolumeController

    pvcCache PVCAssumeCache
    pvCache  PVAssumeCache

    // Stores binding decisions that were made in FindPodVolumes for use in AssumePodVolumes.
    // AssumePodVolumes modifies the bindings again for use in BindPodVolumes.
    podBindingCache PodBindingCache
}

pkg/controller/volume/persistentvolume/scheduler_binder.go:108

// NewVolumeBinder sets up all the caches needed for the scheduler to make volume binding decisions.
func NewVolumeBinder(
    kubeClient clientset.Interface,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    pvInformer coreinformers.PersistentVolumeInformer,
    storageClassInformer storageinformers.StorageClassInformer) SchedulerVolumeBinder {

    // TODO: find better way...
    ctrl := &PersistentVolumeController{
        kubeClient:  kubeClient,
        classLister: storageClassInformer.Lister(),
    }

    b := &volumeBinder{
        ctrl:            ctrl,
        pvcCache:        NewPVCAssumeCache(pvcInformer.Informer()),
        pvCache:         NewPVAssumeCache(pvInformer.Informer()),
        podBindingCache: NewPodBindingCache(),
    }

    return b
}
The volumeBinder struct mainly holds the PersistentVolumeController instance, the pvCache, the pvcCache, and the podBindingCache.
The podBindingCache structure is what we need to pay attention to:
pkg/controller/volume/persistentvolume/scheduler_binder_cache.go:48

type podBindingCache struct {
    mutex sync.Mutex

    // Key = pod name
    // Value = nodeDecisions
    bindingDecisions map[string]nodeDecisions
}

// Key = nodeName
// Value = bindings & provisioned PVCs of the node
type nodeDecisions map[string]nodeDecision

// A decision includes bindingInfo and provisioned PVCs of the node
type nodeDecision struct {
    bindings      []*bindingInfo
    provisionings []*v1.PersistentVolumeClaim
}

type bindingInfo struct {
    // Claim that needs to be bound
    pvc *v1.PersistentVolumeClaim

    // Proposed PV to bind to this claim
    pv *v1.PersistentVolume
}
VolumeBindingChecker Predicate
The VolumeBinder is created during the scheduler's NewConfigFactory, and the CheckVolumeBinding predicate policy is then registered with the default scheduler. Note that all predicate policies execute in a fixed default order:
predicatesOrdering = []string{CheckNodeConditionPred, CheckNodeUnschedulablePred,
    GeneralPred, HostNamePred, PodFitsHostPortsPred,
    MatchNodeSelectorPred, PodFitsResourcesPred, NoDiskConflictPred,
    PodToleratesNodeTaintsPred, PodToleratesNodeNoExecuteTaintsPred, CheckNodeLabelPresencePred,
    CheckServiceAffinityPred, MaxEBSVolumeCountPred, MaxGCEPDVolumeCountPred,
    MaxAzureDiskVolumeCountPred, CheckVolumeBindingPred, NoVolumeZoneConflictPred,
    CheckNodeMemoryPressurePred, CheckNodePIDPressurePred, CheckNodeDiskPressurePred, MatchInterPodAffinityPred}
VolumeBindingChecker.predicate is the corresponding predicate implementation.
pkg/scheduler/algorithm/predicates/predicates.go:1680

func (c *VolumeBindingChecker) predicate(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
    if !utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) {
        return true, nil, nil
    }

    node := nodeInfo.Node()
    if node == nil {
        return false, nil, fmt.Errorf("node not found")
    }

    unboundSatisfied, boundSatisfied, err := c.binder.Binder.FindPodVolumes(pod, node)
    if err != nil {
        return false, nil, err
    }

    failReasons := []algorithm.PredicateFailureReason{}
    if !boundSatisfied {
        glog.V(5).Infof("Bound PVs not satisfied for pod %v/%v, node %q", pod.Namespace, pod.Name, node.Name)
        failReasons = append(failReasons, ErrVolumeNodeConflict)
    }

    if !unboundSatisfied {
        glog.V(5).Infof("Couldn't find matching PVs for pod %v/%v, node %q", pod.Namespace, pod.Name, node.Name)
        failReasons = append(failReasons, ErrVolumeBindConflict)
    }

    if len(failReasons) > 0 {
        return false, failReasons, nil
    }

    // All volumes bound or matching PVs found for all unbound PVCs
    glog.V(5).Infof("All PVCs found matches for pod %v/%v, node %q", pod.Namespace, pod.Name, node.Name)
    return true, nil, nil
}
The predicate first confirms that the VolumeScheduling feature gate is enabled.
It then calls volumeBinder.FindPodVolumes to check whether all of the Pod's PVCs can be satisfied by the Node.
What happens if VolumeBindingChecker fails when scheduling?
What happens if VolumeBindingChecker.predicate fails? Readers familiar with the scheduler will know that a scheduling failure triggers MakeDefaultErrorFunc.
func (c *configFactory) MakeDefaultErrorFunc(backoff *util.PodBackoff, podQueue core.SchedulingQueue) func(pod *v1.Pod, err error) {
    return func(pod *v1.Pod, err error) {
        ...
        backoff.Gc()
        // Retry asynchronously.
        // Note that this is extremely rudimentary and we need a more real error handling path.
        go func() {
            defer runtime.HandleCrash()
            podID := types.NamespacedName{
                Namespace: pod.Namespace,
                Name:      pod.Name,
            }
            origPod := pod

            // When pod priority is enabled, we would like to place an unschedulable
            // pod in the unschedulable queue. This ensures that if the pod is nominated
            // to run on a node, scheduler takes the pod into account when running
            // predicates for the node.
            if !util.PodPriorityEnabled() {
                entry := backoff.GetEntry(podID)
                if !entry.TryWait(backoff.MaxDuration()) {
                    glog.Warningf("Request for pod %v already in flight, abandoning", podID)
                    return
                }
            }
            // Get the pod again; it may have changed/been scheduled already.
            getBackoff := initialGetBackoff
            for {
                pod, err := c.client.CoreV1().Pods(podID.Namespace).Get(podID.Name, metav1.GetOptions{})
                if err == nil {
                    if len(pod.Spec.NodeName) == 0 {
                        podQueue.AddUnschedulableIfNotPresent(pod)
                    } else {
                        if c.volumeBinder != nil {
                            // Volume binder only wants to keep unassigned pods
                            c.volumeBinder.DeletePodBindings(pod)
                        }
                    }
                    break
                }
                if errors.IsNotFound(err) {
                    glog.Warningf("A pod %v no longer exists", podID)
                    if c.volumeBinder != nil {
                        // Volume binder only wants to keep unassigned pods
                        c.volumeBinder.DeletePodBindings(origPod)
                    }
                    return
                }
                glog.Errorf("Error getting pod %v for retry: %v; retrying...", podID, err)
                if getBackoff = getBackoff * 2; getBackoff > maximalGetBackoff {
                    getBackoff = maximalGetBackoff
                }
                time.Sleep(getBackoff)
            }
        }()
    }
}
MakeDefaultErrorFunc will asynchronously retry the Pod that failed to schedule:
If pod.Spec.NodeName is not empty and volumeBinder is not nil (meaning the VolumeScheduling feature gate is enabled), the pod's bindingDecisions are removed from the podBindingCache by calling podBindingCache.DeleteBindings, because the volumeBinder only keeps state for unassigned pods.
If the pod has already been deleted from the API server and volumeBinder is not nil, the same podBindingCache.DeleteBindings call removes the pod's bindingDecisions from the podBindingCache, again because the volumeBinder only keeps state for unassigned pods.
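A side note on the retry loop above: the Get call backs off exponentially with a cap. A standalone sketch of the same doubling-with-cap rule follows; the bound values here are illustrative, while the scheduler defines its own initialGetBackoff and maximalGetBackoff constants.

package main

import (
    "fmt"
    "time"
)

// Illustrative bounds; assumed values, not copied from the scheduler.
const (
    initialGetBackoff = 100 * time.Millisecond
    maximalGetBackoff = time.Minute
)

func main() {
    getBackoff := initialGetBackoff
    for attempt := 1; attempt <= 12; attempt++ {
        // ... the Pods().Get() API call would go here ...
        fmt.Printf("attempt %2d: would sleep %v\n", attempt, getBackoff)
        // Same doubling-with-cap rule as MakeDefaultErrorFunc:
        if getBackoff = getBackoff * 2; getBackoff > maximalGetBackoff {
            getBackoff = maximalGetBackoff
        }
        // time.Sleep(getBackoff) // elided so the demo returns instantly
    }
}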
What happens when VolumeBindingChecker succeeds and the Pod is scheduled?
The event handler deletePodFromSchedulingQueue, which removes the pod from the unscheduled pod queue (meaning scheduling succeeded), is registered in NewConfigFactory.
pkg/scheduler/factory/factory.go:745

func (c *configFactory) deletePodFromSchedulingQueue(obj interface{}) {
    var pod *v1.Pod
    ...
    if err := c.podQueue.Delete(pod); err != nil {
        runtime.HandleError(fmt.Errorf("unable to dequeue %T: %v", obj, err))
    }
    if c.volumeBinder != nil {
        // Volume binder only wants to keep unassigned pods
        c.volumeBinder.DeletePodBindings(pod)
    }
}
Besides removing the pod from podQueue, deletePodFromSchedulingQueue also calls podBindingCache.DeleteBindings when volumeBinder is not nil (the VolumeScheduling feature gate is enabled), removing the pod's bindingDecisions from podBindingCache, just as MakeDefaultErrorFunc does, because the volumeBinder only keeps state for unassigned pods.
VolumeBinder
Next, let's take a look at the implementation of the various interfaces of volumeBinder and when they are called.
FindPodVolumes
When I analyzed VolumeBindingChecker Predicate earlier, I saw that volumeBinder.FindPodVolumes was called.
FindPodVolumes is used to check whether all the PVCs of the Pod can be satisfied by the Node.
If the PVC has been successfully Bound, it will check whether the NodeAffinity of the corresponding PV matches the Node.
If the PVC is not yet bound, it tries to find a suitable PV in the PV cache that could be bound to that PVC. The return values unboundVolumesSatisfied and boundVolumesSatisfied mean:
unboundVolumesSatisfied: bool, true indicates that every unbound PVC of the Pod found a matching PV or can be dynamically provisioned (local volumes currently only support static provisioning); otherwise false.
boundVolumesSatisfied: bool, true indicates that the NodeAffinity of the already-bound PVs is satisfied by the Node.
pkg/controller/volume/persistentvolume/scheduler_binder.go:135

// FindPodVolumes caches the matching PVs and PVCs to provision per node in podBindingCache
func (b *volumeBinder) FindPodVolumes(pod *v1.Pod, node *v1.Node) (unboundVolumesSatisfied, boundVolumesSatisfied bool, err error) {
    podName := getPodName(pod)

    // Warning: Below log needs high verbosity as it can be printed several times (#60933).
    glog.V(5).Infof("FindPodVolumes for pod %q, node %q", podName, node.Name)

    // Initialize to true for pods that don't have volumes
    unboundVolumesSatisfied = true
    boundVolumesSatisfied = true

    // The pod's volumes need to be processed in one call to avoid the race condition where
    // volumes can get bound/provisioned in between calls.
    boundClaims, claimsToBind, unboundClaimsImmediate, err := b.getPodVolumes(pod)
    if err != nil {
        return false, false, err
    }

    // Immediate claims should be bound
    if len(unboundClaimsImmediate) > 0 {
        return false, false, fmt.Errorf("pod has unbound immediate PersistentVolumeClaims")
    }

    // Check PV node affinity on bound volumes
    if len(boundClaims) > 0 {
        boundVolumesSatisfied, err = b.checkBoundClaims(boundClaims, node, podName)
        if err != nil {
            return false, false, err
        }
    }

    if len(claimsToBind) > 0 {
        var claimsToProvision []*v1.PersistentVolumeClaim
        unboundVolumesSatisfied, claimsToProvision, err = b.findMatchingVolumes(pod, claimsToBind, node)
        if err != nil {
            return false, false, err
        }

        if utilfeature.DefaultFeatureGate.Enabled(features.DynamicProvisioningScheduling) {
            // Try to provision for unbound volumes
            if !unboundVolumesSatisfied {
                unboundVolumesSatisfied, err = b.checkVolumeProvisions(pod, claimsToProvision, node)
                if err != nil {
                    return false, false, err
                }
            }
        }
    }

    return unboundVolumesSatisfied, boundVolumesSatisfied, nil
}
Three important methods are called in FindPodVolumes:
getPodVolumes: divides the Pod's PVCs into boundClaims, unboundClaims (claimsToBind), and unboundClaimsImmediate.
checkBoundClaims: if boundClaims is not empty, checks whether the NodeAffinity of each bound PV matches the Node's labels; if all match, boundVolumesSatisfied is true.
findMatchingVolumes: if claimsToBind is not empty, selects from the pvCache the smallest PV that satisfies each claim; for claims with no match, checkVolumeProvisions is called to check whether dynamic provisioning is possible.
Let's focus on getPodVolumes, findMatchingVolumes, and checkVolumeProvisions.
getPodVolumes

pkg/controller/volume/persistentvolume/scheduler_binder.go:359

// getPodVolumes returns a pod's PVCs separated into bound (including prebound), unbound with delayed binding,
// and unbound with immediate binding
func (b *volumeBinder) getPodVolumes(pod *v1.Pod) (boundClaims []*v1.PersistentVolumeClaim, unboundClaims []*bindingInfo, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error) {
    boundClaims = []*v1.PersistentVolumeClaim{}
    unboundClaimsImmediate = []*v1.PersistentVolumeClaim{}
    unboundClaims = []*bindingInfo{}

    for _, vol := range pod.Spec.Volumes {
        volumeBound, pvc, err := b.isVolumeBound(pod.Namespace, &vol, false)
        if err != nil {
            return nil, nil, nil, err
        }
        if pvc == nil {
            continue
        }
        if volumeBound {
            boundClaims = append(boundClaims, pvc)
        } else {
            delayBinding, err := b.ctrl.shouldDelayBinding(pvc)
            if err != nil {
                return nil, nil, nil, err
            }
            if delayBinding {
                // Scheduler path
                unboundClaims = append(unboundClaims, &bindingInfo{pvc: pvc})
            } else {
                // Immediate binding should have already been bound
                unboundClaimsImmediate = append(unboundClaimsImmediate, pvc)
            }
        }
    }
    return boundClaims, unboundClaims, unboundClaimsImmediate, nil
}
getPodVolumes divides the Pod's PVCs into three categories:
boundClaims: PVCs that are already bound, including prebound ones
unboundClaims: unbound PVCs that require delayed binding
unboundClaimsImmediate: unbound PVCs that require immediate binding
So which PVCs get delayed binding? Let's look at the logic of shouldDelayBinding:
func (ctrl *PersistentVolumeController) shouldDelayBinding(claim *v1.PersistentVolumeClaim) (bool, error) {
    if !utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) {
        return false, nil
    }

    if utilfeature.DefaultFeatureGate.Enabled(features.DynamicProvisioningScheduling) {
        // When feature DynamicProvisioningScheduling enabled,
        // Scheduler signal to the PV controller to start dynamic
        // provisioning by setting the "annSelectedNode" annotation
        // in the PVC
        if _, ok := claim.Annotations[annSelectedNode]; ok {
            return false, nil
        }
    }

    className := v1helper.GetPersistentVolumeClaimClass(claim)
    if className == "" {
        return false, nil
    }

    class, err := ctrl.classLister.Get(className)
    if err != nil {
        return false, nil
    }

    if class.VolumeBindingMode == nil {
        return false, fmt.Errorf("VolumeBindingMode not set for StorageClass %q", className)
    }

    return *class.VolumeBindingMode == storage.VolumeBindingWaitForFirstConsumer, nil
}
If the VolumeScheduling feature gate is disabled, the PVC never gets delayed binding.
If the DynamicProvisioningScheduling feature gate is enabled, check whether the PVC already carries the annotation "volume.alpha.kubernetes.io/selected-node"; if it does, the PVC does not get delayed binding.
If the PVC's storageClass is empty or the storageClass does not exist, the PVC does not get delayed binding.
If the storageClass exists but its VolumeBindingMode is unset, shouldDelayBinding returns an error.
Only when the PVC's storageClass exists and its VolumeBindingMode is WaitForFirstConsumer does the PVC get delayed binding (an example StorageClass follows this list).
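To connect this to user-facing configuration, here is a minimal sketch constructing such a delay-binding StorageClass with the client-go API types. The class name is made up, and note one assumption: this uses the storage.k8s.io/v1 types available today, whereas the release this article tracks still served VolumeBindingMode through the v1beta1 group.

package main

import (
    "fmt"

    storagev1 "k8s.io/api/storage/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    mode := storagev1.VolumeBindingWaitForFirstConsumer
    sc := &storagev1.StorageClass{
        ObjectMeta: metav1.ObjectMeta{Name: "local-storage"}, // hypothetical name
        // Local volumes are statically provisioned, so the no-op provisioner is used.
        Provisioner: "kubernetes.io/no-provisioner",
        // This is the field shouldDelayBinding inspects: WaitForFirstConsumer
        // delays PV binding until a Pod consuming the PVC is scheduled.
        VolumeBindingMode: &mode,
    }
    fmt.Printf("%s: provisioner=%s, mode=%s\n", sc.Name, sc.Provisioner, *sc.VolumeBindingMode)
}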
findMatchingVolumes
If the claimsToBind returned by getPodVolumes is not empty, findMatchingVolumes is called to pick, from the pvCache, the smallest PV that satisfies each claim; for claims with no match, checkVolumeProvisions is called to check whether dynamic provisioning is possible.
pkg/controller/volume/persistentvolume/scheduler_binder.go:413

// findMatchingVolumes tries to find matching volumes for given claims,
// and return unbound claims for further provision.
func (b *volumeBinder) findMatchingVolumes(pod *v1.Pod, claimsToBind []*bindingInfo, node *v1.Node) (foundMatches bool, unboundClaims []*v1.PersistentVolumeClaim, err error) {
    podName := getPodName(pod)

    // Sort all the claims by increasing size request to get the smallest fits
    sort.Sort(byPVCSize(claimsToBind))

    chosenPVs := map[string]*v1.PersistentVolume{}

    foundMatches = true
    matchedClaims := []*bindingInfo{}

    for _, bindingInfo := range claimsToBind {
        // Get storage class name from each PVC
        storageClassName := ""
        storageClass := bindingInfo.pvc.Spec.StorageClassName
        if storageClass != nil {
            storageClassName = *storageClass
        }
        allPVs := b.pvCache.ListPVs(storageClassName)

        // Find a matching PV
        bindingInfo.pv, err = findMatchingVolume(bindingInfo.pvc, allPVs, node, chosenPVs, true)
        if err != nil {
            return false, nil, err
        }
        if bindingInfo.pv == nil {
            glog.V(4).Infof("No matching volumes for Pod %q, PVC %q on node %q", podName, getPVCName(bindingInfo.pvc), node.Name)
            unboundClaims = append(unboundClaims, bindingInfo.pvc)
            foundMatches = false
            continue
        }

        // matching PV needs to be excluded so we don't select it again
        chosenPVs[bindingInfo.pv.Name] = bindingInfo.pv
        matchedClaims = append(matchedClaims, bindingInfo)
        glog.V(5).Infof("Found matching PV %q for PVC %q on node %q for pod %q", bindingInfo.pv.Name, getPVCName(bindingInfo.pvc), node.Name, podName)
    }

    // Mark cache with all the matches for each PVC for this node
    if len(matchedClaims) > 0 {
        b.podBindingCache.UpdateBindings(pod, node.Name, matchedClaims)
    }

    if foundMatches {
        glog.V(4).Infof("Found matching volumes for pod %q on node %q", podName, node.Name)
    }

    return
}

checkVolumeProvisions

pkg/controller/volume/persistentvolume/scheduler_binder.go:465

// checkVolumeProvisions checks given unbound claims (the claims have gone through func
// findMatchingVolumes, and do not have matching volumes for binding), and return true
// if all of the claims are eligible for dynamic provision.
func (b *volumeBinder) checkVolumeProvisions(pod *v1.Pod, claimsToProvision []*v1.PersistentVolumeClaim, node *v1.Node) (provisionSatisfied bool, err error) {
    podName := getPodName(pod)
    provisionedClaims := []*v1.PersistentVolumeClaim{}

    for _, claim := range claimsToProvision {
        className := v1helper.GetPersistentVolumeClaimClass(claim)
        if className == "" {
            return false, fmt.Errorf("no class for claim %q", getPVCName(claim))
        }

        class, err := b.ctrl.classLister.Get(className)
        if err != nil {
            return false, fmt.Errorf("failed to find storage class %q", className)
        }
        provisioner := class.Provisioner
        if provisioner == "" || provisioner == notSupportedProvisioner {
            glog.V(4).Infof("storage class %q of claim %q does not support dynamic provisioning", className, getPVCName(claim))
            return false, nil
        }

        // Check if the node can satisfy the topology requirement in the class
        if !v1helper.MatchTopologySelectorTerms(class.AllowedTopologies, labels.Set(node.Labels)) {
            glog.V(4).Infof("Node %q cannot satisfy provisioning topology requirements of claim %q", node.Name, getPVCName(claim))
            return false, nil
        }

        // TODO: Check if capacity of the node domain in the storage class
        // can satisfy resource requirement of given claim

        provisionedClaims = append(provisionedClaims, claim)
    }

    glog.V(4).Infof("Provisioning for claims of pod %q that has no matching volumes on node %q ...", podName, node.Name)

    // Mark cache with all the PVCs that need provisioning for this node
    b.podBindingCache.UpdateProvisionedPVCs(pod, node.Name, provisionedClaims)

    return true, nil
}
checkVolumeProvisions mainly checks whether the AllowedTopologies (TopologySelectorTerms) of the PVC's storageClass can be satisfied by the Node's labels.
If they match, UpdateProvisionedPVCs is called to update the bindingDecisions in podBindingCache; a sketch of the topology match follows.
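The topology check itself is simple label matching: terms are ORed, and the expressions inside one term are ANDed. Below is a simplified standalone re-implementation for illustration only; the real helper is v1helper.MatchTopologySelectorTerms, and the types and label key here are stand-ins.

package main

import "fmt"

// topologySelectorTerm is a simplified stand-in for the API type:
// key -> allowed values, e.g. a zone label -> the permitted zones.
type topologySelectorTerm struct {
    matchLabelExpressions map[string][]string
}

// matchTopology returns true if the node labels satisfy at least one term;
// within a term, every key must match one of its allowed values.
func matchTopology(terms []topologySelectorTerm, nodeLabels map[string]string) bool {
    if len(terms) == 0 {
        return true // no topology restriction
    }
    for _, term := range terms {
        matched := true
        for key, values := range term.matchLabelExpressions {
            v, ok := nodeLabels[key]
            if !ok || !contains(values, v) {
                matched = false
                break
            }
        }
        if matched {
            return true
        }
    }
    return false
}

func contains(values []string, v string) bool {
    for _, x := range values {
        if x == v {
            return true
        }
    }
    return false
}

func main() {
    terms := []topologySelectorTerm{{matchLabelExpressions: map[string][]string{
        "failure-domain.beta.kubernetes.io/zone": {"us-east-1a", "us-east-1b"},
    }}}
    node := map[string]string{"failure-domain.beta.kubernetes.io/zone": "us-east-1a"}
    fmt.Println(matchTopology(terms, node)) // true
}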
AssumePodVolumes
When is volumeBinder.AssumePodVolumes called? Let's look at the code around scheduleOne:
scheduleOne invokes assumeAndBindVolumes

pkg/scheduler/scheduler.go:439

// scheduleOne does the entire scheduling workflow for a single pod. It is serialized on the scheduling algorithm's host fitting.
func (sched *Scheduler) scheduleOne() {
    pod := sched.config.NextPod()
    ...
    suggestedHost, err := sched.schedule(pod)
    ...
    // Tell the cache to assume that a pod now is running on a given node, even though it hasn't been bound yet.
    // This allows us to keep scheduling without waiting on binding to occur.
    assumedPod := pod.DeepCopy()

    // Assume volumes first before assuming the pod.
    //
    // If no volumes need binding, then nil is returned, and continue to assume the pod.
    //
    // Otherwise, error is returned and volume binding is started asynchronously for all of the pod's volumes.
    // scheduleOne() returns immediately on error, so that it doesn't continue to assume the pod.
    //
    // After the asynchronous volume binding updates are made, it will send the pod back through the scheduler for
    // subsequent passes until all volumes are fully bound.
    //
    // This function modifies 'assumedPod' if volume binding is required.
    err = sched.assumeAndBindVolumes(assumedPod, suggestedHost)
    if err != nil {
        return
    }

    // assume modifies `assumedPod` by setting NodeName=suggestedHost
    err = sched.assume(assumedPod, suggestedHost)
    ...
    // bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
    go func() {
        err := sched.bind(assumedPod, &v1.Binding{...})
    }()
}
After sched.schedule(pod) completes the pod's predicate and priority phases, scheduleOne first calls sched.assumeAndBindVolumes, then sched.assume to assume the pod, and finally sched.bind to perform the bind operation.
assumeAndBindVolumes adds the assumed pod to the BindQueue

pkg/scheduler/scheduler.go:268

// assumeAndBindVolumes will update the volume cache and then asynchronously bind volumes if required.
//
// If volume binding is required, then the bind volumes routine will update the pod to send it back through
// the scheduler.
//
// Otherwise, return nil error and continue to assume the pod.
//
// This function modifies assumed if volume binding is required.
func (sched *Scheduler) assumeAndBindVolumes(assumed *v1.Pod, host string) error {
    if utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) {
        allBound, bindingRequired, err := sched.config.VolumeBinder.Binder.AssumePodVolumes(assumed, host)
        if err != nil {
            sched.config.Error(assumed, err)
            sched.config.Recorder.Eventf(assumed, v1.EventTypeWarning, "FailedScheduling", "AssumePodVolumes failed: %v", err)
            sched.config.PodConditionUpdater.Update(assumed, &v1.PodCondition{
                Type:    v1.PodScheduled,
                Status:  v1.ConditionFalse,
                Reason:  "SchedulerError",
                Message: err.Error(),
            })
            return err
        }
        if !allBound {
            err = fmt.Errorf("Volume binding started, waiting for completion")
            if bindingRequired {
                if sched.config.Ecache != nil {
                    invalidPredicates := sets.NewString(predicates.CheckVolumeBindingPred)
                    sched.config.Ecache.InvalidatePredicates(invalidPredicates)
                }

                // bindVolumesWorker() will update the Pod object to put it back in the scheduler queue
                sched.config.VolumeBinder.BindQueue.Add(assumed)
            } else {
                // We are just waiting for PV controller to finish binding, put it back in the
                // scheduler queue
                sched.config.Error(assumed, err)
                sched.config.Recorder.Eventf(assumed, v1.EventTypeNormal, "FailedScheduling", "%v", err)
                sched.config.PodConditionUpdater.Update(assumed, &v1.PodCondition{
                    Type:   v1.PodScheduled,
                    Status: v1.ConditionFalse,
                    Reason: "VolumeBindingWaiting",
                })
            }
            return err
        }
    }
    return nil
}
assumeAndBindVolumes then calls volumeBinder.AssumePodVolumes.
pkg/controller/volume/persistentvolume/scheduler_binder.go:191

// AssumePodVolumes will take the cached matching PVs and PVCs to provision
// in podBindingCache for the chosen node, and:
// 1. Update the pvCache with the new prebound PV.
// 2. Update the pvcCache with the new PVCs with annotations set
// It will update podBindingCache again with the PVs and PVCs that need an API update.
func (b *volumeBinder) AssumePodVolumes(assumedPod *v1.Pod, nodeName string) (allFullyBound, bindingRequired bool, err error) {
    podName := getPodName(assumedPod)

    glog.V(4).Infof("AssumePodVolumes for pod %q, node %q", podName, nodeName)

    if allBound := b.arePodVolumesBound(assumedPod); allBound {
        glog.V(4).Infof("AssumePodVolumes for pod %q, node %q: all PVCs bound and nothing to do", podName, nodeName)
        return true, false, nil
    }

    assumedPod.Spec.NodeName = nodeName
    // Assume PV
    claimsToBind := b.podBindingCache.GetBindings(assumedPod, nodeName)
    newBindings := []*bindingInfo{}

    for _, binding := range claimsToBind {
        newPV, dirty, err := b.ctrl.getBindVolumeToClaim(binding.pv, binding.pvc)
        glog.V(5).Infof("AssumePodVolumes: getBindVolumeToClaim for pod %q, PV %q, PVC %q. newPV %p, dirty %v, err: %v",
            podName, binding.pv.Name, binding.pvc.Name, newPV, dirty, err)
        if err != nil {
            b.revertAssumedPVs(newBindings)
            return false, true, err
        }
        if dirty {
            err = b.pvCache.Assume(newPV)
            if err != nil {
                b.revertAssumedPVs(newBindings)
                return false, true, err
            }

            newBindings = append(newBindings, &bindingInfo{pv: newPV, pvc: binding.pvc})
        }
    }

    // Don't update cached bindings if no API updates are needed. This can happen if we
    // previously updated the PV object and are waiting for the PV controller to finish binding.
    if len(newBindings) != 0 {
        bindingRequired = true
        b.podBindingCache.UpdateBindings(assumedPod, nodeName, newBindings)
    }

    // Assume PVCs
    claimsToProvision := b.podBindingCache.GetProvisionedPVCs(assumedPod, nodeName)

    newProvisionedPVCs := []*v1.PersistentVolumeClaim{}
    for _, claim := range claimsToProvision {
        // The claims from method args can be pointing to watcher cache. We must not
        // modify these, therefore create a copy.
        claimClone := claim.DeepCopy()
        metav1.SetMetaDataAnnotation(&claimClone.ObjectMeta, annSelectedNode, nodeName)
        err = b.pvcCache.Assume(claimClone)
        if err != nil {
            b.revertAssumedPVs(newBindings)
            b.revertAssumedPVCs(newProvisionedPVCs)
            return
        }

        newProvisionedPVCs = append(newProvisionedPVCs, claimClone)
    }

    if len(newProvisionedPVCs) != 0 {
        bindingRequired = true
        b.podBindingCache.UpdateProvisionedPVCs(assumedPod, nodeName, newProvisionedPVCs)
    }

    return
}
volumeBinder.AssumePodVolumes main logic:
Find the matching PVs for the Pod's still-unbound PVCs and update the PV cache to record the prebind of those PVs to the PVCs (the prebound PV is marked with the annotation "pv.kubernetes.io/bound-by-controller").
For PVCs that require dynamic provisioning, update those PVCs in the PVC cache with the annotation "volume.alpha.kubernetes.io/selected-node=$nodeName", which likewise acts as a prebind marker. The return values allFullyBound and bindingRequired mean:
allFullyBound: bool, true means all of the Pod's PVCs are already fully bound; otherwise false.
bindingRequired: bool, true means volume binding/provisioning API operations still need to be done; otherwise false.
If allFullyBound is false and bindingRequired is true, the pod is added to the volumeBinder's BindQueue.
The Pods in the BindQueue are processed one by one by bindVolumesWorker, which calls volumeBinder.BindPodVolumes to complete the volume binding operation. Let's see what bindVolumesWorker does.
bindVolumesWorker
bindVolumesWorker loops over the Pods in the volumeBinder's BindQueue and completes the volume bind. First we need to know where bindVolumesWorker is started.
pkg/scheduler/scheduler.go:174

// Run begins watching and scheduling. It waits for cache to be synced, then starts a goroutine and returns immediately.
func (sched *Scheduler) Run() {
    if !sched.config.WaitForCacheSync() {
        return
    }

    if utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) {
        go sched.config.VolumeBinder.Run(sched.bindVolumesWorker, sched.config.StopEverything)
    }

    go wait.Until(sched.scheduleOne, 0, sched.config.StopEverything)
}
When the default scheduler starts, the bindVolumesWorker goroutine is started if the VolumeScheduling feature gate is enabled.
pkg/scheduler/scheduler.go:312

// bindVolumesWorker() processes pods queued in assumeAndBindVolumes() and tries to
// make the API update for volume binding.
// This function runs forever until the volume BindQueue is closed.
func (sched *Scheduler) bindVolumesWorker() {
    workFunc := func() bool {
        keyObj, quit := sched.config.VolumeBinder.BindQueue.Get()
        if quit {
            return true
        }
        defer sched.config.VolumeBinder.BindQueue.Done(keyObj)

        assumed, ok := keyObj.(*v1.Pod)
        if !ok {
            glog.V(4).Infof("Object is not a *v1.Pod")
            return false
        }

        // TODO: add metrics
        var reason string
        var eventType string

        glog.V(5).Infof("Trying to bind volumes for pod \"%v/%v\"", assumed.Namespace, assumed.Name)

        // The Pod is always sent back to the scheduler afterwards.
        err := sched.config.VolumeBinder.Binder.BindPodVolumes(assumed)
        if err != nil {
            glog.V(1).Infof("Failed to bind volumes for pod \"%v/%v\": %v", assumed.Namespace, assumed.Name, err)
            reason = "VolumeBindingFailed"
            eventType = v1.EventTypeWarning
        } else {
            glog.V(4).Infof("Successfully bound volumes for pod \"%v/%v\"", assumed.Namespace, assumed.Name)
            reason = "VolumeBindingWaiting"
            eventType = v1.EventTypeNormal
            err = fmt.Errorf("Volume binding started, waiting for completion")
        }

        // Always fail scheduling regardless of binding success.
        // The Pod needs to be sent back through the scheduler to:
        // * Retry volume binding if it fails.
        // * Retry volume binding if dynamic provisioning fails.
        // * Bind the Pod to the Node once all volumes are bound.
        sched.config.Error(assumed, err)
        sched.config.Recorder.Eventf(assumed, eventType, "FailedScheduling", "%v", err)
        sched.config.PodConditionUpdater.Update(assumed, &v1.PodCondition{
            Type:   v1.PodScheduled,
            Status: v1.ConditionFalse,
            Reason: reason,
        })
        return false
    }

    for {
        if quit := workFunc(); quit {
            glog.V(4).Infof("bindVolumesWorker shutting down")
            break
        }
    }
}
bindVolumesWorker invokes volumeBinder.BindPodVolumes to perform the volume binding operations recorded in podBindingCache.
BindPodVolumes

pkg/controller/volume/persistentvolume/scheduler_binder.go:266

// BindPodVolumes gets the cached bindings and PVCs to provision in podBindingCache
// and makes the API update for those PVs/PVCs.
func (b *volumeBinder) BindPodVolumes(assumedPod *v1.Pod) error {
    podName := getPodName(assumedPod)
    glog.V(4).Infof("BindPodVolumes for pod %q", podName)

    bindings := b.podBindingCache.GetBindings(assumedPod, assumedPod.Spec.NodeName)
    claimsToProvision := b.podBindingCache.GetProvisionedPVCs(assumedPod, assumedPod.Spec.NodeName)

    // Do the actual prebinding. Let the PV controller take care of the rest
    // There is no API rollback if the actual binding fails
    for i, bindingInfo := range bindings {
        glog.V(5).Infof("BindPodVolumes: Pod %q, binding PV %q to PVC %q", podName, bindingInfo.pv.Name, bindingInfo.pvc.Name)
        _, err := b.ctrl.updateBindVolumeToClaim(bindingInfo.pv, bindingInfo.pvc, false)
        if err != nil {
            // only revert assumed cached updates for volumes we haven't successfully bound
            b.revertAssumedPVs(bindings[i:])
            // Revert all of the assumed cached updates for claims,
            // since no actual API update will be done
            b.revertAssumedPVCs(claimsToProvision)
            return err
        }
    }

    // Update claims objects to trigger volume provisioning. Let the PV controller take care of the rest
    // PV controller is expect to signal back by removing related annotations if actual provisioning fails
    for i, claim := range claimsToProvision {
        if _, err := b.ctrl.kubeClient.CoreV1().PersistentVolumeClaims(claim.Namespace).Update(claim); err != nil {
            glog.V(4).Infof("updating PersistentVolumeClaim[%s] failed: %v", getPVCName(claim), err)
            // only revert assumed cached updates for claims we haven't successfully updated
            b.revertAssumedPVCs(claimsToProvision[i:])
            return err
        }
    }

    return nil
}
Using the information saved in podBindingCache, it calls the API to complete the prebind of PVCs and PVs; the PV controller watches this event and performs the actual volume bind.
Also using the information in podBindingCache, it calls the API to update the claimsToProvision PVCs; the PV controller watches this event and performs the dynamic volume provisioning.
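Both signals are plain object annotations. Here is a minimal standalone sketch (not controller source; the object names are made up) showing how such markers are set with metav1.SetMetaDataAnnotation, using the two annotation keys discussed in this article.

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    pvc := &v1.PersistentVolumeClaim{ObjectMeta: metav1.ObjectMeta{Name: "data-pvc"}}
    pv := &v1.PersistentVolume{ObjectMeta: metav1.ObjectMeta{Name: "local-pv-1"}}

    // Dynamic provisioning signal on the PVC (annSelectedNode in the source).
    metav1.SetMetaDataAnnotation(&pvc.ObjectMeta, "volume.alpha.kubernetes.io/selected-node", "node-1")

    // Prebind marker on the PV (annBoundByController in the source).
    metav1.SetMetaDataAnnotation(&pv.ObjectMeta, "pv.kubernetes.io/bound-by-controller", "yes")

    fmt.Println(pvc.Annotations, pv.Annotations)
}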
Key process
To recap the whole flow: the CheckVolumeBinding predicate calls FindPodVolumes for each candidate node; after a node is chosen, AssumePodVolumes prebinds PVs and PVCs in the in-memory caches; if API updates are still required, the pod goes onto the BindQueue, where bindVolumesWorker calls BindPodVolumes to issue the prebind and provisioning API updates; the PV controller watches those updates and completes the actual binding or dynamic provisioning; the pod is then sent back through the scheduler until all its PVCs are bound, at which point it is finally bound to the node.
At this point, you should have a deeper understanding of the role the VolumeBinder plays in Kubernetes volume scheduling; tracing these code paths in a cluster of your own is a good next step.