This article introduces the PodDisruptionBudget (PDB) feature of Kubernetes: the scenarios it is meant for, how to use it, and how it works, including a walk through the relevant source code.
Application scenarios of PDB
The PodDisruptionBudget object (hereinafter referred to as PDB) was added in Kubernetes 1.4 and promoted to Beta in 1.5, and as of the 1.9 release it was still Beta. But never mind the release history; let's think about what problem PDB is trying to solve. The PDB feature has existed for more than a year, but I had not studied it before, mainly for lack of a use case. Recently we have been building a Kubernetes-based ElasticSearch as a Service (ESaaS) project, which needs to guarantee that every ElasticSearch cluster always has at least one healthy ES client pod, ES master pod and ES data pod available. Many readers will immediately think: maxUnavailable can be set on a Deployment, isn't that enough? And isn't there an RS Controller doing replica control anyway?
Wait a minute! When does the maxUnavailable in a Deployment actually take effect? Only during rolling updates of applications deployed with that Deployment, where it guarantees a minimum number of serviceable replicas. And the RS Controller? It is just one of the replica controllers; it does not guarantee that a certain number of replicas is available at all times. It is only responsible for driving the actual number of replicas toward the desired number as quickly as possible, regardless of how many replicas actually exist at some moment in between. This is where Kubernetes PDB comes in: it is used to ensure the high availability of applications by putting a budget on Voluntary Disruptions.
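For reference, here is a minimal sketch of where a Deployment's own maxUnavailable lives (the name, labels and image are hypothetical, not taken from the ESaaS project); this setting only constrains rolling updates of this Deployment, not drains or other evictions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es-client                 # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # enforced only while this Deployment is being rolled out
      maxSurge: 1
  selector:
    matchLabels:
      app: es-client
  template:
    metadata:
      labels:
        app: es-client
    spec:
      containers:
      - name: es-client
        image: elasticsearch:5.6  # illustrative image tag
```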
Voluntary Disruption was just mentioned, so let's get the terms straight: what is a Voluntary Disruption, and what is an Involuntary Disruption?
Involuntary Disruption and its Countermeasures
Involuntary Disruption refers to Disruption caused by uncontrollable (or currently difficult to control) external factors, such as:
A server hardware failure or kernel crash brings the node down.
If the containers are deployed in VMs, the VM is deleted by mistake or the hypervisor fails.
The cluster suffers a network partition (split-brain). (Kubernetes uses the NodeController to handle network partition cases, but the pods it evicts are still chosen without regard to application high availability.) For an in-depth analysis of the NodeController, please refer to my following blog posts:
The executive part of Kubernetes Node Controller source code analysis
The creation of Kubernetes Node Controller source code analysis
Configuration of Kubernetes Node Controller source code analysis
Taint Controller of Kubernetes Node Controller source code analysis
A node runs out of compute resources because of unreasonable overcommitment; the kubelet eviction this triggers also does not take the application's high availability into account. For an in-depth analysis of kubelet eviction, please refer to my following blog posts:
Kubernetes Eviction Manager source code analysis
Analysis on the working Mechanism of Kubernetes Eviction Manager
PDB is not a solution to Involuntary Disruptions. So how can we minimize or mitigate their impact on application availability when using Kubernetes?
Deploy applications with a replica controller such as Deployment, RS or StatefulSet whenever possible, and set replicas greater than 1.
Set resource requests on the application containers so that they are guaranteed enough resources even when the cluster is under pressure.
In addition, consider HA at the physical level: spread the replicas of an application across servers, racks, switches, and so on, as sketched below.
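As an illustration of the last two points, here is a minimal, hypothetical pod template fragment (labels and values are made up) combining resource requests with pod anti-affinity so that replicas of the same application land on different nodes:

```yaml
# Fragment of a Deployment/StatefulSet pod template (illustrative values only)
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: es-data                         # hypothetical label
        topologyKey: kubernetes.io/hostname      # spread replicas across nodes
  containers:
  - name: es-data
    image: elasticsearch:5.6                     # illustrative image
    resources:
      requests:                                  # guaranteed resources even under pressure
        cpu: "1"
        memory: 2Gi
```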
PDB is designed to ensure application high availability during Voluntary Disruptions
The opposite of an Involuntary Disruption is, of course, a Voluntary Disruption: a disruption triggered by the user or the cluster administrator that Kubernetes can control, for example:
Deleting the controller that manages the Pods, such as a Deployment, RS, RC or StatefulSet.
Triggering a rolling update of the application.
Directly deleting Pods in batches.
Draining a node with kubectl drain (taking the node offline, shrinking the cluster).
PDB is designed for Voluntary Disruption scenarios, that is, the kind Kubernetes can control; it does not apply to Involuntary Disruptions.
Once the Kube-Node project lands, Nodes can be managed automatically by interfacing with cloud providers such as OpenStack, AWS and GCE, so HNA (Horizontal Node Autoscaler) events may become frequent. Its workflow contains drain-a-node-like logic, so PDB is needed to preserve the HA of applications.
PDB usage and points for attention
Each application deployed in Kubernetes can have a corresponding PDB Object, which limits the maximum number of replicas that may be down at the same time, or the minimum number of replicas that must remain Available, during Voluntary Disruptions, thereby keeping the application highly available.
PDB can protect applications managed by Kubernetes built-in controllers, in which case the PDB selector should be equivalent to the Selector of the corresponding controller object:
Deployment
ReplicationController
ReplicaSet
StatefulSet
It can also protect a Pod set selected only by the PDB's own Selector, but with two restrictions:
Only .spec.minAvailable can be configured, not maxUnavailable.
.spec.minAvailable can only be an integer value, not a percentage.
In all cases, the Pod set affected by a PDB is selected through its own Selector; be careful not to create PDB Objects with overlapping Selectors in the same namespace. For the selector-only case, an example follows.
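A minimal sketch of a selector-only PDB (the name and label are hypothetical); note that minAvailable must be an absolute integer here, not a percentage:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: raw-pods-pdb          # hypothetical name
spec:
  minAvailable: 2             # must be an integer for a selector-only PDB
  selector:
    matchLabels:
      role: worker            # pods matched only by this PDB's selector
```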
When using PDB, you need to be clear about what kind of application you have and what behavior you expect; illustrative manifests for the first two cases are sketched after this list:
Stateless applications: for example, you want at least 60% of the replicas to remain Available.
Solution: create a PDB Object and specify minAvailable of 60% (or maxUnavailable of 40%).
Single-instance stateful application: the instance must not be terminated without prior notice and consent.
Solution: create a PDB Object with maxUnavailable set to 0, so that Kubernetes blocks eviction of the instance; after notifying the user and obtaining consent, delete the PDB to lift the block, and recreate it afterwards. A rolling update of a single-instance StatefulSet necessarily involves downtime, so it is recommended not to run single-instance StatefulSets in production.
Multi-instance stateful application: the number of available instances must never fall below some number N (for example, because of the quorum and election requirements of raft-like protocols).
Solution: set maxUnavailable=1 or minAvailable=N, allowing at most one instance, or at most expected_replicas - minAvailable instances, to be disrupted at a time.
Batch Job: the Job only needs one Pod to eventually complete the task successfully.
The Job Controller has its own mechanism to guarantee this, so there is no need to create a PDB.
For in-depth interpretation of Job Controller, please refer to my blog post: Kubernetes Job Controller source code analysis
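As illustrations of the first two cases above, here are two minimal, hypothetical PDB manifests (names and labels are made up):

```yaml
# Stateless app: keep at least 60% of replicas available
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb               # hypothetical name
spec:
  minAvailable: "60%"
  selector:
    matchLabels:
      app: web
---
# Single-instance stateful app: block voluntary evictions entirely
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: single-db-pdb         # hypothetical name
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: single-db
```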
Define PDB Object
Having thought this through and decided to create a PDB, let's look at how a PodDisruptionBudget is defined. Here is a Sample:
```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
```
In fact, the definition of PDB consists of three key elements:
.spec.selector is used to select the backend Pod set; best practice is to keep it consistent with the Selector of the application's corresponding Deployment or StatefulSet.
.spec.minAvailable indicates the minimum number or percentage of Pods that must remain available while a voluntary disruption is in progress.
.spec.maxUnavailable indicates the maximum number or percentage of Pods that may be unavailable while a voluntary disruption is in progress. It requires Kubernetes >= 1.7 and can only be used for Pods managed by a Deployment, RS, RC or StatefulSet. It is recommended to prefer .spec.maxUnavailable (a sketch appears after the notes below).
Note:
.spec.minAvailable and .spec.maxUnavailable cannot be defined in the same PDB Object.
As mentioned earlier, although Pod deletion and unavailability during a rolling update are also voluntary disruptions, rolling updates have their own policy controls (maxSurge and maxUnavailable), so PDB does not interfere with that process.
PDB can only guarantee the replica count (.spec.minAvailable or .spec.maxUnavailable) against voluntary disruptions, i.e. during the evict-pod process. If a healthy Pod suddenly dies because its Node goes down (an Involuntary Disruption), the actual number of Pods may still fall below what the PDB requires; PDB is not omnipotent!
If you set .spec.minAvailable to 100% or .spec.maxUnavailable to 0%, the evict-pods process is completely blocked (rolling updates of Deployments and StatefulSets excepted).
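A sketch of the maxUnavailable form, assuming Kubernetes >= 1.7 and a controller-managed Pod set (the name is hypothetical; the zookeeper label mirrors the earlier sample):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb-maxunavailable   # hypothetical name
spec:
  maxUnavailable: 1             # at most one zookeeper pod may be evicted at a time
  selector:
    matchLabels:
      app: zookeeper
```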
Create PDB Object
Create the PDB Object with kubectl apply -f zk-pdb.yaml:
```
$ kubectl get poddisruptionbudgets
NAME      MIN-AVAILABLE   ALLOWED-DISRUPTIONS   AGE
zk-pdb    2               1                     7s
```
View it with kubectl get pdb zk-pdb -o yaml:
```yaml
$ kubectl get poddisruptionbudgets zk-pdb -o yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: 2017-08-28T02:38:26Z
  generation: 1
  name: zk-pdb
...
status:
  currentHealthy: 3
  desiredHealthy: 3
  disruptedPods: null
  disruptionsAllowed: 1
  expectedPods: 3
  observedGeneration: 1
```

PDB working principle and source code analysis
A PDB Object expresses the state the user expects in the face of voluntary disruptions; the component that actually maintains this state is a controller managed by kube-controller-manager: the Disruption Controller.
The Disruption Controller mainly watches Pods and PDBs. When an Add/Delete/Update event for a pod or pdb is observed, the corresponding pdb object is put into a rate-limited queue for a worker to process. The worker's main job is to compute the currentHealthy, desiredHealthy, expectedCount and disruptedPods fields of PodDisruptionBudgetStatus, and then call the API to update the PDB Status.
```go
// pkg/controller/disruption/disruption.go:498
func (dc *DisruptionController) trySync(pdb *policy.PodDisruptionBudget) error {
	pods, err := dc.getPodsForPdb(pdb)
	if err != nil {
		dc.recorder.Eventf(pdb, v1.EventTypeWarning, "NoPods", "Failed to get pods: %v", err)
		return err
	}
	if len(pods) == 0 {
		dc.recorder.Eventf(pdb, v1.EventTypeNormal, "NoPods", "No matching pods found")
	}

	expectedCount, desiredHealthy, err := dc.getExpectedPodCount(pdb, pods)
	if err != nil {
		dc.recorder.Eventf(pdb, v1.EventTypeWarning, "CalculateExpectedPodCountFailed",
			"Failed to calculate the number of expected pods: %v", err)
		return err
	}

	currentTime := time.Now()
	disruptedPods, recheckTime := dc.buildDisruptedPodMap(pods, pdb, currentTime)
	currentHealthy := countHealthyPods(pods, disruptedPods, currentTime)
	err = dc.updatePdbStatus(pdb, currentHealthy, desiredHealthy, expectedCount, disruptedPods)

	if err == nil && recheckTime != nil {
		// There is always at most one PDB waiting with a particular name in the queue,
		// and each PDB in the queue is associated with the lowest timestamp
		// that was supplied when a PDB with that name was added.
		dc.enqueuePdbForRecheck(pdb, recheckTime.Sub(currentTime))
	}
	return err
}
```
Here is the definition of PodDisruptionBudgetStatus:
```go
// pkg/apis/policy/types.go:48
type PodDisruptionBudgetStatus struct {
	// Most recent generation observed when updating this PDB status. PodDisruptionsAllowed and other
	// status information is valid only if observedGeneration equals to PDB's object generation.
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty" protobuf:"varint,1,opt,name=observedGeneration"`

	// DisruptedPods contains information about pods whose eviction was
	// processed by the API server eviction subresource handler but has not
	// yet been observed by the PodDisruptionBudget controller.
	// A pod will be in this map from the time when the API server processed the
	// eviction request to the time when the pod is seen by PDB controller
	// as having been marked for deletion (or after a timeout). The key in the map is the name of the pod
	// and the value is the time when the API server processed the eviction request. If
	// the deletion didn't occur and a pod is still there it will be removed from
	// the list automatically by PodDisruptionBudget controller after some time.
	// If everything goes smooth this map should be empty for the most of the time.
	// Large number of entries in the map may indicate problems with pod deletions.
	DisruptedPods map[string]metav1.Time `json:"disruptedPods" protobuf:"bytes,2,rep,name=disruptedPods"`

	// Number of pod disruptions that are currently allowed.
	PodDisruptionsAllowed int32 `json:"disruptionsAllowed" protobuf:"varint,3,opt,name=disruptionsAllowed"`

	// current number of healthy pods
	CurrentHealthy int32 `json:"currentHealthy" protobuf:"varint,4,opt,name=currentHealthy"`

	// minimum desired number of healthy pods
	DesiredHealthy int32 `json:"desiredHealthy" protobuf:"varint,5,opt,name=desiredHealthy"`

	// total number of pods counted by this disruption budget
	ExpectedPods int32 `json:"expectedPods" protobuf:"varint,6,opt,name=expectedPods"`
}
```
The most important fields of PodDisruptionBudgetStatus are **DisruptedPods** and **PodDisruptionsAllowed**:
DisruptedPods: stores pods whose eviction has been processed by the apiserver's pod eviction subresource but has not yet been observed and handled by the PDB Controller. It is a map whose key is the Pod name and whose value is the time the apiserver accepted the eviction request. An entry has a 2-minute timeout: if the Pod has still not been deleted after 2 minutes, it is removed from the map.
PodDisruptionsAllowed: the number of Pod disruptions that are currently allowed.
The main job of the Disruption Controller is thus to update PDB.Status. So the question is: during a voluntary disruption, who actually enforces the maxUnavailable or minAvailable limit on evictions?
As noted above, the PDB mechanism only covers pods whose eviction is requested through the pod eviction subresource, so the answer lies in the EvictionREST handler for the Pod.
```go
// pkg/registry/core/pod/storage/eviction.go:81
// Create attempts to create a new eviction. That is, it tries to evict a pod.
func (r *EvictionREST) Create(ctx genericapirequest.Context, obj runtime.Object, createValidation rest.ValidateObjectFunc, includeUninitialized bool) (runtime.Object, error) {
	eviction := obj.(*policy.Eviction)

	obj, err := r.store.Get(ctx, eviction.Name, &metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	pod := obj.(*api.Pod)

	var rtStatus *metav1.Status
	var pdbName string
	err = retry.RetryOnConflict(EvictionsRetry, func() error {
		pdbs, err := r.getPodDisruptionBudgets(ctx, pod)
		if err != nil {
			return err
		}

		if len(pdbs) > 1 {
			rtStatus = &metav1.Status{
				Status:  metav1.StatusFailure,
				Message: "This pod has more than one PodDisruptionBudget, which the eviction subresource does not support.",
				Code:    500,
			}
			return nil
		} else if len(pdbs) == 1 {
			pdb := pdbs[0]
			pdbName = pdb.Name
			// Try to verify-and-decrement

			// If it was false already, or if it becomes false during the course of our retries,
			// raise an error marked as a 429.
			if err := r.checkAndDecrement(pod.Namespace, pod.Name, pdb); err != nil {
				return err
			}
		}
		return nil
	})
	if err == wait.ErrWaitTimeout {
		err = errors.NewTimeoutError(fmt.Sprintf("couldn't update PodDisruptionBudget %q due to conflicts", pdbName), 10)
	}
	if err != nil {
		return nil, err
	}

	if rtStatus != nil {
		return rtStatus, nil
	}

	// At this point there was either no PDB or we succeded in decrementing

	// Try the delete
	_, _, err = r.store.Delete(ctx, eviction.Name, eviction.DeleteOptions)
	if err != nil {
		return nil, err
	}

	// Success!
	return &metav1.Status{Status: metav1.StatusSuccess}, nil
}
```
When you request a pod eviction through EvictionREST, it checks that the pod has at most one matching pdb; otherwise an error is returned. For the usage of the Eviction API, please refer to The Eviction API. Here is a simple Sample:
{"apiVersion": "policy/v1beta1", "kind": "Eviction", "metadata": {"name": "quux", "namespace": "default"}} $curl-v-H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction-d @ eviction.json
checkAndDecrement is then used to check whether the PDB's maxUnavailable or minAvailable is still satisfied, and if so, pdb.Status.PodDisruptionsAllowed is decremented by 1.
If checkAndDecrement succeeds, the corresponding Pod is actually deleted.
```go
// checkAndDecrement checks if the provided PodDisruptionBudget allows any disruption.
func (r *EvictionREST) checkAndDecrement(namespace string, podName string, pdb policy.PodDisruptionBudget) error {
	if pdb.Status.ObservedGeneration < pdb.Generation {
		// TODO(mml): Add a Retry-After header.  Once there are time-based
		// budgets, we can sometimes compute a sensible suggested value.  But
		// even without that, we can give a suggestion (10 minutes?) that
		// prevents well-behaved clients from hammering us.
		err := errors.NewTooManyRequests("Cannot evict pod as it would violate the pod's disruption budget.", 0)
		err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: "DisruptionBudget", Message: fmt.Sprintf("The disruption budget %s is still being processed by the server.", pdb.Name)})
		return err
	}
	if pdb.Status.PodDisruptionsAllowed < 0 {
		return errors.NewForbidden(policy.Resource("poddisruptionbudget"), pdb.Name, fmt.Errorf("pdb disruptions allowed is negative"))
	}
	if len(pdb.Status.DisruptedPods) > MaxDisruptedPodSize {
		return errors.NewForbidden(policy.Resource("poddisruptionbudget"), pdb.Name, fmt.Errorf("DisruptedPods map too big - too many evictions not confirmed by PDB controller"))
	}
	if pdb.Status.PodDisruptionsAllowed == 0 {
		err := errors.NewTooManyRequests("Cannot evict pod as it would violate the pod's disruption budget.", 0)
		err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: "DisruptionBudget", Message: fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently", pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy)})
		return err
	}

	pdb.Status.PodDisruptionsAllowed--
	if pdb.Status.DisruptedPods == nil {
		pdb.Status.DisruptedPods = make(map[string]metav1.Time)
	}
	// Eviction handler needs to inform the PDB controller that it is about to delete a pod
	// so it should not consider it as available in calculations when updating PodDisruptions allowed.
	// If the pod is not deleted within a reasonable time limit PDB controller will assume that it won't
	// be deleted at all and remove it from DisruptedPod map.
	pdb.Status.DisruptedPods[podName] = metav1.Time{Time: time.Now()}
	if _, err := r.podDisruptionBudgetClient.PodDisruptionBudgets(namespace).UpdateStatus(&pdb); err != nil {
		return err
	}

	return nil
}
```
checkAndDecrement mainly checks that pdb.Status.PodDisruptionsAllowed is greater than 0 and that DisruptedPods contains no more than 2000 Pods (beyond that, the Disruption Controller may not be able to keep up).
If the checks pass, pdb.Status.PodDisruptionsAllowed is decremented by 1, and the Pod is added to the DisruptedPods map with the current time (the time the apiserver accepted the eviction request) as the value.
The PDB is then updated; because the Disruption Controller watches PDB Update events, this in turn triggers the Disruption Controller to reconcile the PDB Status again.
Note: PDB is also used by the scheduler. During preemptive scheduling based on Pod Priority, generic_scheduler validates all Pods on a node when selecting pods to preempt, counts the Pods whose PDB would be violated, and prefers the node with the fewest PDB violations when selecting a node.