Inferensys

Glossary

Pod Disruption Budget (PDB)

A Pod Disruption Budget (PDB) is a Kubernetes API object that limits the number of concurrent voluntary disruptions to pods in an application, ensuring high availability during cluster maintenance operations like node drains or updates.
KUBERNETES POLICY

What is Pod Disruption Budget (PDB)?

A Pod Disruption Budget (PDB) is a Kubernetes API object that specifies either the minimum number or percentage of an application's pods that must remain available, or the maximum that may be unavailable, during voluntary disruptions. It helps keep stateful and critical workloads available while the cluster is being maintained.

A Pod Disruption Budget (PDB) is a declarative Kubernetes policy that constrains the number of concurrent voluntary disruptions to the pods of a single application. It defines the threshold with either the minAvailable or the maxUnavailable field, which the cluster honors during eviction-based operations such as draining a node for maintenance, cluster autoscaler scale-down, or a manual eviction request. A PDB does not protect against involuntary disruptions such as hardware failure.
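As a minimal sketch, a PDB manifest can be built as a plain Python dictionary and serialized to JSON, which kubectl accepts alongside YAML. The helper name build_pdb, the object name api-server-pdb, the default namespace, and the app: api-server label are illustrative assumptions, not values taken from a real cluster.

```python
import json


def build_pdb(name, namespace, match_labels, min_available=None, max_unavailable=None):
    """Return a policy/v1 PodDisruptionBudget manifest as a dict (hypothetical helper)."""
    # minAvailable and maxUnavailable are mutually exclusive: exactly one must be set.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    spec = {"selector": {"matchLabels": dict(match_labels)}}
    if min_available is not None:
        spec["minAvailable"] = min_available
    else:
        spec["maxUnavailable"] = max_unavailable
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Illustrative values only.
pdb = build_pdb("api-server-pdb", "default", {"app": "api-server"}, min_available=2)
manifest_json = json.dumps(pdb, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied like any other manifest; keeping the threshold check in the helper catches the common mistake of setting both fields before the API server rejects the object.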

By limiting how many pods can be evicted at once, PDBs help agent deployments and stateful services such as databases or agent backends stay within their Service Level Objectives (SLOs). When an eviction is requested, the Kubernetes API server checks every PDB that matches the pod; if the eviction would violate a budget, the request is refused until enough pods are healthy again. This lets SREs and DevOps engineers schedule maintenance safely and reduces the risk of cascading failures during planned operations.

KUBERNETES AVAILABILITY

Key Features of Pod Disruption Budgets

The features below cover which disruptions a PDB applies to, how its thresholds are expressed, how it selects pods, and how it interacts with eviction and pod health.

01

Voluntary vs. Involuntary Disruptions

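A PDB only governs voluntary disruptions, and only those that go through the Eviction API. Typical examples are:

  • Draining a node for maintenance or upgrade.
  • Removing a node during cluster autoscaler scale-down.
  • Evicting a pod manually through the Eviction API.

Directly deleting a pod, or updating a pod template so that a Deployment performs a rolling update, is also a voluntary disruption, but neither is blocked by a PDB. Pods made unavailable this way still count against the budget, and rolling updates are paced by the workload's own update strategy instead.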

Involuntary disruptions, like a node hardware failure or a pod eviction due to resource exhaustion, are not blocked by a PDB. The budget's role is to ensure planned maintenance does not violate your application's availability requirements.

02

minAvailable and maxUnavailable

You define a PDB using one of two mutually exclusive fields:

  • minAvailable: Specifies the absolute number or percentage of pods that must remain Ready. For a 10-pod deployment, minAvailable: 80% (or 8) means evictions must never leave fewer than 8 pods Ready. Percentages that do not map to a whole number of pods are rounded up.
  • maxUnavailable: Specifies the absolute number or percentage of pods that can be unavailable. For the same deployment, maxUnavailable: 20% (or 2) means evictions can take down up to 2 pods at a time.

These parameters create a budget of allowed concurrent disruptions, pacing the eviction process.
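The budget arithmetic can be sketched in a few lines. This is an illustration of the calculation described above, not the controller's actual code; the function names resolve and disruptions_allowed are hypothetical, and percentages are rounded up as the Kubernetes documentation describes.

```python
def resolve(value, total):
    """Resolve an int or a percentage string such as "80%" against the pod count."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling division: percentages are rounded up to whole pods.
        return (percent * total + 99) // 100
    return int(value)


def disruptions_allowed(expected, healthy, min_available=None, max_unavailable=None):
    """How many more pods may be evicted right now without breaching the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = resolve(min_available, expected)
    else:
        desired_healthy = max(0, expected - resolve(max_unavailable, expected))
    return max(0, healthy - desired_healthy)


print(disruptions_allowed(10, 10, min_available="80%"))  # 2
print(disruptions_allowed(10, 10, max_unavailable=2))    # 2
print(disruptions_allowed(10, 8, min_available=8))       # 0
print(disruptions_allowed(7, 7, min_available="50%"))    # 3
```

In the last call, 50% of 7 pods rounds up to 4 pods that must stay available, so only 3 may be disrupted.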

03

Selector-Based Scope

A PDB does not reference a Deployment or StatefulSet directly. Instead, it uses a label selector (spec.selector) to identify the pods it protects. This creates a loose coupling.

Example: A PDB with selector.matchLabels: app=api-server will apply to all pods with that label, whether they are managed by a Deployment, ReplicaSet, or StatefulSet. This is powerful for protecting logical application groups but requires careful label management to ensure the PDB matches the intended pod set.
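To make the matching rule concrete, the sketch below checks a matchLabels selector against a few pods. The pod names and labels are made up for illustration, and only matchLabels (not matchExpressions) is modeled.

```python
def matches(match_labels, pod_labels):
    """True if every key/value pair in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Hypothetical pods owned by different controllers.
pods = [
    {"name": "api-7d9f-abc", "labels": {"app": "api-server", "pod-template-hash": "7d9f"}},
    {"name": "api-sts-0", "labels": {"app": "api-server", "tier": "backend"}},
    {"name": "worker-1", "labels": {"app": "worker"}},
]

selector = {"app": "api-server"}
protected = [pod["name"] for pod in pods if matches(selector, pod["labels"])]
print(protected)  # ['api-7d9f-abc', 'api-sts-0']
```

Extra labels on a pod do not prevent a match, which is why an overly broad selector can silently pull unrelated pods into the same budget.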

04

Interaction with Disruption Processes

When a cluster administrator runs kubectl drain <node>, the process interacts with PDBs:

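  1. The drain command cordons the node and identifies the pods running on it.
  2. For each pod, it sends an eviction request, and the API server checks whether the eviction would violate any matching PDB.
  3. Pods whose eviction is allowed are terminated. Evictions that would violate a PDB are refused with a 429 Too Many Requests response, and the drain command retries them until they succeed or its timeout expires. Until then, the node remains cordoned but not fully drained.

Any tool that uses the Kubernetes Eviction API is subject to PDBs, so this behavior is consistent across eviction-based tooling such as the cluster autoscaler. Direct pod deletion bypasses the Eviction API and therefore the PDB.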
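The pacing effect of a budget on an eviction-based drain can be simulated without a cluster. The sketch below is a simplified model, not kubectl's implementation: try_evict stands in for the Eviction API, budgets holds the remaining allowed disruptions per app, and the restore callback is a hypothetical stand-in for replacement pods becoming Ready between retries.

```python
EVICTED = "evicted"
REFUSED = "429 Too Many Requests"


def try_evict(pod, budgets):
    """Simulated eviction: refused when the matching budget is exhausted."""
    app = pod["app"]
    if app in budgets:
        if budgets[app] <= 0:
            return REFUSED
        budgets[app] -= 1
    return EVICTED


def drain(node_pods, budgets, replacement_ready, max_rounds=10):
    """Evict every pod on a node, retrying refused evictions in later rounds."""
    pending = list(node_pods)
    log = []
    for round_no in range(1, max_rounds + 1):
        still_pending = []
        for pod in pending:
            result = try_evict(pod, budgets)
            log.append((round_no, pod["name"], result))
            if result == REFUSED:
                still_pending.append(pod)
        pending = still_pending
        if not pending:
            break
        # Assumption: replacements became Ready elsewhere, restoring the budget.
        replacement_ready(budgets)
    return log, pending


# Hypothetical node with two database pods (budget of 1) and one unprotected pod.
node_pods = [
    {"name": "db-0", "app": "db"},
    {"name": "db-1", "app": "db"},
    {"name": "cache-0", "app": "cache"},
]
budgets = {"db": 1}


def restore(current_budgets):
    current_budgets["db"] = 1


log, pending = drain(node_pods, budgets, restore)
for entry in log:
    print(entry)
```

In this run db-1 is refused in the first round and evicted in the second, once the budget has been restored, while the unprotected cache-0 pod is evicted immediately.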

05

Health & Readiness Probe Dependency

A PDB's calculations are based on pods in the Ready state (as determined by their readiness probe). A pod that is running but failing its readiness probe is considered unavailable for the budget's purposes.

Critical Implication: If many pods are in a crash-loop or failing probes, they may already be counted as "unavailable." A voluntary disruption could then be blocked because the minAvailable threshold is already breached, even though no new pods are being deleted. Effective PDBs require stable pod health.
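The effect of unhealthy pods on the budget can be illustrated as follows. This is a simplified sketch: the pod dictionaries mimic only the Ready condition of a real pod status, the helper names are hypothetical, and the check assumes the pod being evicted is itself Ready.

```python
def is_ready(pod):
    """A pod counts toward the budget when its Ready condition is "True"."""
    return any(
        condition["type"] == "Ready" and condition["status"] == "True"
        for condition in pod["status"]["conditions"]
    )


def can_evict_ready_pod(pods, min_available):
    """Would evicting one Ready pod keep at least min_available pods Ready?"""
    healthy = sum(1 for pod in pods if is_ready(pod))
    return healthy - 1 >= min_available


def make_pod(name, ready):
    """Build a minimal pod-like dict (illustrative, not a full Pod object)."""
    status = "True" if ready else "False"
    return {
        "metadata": {"name": name},
        "status": {"conditions": [{"type": "Ready", "status": status}]},
    }


healthy_set = [make_pod(f"api-{i}", True) for i in range(5)]
degraded_set = [make_pod(f"api-{i}", i < 3) for i in range(5)]  # 3 Ready, 2 failing probes

print(can_evict_ready_pod(healthy_set, min_available=4))   # True
print(can_evict_ready_pod(degraded_set, min_available=4))  # False
```

With two of five pods failing their readiness probes, the budget of 4 is already breached, so the eviction is refused even though nothing has been drained yet.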

KUBERNETES AVAILABILITY MECHANISMS

PDB vs. Other Availability Controls

A comparison of the Pod Disruption Budget (PDB) with other Kubernetes controllers and patterns that influence application availability, highlighting their distinct purposes and operational scopes.

| Feature / Mechanism | Pod Disruption Budget (PDB) | Horizontal Pod Autoscaler (HPA) | Liveness/Readiness Probes | Service Mesh (e.g., Istio) |
| --- | --- | --- | --- | --- |
| Primary Purpose | Limit voluntary disruptions during maintenance | Scale pod count based on resource demand | Determine container health & readiness for traffic | Manage & secure service-to-service communication |
| Triggering Event | Eviction request (e.g., kubectl drain, cluster autoscaler scale-down) | Metric threshold (e.g., CPU > 70%) | Container process state or HTTP response | Network traffic patterns & policy configuration |
| Operational Scope | Pod availability within an application (Deployment/StatefulSet) | Resource utilization of a Deployment/ReplicaSet | Individual container health within a pod | Network layer between services across the cluster |
| Key Configuration Parameter | minAvailable or maxUnavailable (count or percentage) | Target metric value and min/max replica bounds | Probe type (HTTP, TCP, Exec), delay, timeout, period | Traffic routing rules, retry policies, fault injection |
| Automatic Remediation | | | | |
| Protects Against | Voluntary disruptions from cluster operations | Resource starvation due to increased load | Application hangs or deadlocks | Network failures, latency spikes, partial outages |
| Granularity of Control | Application-level (selector-based) | Deployment-level | Container-level | Service-level & network path-level |
| Typical Use Case | Ensuring >= 2 pods of a payment service remain during a node upgrade | Adding replicas when request latency increases | Restarting a container whose liveness endpoint fails 3 consecutive times | Shifting 10% of traffic to a canary version and retrying failed requests |

POD DISRUPTION BUDGET

Frequently Asked Questions

A Pod Disruption Budget (PDB) is a critical Kubernetes policy for ensuring application availability during voluntary cluster maintenance. These questions address its core mechanics, use cases, and operational best practices.

A Pod Disruption Budget (PDB) is a Kubernetes API object that specifies the minimum number or percentage of pods for a given application that must remain available during voluntary disruptions. It constrains eviction-based operations such as node drains: the API server refuses an eviction if it would violate the defined availability threshold.

When a user initiates a voluntary action, such as draining a node for maintenance, updating a node's kernel, or scaling down a node pool, each resulting eviction request is checked against all relevant PDBs. If evicting a pod from the target node would reduce the number of available replicas below minAvailable, or raise the number of unavailable replicas above maxUnavailable, the Eviction API refuses the request with a 429 Too Many Requests response. The drain command keeps retrying until replacement pods are Ready elsewhere and the eviction no longer breaches the PDB, or until its timeout expires.
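For reference, an eviction request is a POST of a small Eviction object to the pod's eviction subresource. The sketch below only builds the request path and body and classifies the response status; it sends nothing, and the helper names and the pod name api-server-0 are hypothetical.

```python
import json


def build_eviction(namespace, pod_name):
    """Return the API path and JSON body for evicting one pod."""
    path = f"/api/v1/namespaces/{namespace}/pods/{pod_name}/eviction"
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }
    return path, body


def should_retry(status_code):
    """429 means a PDB currently forbids the eviction, so retrying later may succeed."""
    return status_code == 429


# Illustrative pod name and namespace.
path, body = build_eviction("default", "api-server-0")
print("POST", path)
print(json.dumps(body, indent=2))
```

A client that treats 429 as "wait and retry" rather than as a hard failure reproduces the pausing behavior described above.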

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.