Inferensys

Rolling Update

A deployment strategy where new application versions are gradually rolled out by incrementally replacing old instances with new ones, minimizing downtime and risk.
DEPLOYMENT STRATEGY

What is a Rolling Update?

A rolling update is a deployment strategy for updating applications with zero downtime by incrementally replacing old instances with new ones.

In containerized environments such as Kubernetes, a rolling update is triggered by changing the Pod template of a Deployment, for example by setting a new image tag. The Deployment controller then creates pods from the new template and terminates old ones in a controlled sequence. The application remains available to users throughout the update, because the Service in front of the pods routes traffic only to instances that report ready.

The strategy is tuned with maxUnavailable and maxSurge, which cap how many pods may be unavailable, or created beyond the desired count, at any moment, balancing speed with stability. It is a core technique for progressive delivery and is often contrasted with strategies like blue-green deployment. Readiness probes are essential for a rolling update to function correctly, since they ensure traffic is only sent to fully initialized instances; liveness probes complement them by restarting containers that stop responding.
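As a concrete illustration of these parameters, the sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON, a format kubectl accepts alongside YAML. The name web, the image example.com/web:2.0.0, the /ready path, and port 8080 are hypothetical placeholders chosen for the example.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 4) -> dict:
    """Return a minimal apps/v1 Deployment that uses the RollingUpdate strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    "maxSurge": "25%",        # extra pods allowed above `replicas`
                    "maxUnavailable": "25%",  # pods allowed to be unavailable
                },
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image reference
                            "readinessProbe": {
                                # hypothetical endpoint and port
                                "httpGet": {"path": "/ready", "port": 8080},
                                "periodSeconds": 5,
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deployment("web", "example.com/web:2.0.0"), indent=2))
```

Re-applying the same manifest with only the image changed is the kind of Pod template change that starts a rolling update.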

Key Characteristics of Rolling Updates

A rolling update is a deployment strategy that incrementally replaces instances of an application with new versions, ensuring continuous service availability. Its core characteristics define its operational safety and efficiency.

01

Zero-Downtime Deployment

The primary objective of a rolling update is to achieve zero-downtime deployment. It does this by maintaining a minimum number of healthy instances at all times. The orchestrator (like Kubernetes) follows a controlled sequence:

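  • Schedules and starts a new pod/instance, within the surge allowance.
  • Waits for the new instance to pass its readiness probe.
  • Terminates an old pod/instance.
  • Only then proceeds to update the next old instance.

This ordering applies when extra pods may be created (maxSurge above zero). With no surge allowance, an old pod is terminated first and capacity dips by at most maxUnavailable. Either way, the overall service remains available to users throughout the entire update process.

02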

Controlled Pod Replacement

Rolling updates are governed by configurable parameters that control the pace and risk of the rollout. In Kubernetes, these are defined in a Deployment's strategy spec:

  • maxUnavailable: The maximum number of pods that can be unavailable during the update, given as an absolute number or a percentage of the desired replicas (e.g., 25%). This bounds the loss of capacity.
  • maxSurge: The maximum number of pods that can be created over the desired number, given the same way (e.g., 25%). This allows extra resources to be provisioned during the transition.

These settings allow engineers to balance deployment speed against resource consumption and risk tolerance.
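To make the percentages concrete, the following sketch converts them into pod counts, assuming the rounding documented for Kubernetes Deployments: a maxSurge percentage rounds up and a maxUnavailable percentage rounds down. The function names resolve and rollout_bounds are illustrative and not part of any Kubernetes API.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Convert an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer arithmetic avoids floating-point rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) pod counts allowed during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # With no room in either direction the rollout could not progress,
        # so one pod is allowed to be unavailable.
        unavailable = 1
    return replicas - unavailable, replicas + surge
```

With 10 replicas and both values at 25%, rollout_bounds(10) returns (8, 13): at least 8 pods stay available and at most 13 pods exist at once.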
03

Built-in Rollback Mechanism

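A key safety feature is the ability to stop and reverse a bad rollout. If the update introduces a critical failure, such as new pods repeatedly failing their startup or readiness checks, the rollout stalls: the orchestrator does not remove more old pods than maxUnavailable allows, so the previous version keeps serving traffic. In Kubernetes, a Deployment that makes no progress within progressDeadlineSeconds is reported as having failed to progress, but it is not reverted automatically; automatic rollback is typically added by CI/CD or progressive delivery tooling that watches the rollout status. You can trigger a rollback manually using kubectl rollout undo deployment/<name>, which restores the Deployment's Pod template from the previous revision and rolls back to it through the same incremental process.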
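A rollback can also be scripted from outside the cluster. The sketch below waits for a rollout with kubectl rollout status and, if that command fails or times out, runs kubectl rollout undo. The function name rollout_or_undo, the 120s default, and the injectable run parameter (added so the logic can be exercised without a cluster) are assumptions made for this example.

```python
import subprocess


def rollout_or_undo(deployment: str, timeout: str = "120s", run=subprocess.run) -> bool:
    """Wait for a rollout to finish; undo it if it does not. Returns True on success."""
    target = f"deployment/{deployment}"
    # `kubectl rollout status` exits non-zero if the rollout fails or times out.
    status = run(["kubectl", "rollout", "status", target, f"--timeout={timeout}"])
    if status.returncode == 0:
        return True
    # Revert the Deployment to its previous revision.
    run(["kubectl", "rollout", "undo", target], check=True)
    return False
```

A deploy script might call rollout_or_undo("web") right after applying a new manifest and fail the pipeline when it returns False.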

04

Health Probe Dependency

The safety of a rolling update is entirely dependent on correctly configured health probes. These are the mechanisms the orchestrator uses to determine instance viability:

  • Readiness Probes: Signal when a pod is ready to accept traffic. A rolling update waits for new pods to pass this probe before continuing.
  • Liveness Probes: Signal whether a container is still functioning. A failed liveness probe causes the container to be restarted.

Without accurate probes, the system might route traffic to broken pods or restart healthy ones, leading to service disruption.
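On the application side, HTTP probes need endpoints to call. The sketch below is one minimal way to expose them with Python's standard library: the liveness endpoint answers 200 whenever the process can respond, while the readiness endpoint answers 503 until startup work has finished. The paths /healthz and /ready, port 8080, and the ready event are conventions assumed for this example. Kubernetes treats any HTTP status from 200 to 399 as a passing probe.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the application once initialization (connections, caches) is complete.
ready = threading.Event()


def probe_status(path: str, is_ready: bool) -> int:
    """Map a probe path to the HTTP status the orchestrator will see."""
    if path == "/healthz":
        return 200                       # liveness: the process can respond
    if path == "/ready":
        return 200 if is_ready else 503  # readiness: hold traffic until ready
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, ready.is_set()))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Start the probe server in a background thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call serve() early in startup and ready.set() once it can handle requests, so that new pods in a rollout only receive traffic after initialization.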
05

Stateless Application Primacy

Rolling updates are ideally suited for stateless applications. Since each instance is independent and disposable, replacing them in sequence does not cause data inconsistency or session corruption. For stateful applications (e.g., databases), a standard rolling update is often insufficient and risky, as it can disrupt quorums or replication. Stateful workloads typically require more sophisticated strategies like partitioned updates or operator-driven updates that understand the stateful semantics.
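One example of a partitioned update is the partition field in a Kubernetes StatefulSet's RollingUpdate strategy; treating that as the mechanism meant here is an assumption. Pods whose ordinal is greater than or equal to the partition are updated, from the highest ordinal down, while lower ordinals keep the old version. The helper name pods_to_update is illustrative.

```python
def pods_to_update(replicas: int, partition: int) -> list[int]:
    """Ordinals that a partitioned StatefulSet update replaces, in update order."""
    # Only ordinals >= partition are updated, highest ordinal first.
    return [ordinal for ordinal in range(replicas - 1, -1, -1) if ordinal >= partition]
```

For a five-replica set, pods_to_update(5, 3) returns [4, 3], so the new version can be validated on two pods before the partition is lowered to continue.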

06

Comparison to Blue-Green & Canary

Rolling updates differ from other deployment strategies in granularity and traffic control:

  • vs. Blue-Green: Blue-green maintains two complete, separate environments and switches all traffic at once. Rolling updates change the same environment incrementally. Blue-green offers instant rollback but requires double the resources.
  • vs. Canary Deployment: A canary release is a targeted subset rollout, often to specific users. A rolling update is a full, gradual infrastructure replacement. Canary deployments are a form of progressive delivery focused on validation, while rolling updates focus on safe infrastructure replacement. They are often used in combination.

How a Rolling Update Works

A rolling update is a deployment strategy that incrementally replaces instances of an old application version with new ones, ensuring continuous service availability.

In a containerized environment like Kubernetes, a rolling update is managed by the Deployment controller. It creates a new ReplicaSet for the updated Pod template and scales it up while scaling the old ReplicaSet down, so that the number of available pods stays within the configured bounds and the service remains available throughout the process.

This strategy is fundamental to continuous deployment and progressive delivery. It allows for a controlled, phased rollout where the system's health can be monitored after each incremental step. If issues are detected, the update can be paused or rolled back, making it a lower-risk alternative to a full, simultaneous replacement. It works in tandem with health checks, readiness probes, and liveness probes to ensure only healthy pods serve traffic.
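The incremental steps can be sketched as a small simulation. This is a simplified model, not the Deployment controller's actual algorithm: new pods become ready immediately if the hypothetical is_healthy callback accepts them, and the rollout stops at the first unhealthy pod. The surge and unavailable arguments are absolute pod counts.

```python
from typing import Callable, List, Tuple


def simulate_rollout(
    replicas: int,
    surge: int,
    unavailable: int,
    is_healthy: Callable[[int], bool] = lambda pod_index: True,
) -> Tuple[bool, List[Tuple[int, int]]]:
    """Simulate a rolling update.

    Returns (completed, history), where history lists the
    (old_ready, new_ready) pod counts after each step.
    """
    if surge == 0 and unavailable == 0:
        raise ValueError("surge and unavailable cannot both be zero")
    min_available = replicas - unavailable  # availability floor
    max_total = replicas + surge            # total-pod ceiling
    old, new, created = replicas, 0, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Scale down old pods without dropping below the availability floor.
        old -= max(0, min(old, old + new - min_available))
        # Scale up new pods without exceeding the total-pod ceiling.
        for _ in range(min(max_total - (old + new), replicas - new)):
            healthy = is_healthy(created)
            created += 1
            if not healthy:
                # The pod never becomes ready: stop, old pods keep serving.
                history.append((old, new))
                return False, history
            new += 1
        history.append((old, new))
    return True, history
```

For example, simulate_rollout(3, surge=0, unavailable=1) completes with the history [(3, 0), (2, 1), (1, 2), (0, 3)]. If every new pod is unhealthy, the function returns False on the first step with the remaining old pods still in place, which is the point at which a rollout would be paused or rolled back.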

COMPARISON

Rolling Update vs. Other Deployment Strategies

A technical comparison of deployment strategies for managing releases in LLM-powered applications and microservices architectures.

| Feature / Metric | Rolling Update | Blue-Green Deployment | Canary Deployment |
| --- | --- | --- | --- |
| Primary Goal | Gradual, zero-downtime replacement of instances | Instantaneous, zero-downtime traffic switch | Risk-validated release to a user subset |
| Infrastructure Overhead | Low (single, evolving environment) | High (requires two full, identical environments) | Medium (requires routing logic for subset) |
| Rollback Speed | Slow (requires reverse rolling update) | Instant (switch traffic back to old environment) | Instant (redirect all traffic away from canary) |
| Risk Mitigation | Moderate (failures affect incremental batch) | Low (old environment remains intact) | High (exposure limited to small segment) |
| Traffic Control Granularity | Instance-level (pod-by-pod replacement) | Environment-level (all-or-nothing switch) | Fine-grained (percentage or user-based routing) |
| Resource Utilization During Update | Efficient (total capacity ~100% of requirement) | Inefficient (200% capacity required during switch) | Efficient (~100% + canary overhead) |
| Best For | Stateless services, routine updates | Critical stateful services, major version upgrades | New features, performance validation, high-risk changes |
| Kubernetes Native Support | | | |

ROLLING UPDATE

Frequently Asked Questions

A rolling update is a fundamental deployment strategy for modern, highly available applications. This FAQ addresses common technical questions about its implementation, benefits, and role within the broader landscape of traffic and deployment strategies for LLM-powered systems.

A rolling update incrementally replaces old instances of an application with new ones, minimizing downtime and risk. An orchestrator such as Kubernetes updates pods in a controlled sequence that typically follows these steps:

  1. The orchestrator starts a new pod with the updated application version.
  2. Once the new pod passes its readiness probe, it is added to the service's load balancer pool.
  3. The orchestrator terminates an old pod, draining its connections gracefully.
  4. This cycle repeats until all pods in the deployment are running the new version.

This creates a zero-downtime deployment where the service remains available to users throughout the update process, with both old and new versions handling traffic concurrently during the transition.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.