A rolling update is a deployment strategy where new versions of an application are gradually rolled out by incrementally replacing old instances with new ones, minimizing downtime and risk. In containerized environments like Kubernetes, it is triggered by updating a Deployment's Pod template (for example, with a new container image), and the Deployment controller orchestrates the controlled termination of old pods and the creation of new ones. This process keeps the application available to users throughout the update, because a Service or load balancer routes traffic only to healthy instances.
Rolling Update

What is a Rolling Update?
A rolling update is a deployment strategy for updating applications with zero downtime by incrementally replacing old instances with new ones.
The strategy is defined by parameters like maxUnavailable and maxSurge, which control how many pods can be taken down or created concurrently, balancing speed with stability. It is a core technique for progressive delivery and is often contrasted with strategies like blue-green deployment. Key operational concepts like readiness probes and liveness probes are essential for a rolling update to function correctly, ensuring traffic is only sent to fully initialized and healthy instances.
Key Characteristics of Rolling Updates
A rolling update is a deployment strategy that incrementally replaces instances of an application with new versions, ensuring continuous service availability. Its core characteristics define its operational safety and efficiency.
Zero-Downtime Deployment
The primary objective of a rolling update is zero-downtime deployment, achieved by keeping a minimum number of healthy instances serving at all times. When no extra pods may be added (for example, maxSurge: 0 and maxUnavailable: 1 in Kubernetes), the orchestrator follows a controlled sequence:
- Terminates an old pod/instance.
- Schedules and starts a new pod/instance.
- Waits for the new instance to pass its readiness probe.
- Only then proceeds to update the next old instance.
When surge capacity is allowed (maxSurge above 0), the order is reversed: new pods start first, and old pods are terminated as their replacements become ready. Either way, the service remains available to users throughout the entire update process.
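The sequence above can be modeled with a small discrete simulation parameterized by the surge and unavailability budgets. This is a toy sketch, not the Deployment controller's algorithm: it assumes every new pod passes its readiness probe as soon as it is started, and it works in whole rounds (scale up, then scale down) rather than reacting to individual pod events. The function name `simulate_rolling_update` is hypothetical.

```python
def simulate_rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> list[tuple[int, int]]:
    """Toy model of a rolling update; returns (old_pods, new_pods) after each round.

    Assumptions (hypothetical, for illustration only):
    - every new pod becomes Ready immediately after it is started;
    - each round first scales up new pods, then scales down old pods.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    max_total = replicas + max_surge            # upper bound on pods during the update
    min_available = replicas - max_unavailable  # lower bound on Ready pods
    while old > 0 or new < replicas:
        # Start as many new pods as the surge budget allows.
        new += max(0, min(max_total - (old + new), replicas - new))
        # Terminate old pods while keeping at least min_available Ready pods.
        old -= max(0, min(old, (old + new) - min_available))
        history.append((old, new))
    return history


# maxSurge=0, maxUnavailable=1: terminate first, then start a replacement.
for old, new in simulate_rolling_update(replicas=4, max_surge=0, max_unavailable=1):
    print(f"old={old} new={new}")
```

Calling it with `max_surge=1, max_unavailable=0` instead gives the start-first order: within each round the total briefly reaches `replicas + 1` pods before an old one is removed, so capacity never drops below the desired count.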
Controlled Pod Replacement
Rolling updates are governed by configurable parameters that control the pace and risk of the rollout. In Kubernetes, these are set in a Deployment's strategy spec (spec.strategy.rollingUpdate), either as an absolute number or as a percentage of the desired replica count; both default to 25%:
- maxUnavailable: The maximum number of pods that can be unavailable during the update (e.g., 25%). This controls the degradation in capacity. Percentages are rounded down.
- maxSurge: The maximum number of pods that can be created above the desired replica count (e.g., 25%). This allows extra resources to be provisioned during the transition. Percentages are rounded up.
These settings allow engineers to balance deployment speed against resource consumption and risk tolerance.
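Converting the percentage settings into pod counts shows what a given configuration actually permits. A minimal sketch, assuming percentages are written as strings such as "25%" and absolute values as integers; `resolve_rolling_budget` is a hypothetical name. The final fallback (allowing one unavailable pod when both values round to zero) reflects our understanding of the Deployment controller's behavior, not a documented API.

```python
from typing import Union


def _to_count(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string like '25%' to a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])  # integer math avoids float rounding surprises
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def resolve_rolling_budget(replicas: int,
                           max_surge: Union[int, str] = "25%",
                           max_unavailable: Union[int, str] = "25%") -> tuple[int, int]:
    """Return (surge, unavailable) pod counts for a Deployment-style rollout."""
    surge = _to_count(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = _to_count(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Rounding can drive both to 0; allow one unavailable pod so the rollout can progress.
        unavailable = 1
    return surge, unavailable


surge, unavailable = resolve_rolling_budget(10)
print(f"up to {10 + surge} pods in total, at least {10 - unavailable} available")
# → up to 13 pods in total, at least 8 available
```

With the defaults, a 10-replica Deployment may briefly run 13 pods and never drops below 8 available ones; the asymmetric rounding errs toward more capacity rather than less.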
Built-in Rollback Mechanism
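A key safety feature is the ability to stop a bad rollout and revert it. If the update introduces a critical failure, such as new pods repeatedly failing their startup or readiness checks, the rollout stalls: because maxUnavailable caps how many old pods may be removed, the old version keeps serving traffic. Kubernetes detects the stall through the Deployment's progress deadline (progressDeadlineSeconds, 600 seconds by default) and reports a Progressing condition with reason ProgressDeadlineExceeded, but it does not roll back on its own; automatic reversion requires a higher-level tool or pipeline that acts on this signal. You can trigger a rollback manually using kubectl rollout undo deployment/<name>, which restores the Pod template of the previous revision so the controller scales the earlier ReplicaSet back up.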
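In Kubernetes, a stalled rollout surfaces as a Progressing condition with reason ProgressDeadlineExceeded, and acting on it is left to tooling outside the Deployment controller. A minimal sketch of such glue code, assuming the pipeline already has the Deployment's JSON (for example from `kubectl get deployment <name> -o json`); the function names and the `my-app` Deployment are hypothetical, and the sketch only builds the kubectl command rather than executing it.

```python
import json


def rollout_stalled(deployment: dict) -> bool:
    """True if the Deployment reports that its progress deadline was exceeded."""
    for condition in deployment.get("status", {}).get("conditions", []):
        if (condition.get("type") == "Progressing"
                and condition.get("status") == "False"
                and condition.get("reason") == "ProgressDeadlineExceeded"):
            return True
    return False


def rollback_command(name: str, namespace: str = "default") -> list[str]:
    """Build the kubectl command that reverts the Deployment to its previous revision."""
    return ["kubectl", "rollout", "undo", f"deployment/{name}", "--namespace", namespace]


# Example input shaped like trimmed `kubectl get deployment my-app -o json` output.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
  {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"}
]}}
""")
if rollout_stalled(sample):
    print(" ".join(rollback_command("my-app")))
```

Keeping detection and action separate makes it easy to insert a human approval or a notification step before the rollback actually runs.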
Health Probe Dependency
The safety of a rolling update is entirely dependent on correctly configured health probes. These are the mechanisms the orchestrator uses to determine instance viability:
- Readiness Probes: Signal when a pod is ready to accept traffic. A rolling update waits for new pods to pass this probe before continuing.
- Liveness Probes: Signal whether a container is still healthy. A failed liveness probe causes the kubelet to restart the container.
Without accurate probes, the system might route traffic to broken pods or restart healthy ones, leading to service disruption.
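On the application side, the two probes usually map to two HTTP endpoints with different meanings. A minimal sketch using Python's standard library, assuming hypothetical paths /livez and /readyz and a warm-up step that sets a `ready` flag; the kubelet counts any status from 200 to 399 as an HTTP probe success, so returning 503 from the readiness endpoint keeps a pod out of rotation without restarting it.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

ready = threading.Event()  # set once warm-up (hypothetical: model load, cache fill) finishes


def probe_status(path: str, is_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/livez":
        return 200                       # process is up and able to answer HTTP
    if path == "/readyz":
        return 200 if is_ready else 503  # hold traffic back until warm-up completes
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, ready.is_set()))
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the logs


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Start the probe server in a background thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application calls `ready.set()` after warm-up. Keeping /livez independent of warm-up is deliberate: if liveness also waited for warm-up, a slow start could fail the liveness probe and trigger restarts before the pod ever became ready.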
Stateless Application Primacy
Rolling updates are ideally suited for stateless applications. Since each instance is independent and disposable, replacing them in sequence does not cause data inconsistency or session corruption. For stateful applications (e.g., databases), a standard rolling update is often insufficient and risky, as it can disrupt quorums or replication. Stateful workloads typically require more sophisticated strategies like partitioned updates or operator-driven updates that understand the stateful semantics.
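For StatefulSets, Kubernetes provides one form of partitioned update through spec.updateStrategy.rollingUpdate.partition. The function below is a sketch that models only the ordering rules: pods are replaced one at a time from the highest ordinal down, and pods with an ordinal below the partition stay on the old revision. It does not model readiness waits or any database-specific operator logic, and the StatefulSet name `db` is hypothetical.

```python
def statefulset_update_order(replicas: int, partition: int = 0) -> list[int]:
    """Ordinals a StatefulSet RollingUpdate replaces, in order.

    The controller updates one pod at a time from the highest ordinal down,
    and leaves pods with an ordinal below `partition` on the old revision.
    """
    return [ordinal for ordinal in range(replicas - 1, -1, -1) if ordinal >= partition]


# Stage an update on only the two highest-ordinal replicas of a hypothetical "db" StatefulSet.
print([f"db-{i}" for i in statefulset_update_order(replicas=5, partition=3)])
# → ['db-4', 'db-3']
```

Lowering the partition step by step extends the update to more replicas, which lets operators verify each member (for example, that it rejoined replication) before continuing.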
Comparison to Blue-Green & Canary
Rolling updates differ from other deployment strategies in granularity and traffic control:
- vs. Blue-Green: Blue-green maintains two complete, separate environments and switches all traffic at once. Rolling updates change the same environment incrementally. Blue-green offers instant rollback but requires double the resources.
- vs. Canary Deployment: A canary release is a targeted subset rollout, often to specific users. A rolling update is a full, gradual infrastructure replacement. Canary deployments are a form of progressive delivery focused on validation, while rolling updates focus on safe infrastructure replacement. They are often used in combination.
How a Rolling Update Works
In a containerized environment like Kubernetes, a rolling update is managed by the Deployment controller. It scales up a new ReplicaSet running the updated application while scaling down the ReplicaSet running the old version, keeping the replica count and service availability within the configured bounds throughout the process.
This strategy is fundamental to continuous deployment and progressive delivery. It allows for a controlled, phased rollout where the system's health can be monitored after each incremental step. If issues are detected, the update can be paused (kubectl rollout pause) or rolled back, making it a lower-risk alternative to a full, simultaneous replacement. It relies on health checks, specifically readiness and liveness probes, to ensure only healthy pods serve traffic.
Rolling Update vs. Other Deployment Strategies
A technical comparison of deployment strategies for managing releases in LLM-powered applications and microservices architectures.
| Feature / Metric | Rolling Update | Blue-Green Deployment | Canary Deployment |
|---|---|---|---|
| Primary Goal | Gradual, zero-downtime replacement of instances | Instantaneous, zero-downtime traffic switch | Risk-validated release to a user subset |
| Infrastructure Overhead | Low (single, evolving environment) | High (requires two full, identical environments) | Medium (requires routing logic for subset) |
| Rollback Speed | Slow (requires reverse rolling update) | Instant (switch traffic back to old environment) | Instant (redirect all traffic away from canary) |
| Risk Mitigation | Moderate (failures affect incremental batch) | High (old environment remains intact) | High (exposure limited to small segment) |
| Traffic Control Granularity | Instance-level (pod-by-pod replacement) | Environment-level (all-or-nothing switch) | Fine-grained (percentage or user-based routing) |
| Resource Utilization During Update | Efficient (~100% of required capacity, plus any maxSurge headroom) | Inefficient (200% capacity required during switch) | Efficient (~100% + canary overhead) |
| Best For | Stateless services, routine updates | Critical stateful services, major version upgrades | New features, performance validation, high-risk changes |
| Kubernetes Native Support | Yes (default Deployment strategy) | No built-in strategy (two Deployments and a Service selector switch, or add-on tooling) | No built-in strategy (approximated with replica ratios; precise splits need a service mesh or add-on tooling) |
Frequently Asked Questions
A rolling update is a fundamental deployment strategy for modern, highly available applications. This FAQ addresses common technical questions about its implementation, benefits, and role within the broader landscape of traffic and deployment strategies for LLM-powered systems.
In a rolling update, an orchestrator (like Kubernetes) replaces old instances with new ones in a controlled sequence. When surge capacity is available, it typically follows these steps:
- The orchestrator starts a new pod with the updated application version.
- Once the new pod passes its readiness probe, it is added to the service's load balancer pool.
- The orchestrator terminates an old pod, draining its connections gracefully.
- This cycle repeats until all pods in the deployment are running the new version.
The result is a zero-downtime deployment: the service remains available to users throughout the update process, with both old and new versions handling traffic concurrently during the transition.
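Because both versions serve traffic during the transition, it can help to estimate how much traffic reaches the new version at each step. A minimal sketch, assuming the Service spreads requests evenly across Ready pods (real balancing is only approximately even) and that each new pod becomes Ready before an old one is removed; the step list is illustrative, not captured from a cluster.

```python
from fractions import Fraction


def new_version_share(old_ready: int, new_ready: int) -> Fraction:
    """Fraction of requests reaching the new version, assuming even balancing across Ready pods."""
    total = old_ready + new_ready
    if total == 0:
        raise ValueError("no Ready pods")
    return Fraction(new_ready, total)


# Ready pods after each step of a 3-replica update with one pod of surge and none unavailable.
steps = [(3, 0), (3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
for old, new in steps:
    print(f"old={old} new={new} share_new={new_version_share(old, new)}")
```

The coarse jumps (1/4, 1/3, 1/2, ...) illustrate two practical points: requests from the same client may hit either version mid-rollout, so the versions must stay compatible, and a rolling update cannot hold traffic at a precise percentage the way traffic splitting can.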
Related Terms
Rolling updates are a core component of modern deployment strategies. Understanding these related concepts is essential for managing releases, ensuring high availability, and controlling user traffic in production environments.
Canary Deployment
A risk mitigation strategy where a new application version is released to a small, controlled subset of users or infrastructure before a full rollout. This allows teams to validate stability, performance, and user experience with minimal exposure.
- Key Mechanism: Traffic is split, with a small percentage (e.g., 5%) directed to the new version.
- Monitoring: Critical metrics (error rates, latency) are closely watched. If issues are detected, the rollout is halted and the canary traffic is reverted.
- Example: A social media platform tests a new recommendation algorithm on 1% of its user base to gauge engagement before enabling it for everyone.
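The monitoring step can be reduced to a comparison between the canary's error rate and the baseline's for each analysis interval. A minimal sketch under stated assumptions: the thresholds (`max_ratio`, `error_floor`, `min_requests`) are hypothetical defaults rather than values from any tool, and a production canary analysis would usually also compare latency and apply statistical tests.

```python
def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   max_ratio: float = 1.5, error_floor: float = 0.001,
                   min_requests: int = 500) -> str:
    """Return 'wait', 'rollback', or 'promote' for one analysis interval."""
    if canary_requests < min_requests:
        return "wait"  # too little canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # Tolerate up to max_ratio times the baseline rate, but never less than error_floor,
    # so a near-zero baseline does not make any single error fatal.
    if canary_rate > max(baseline_rate * max_ratio, error_floor):
        return "rollback"
    return "promote"


print(canary_verdict(baseline_errors=40, baseline_requests=20_000,
                     canary_errors=9, canary_requests=1_000))
# → rollback
```

Comparing against a live baseline rather than a fixed error budget keeps the check meaningful when overall error rates drift for reasons unrelated to the release.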
Blue-Green Deployment
A strategy that maintains two identical, full-scale production environments: Blue (current version) and Green (new version). Traffic is switched instantaneously from one environment to the other, enabling zero-downtime releases and instant rollbacks.
- Core Benefit: Eliminates the incremental replacement of instances, removing the "rolling" phase and its associated complexity.
- Rollback Process: If the new version (Green) fails, traffic is immediately switched back to the stable version (Blue).
- Infrastructure Cost: Requires double the production infrastructure during the cutover window, which can be managed with cloud automation.
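On Kubernetes, a common way to implement the cutover is to run Blue and Green as separate Deployments whose pods carry different labels, then switch which set a single Service selects. A minimal sketch that builds the patch, assuming hypothetical labels `app` and `version` (with values blue/green) and a Service named `my-app`; it prints the kubectl command instead of running it.

```python
import json


def selector_patch(app: str, color: str) -> str:
    """JSON patch body that points the Service at the pods labeled with `color`."""
    if color not in ("blue", "green"):
        raise ValueError("color must be 'blue' or 'green'")
    return json.dumps({"spec": {"selector": {"app": app, "version": color}}})


def switch_command(service: str, app: str, color: str) -> list[str]:
    """kubectl invocation that flips all Service traffic to one environment."""
    return ["kubectl", "patch", "service", service, "-p", selector_patch(app, color)]


# Cut over to green; re-running with "blue" is the instant rollback.
print(switch_command("my-app", "my-app", "green"))
```

Because the switch is a single selector change, both the cutover and the rollback take effect as soon as the Service endpoints update, with no pods started or stopped.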
Horizontal Pod Autoscaler (HPA)
A Kubernetes controller that automatically scales the number of pods in a deployment or replica set based on observed CPU utilization, memory consumption, or custom metrics. It is a critical companion to rolling updates for maintaining performance during deployment churn.
- How it works: The HPA continuously monitors pod metrics. If demand increases, it creates new pods to handle load; if demand drops, it scales pods down to save resources.
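- Interaction with Rolling Updates: The HPA changes the Deployment's replica count rather than individual ReplicaSets. If it scales the Deployment mid-rollout, the Deployment controller spreads the change proportionally across the old and new ReplicaSets (proportional scaling), so total capacity keeps tracking demand while old pods are replaced.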
- Custom Metrics: Can scale based on application-specific metrics like requests per second or queue length.
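The core of the HPA's decision is a documented ratio: desired replicas = ceil(current replicas × current metric value / target metric value), skipped when the ratio is within a tolerance (0.1 by default). The sketch below implements only that rule plus min/max clamping; it deliberately omits the controller's special handling of not-yet-ready pods and missing metrics (which matters during a rollout) and its stabilization windows. The min/max defaults here are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Core HPA scaling rule: desired = ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: leave the replica count alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target.
print(hpa_desired_replicas(current_replicas=4, current_metric=90, target_metric=60))
# → 6
```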
Readiness & Liveness Probes
Kubernetes health checks that determine if a container is operational and ready to serve traffic. They are essential for the safety of rolling updates.
- Readiness Probe: Signals that a container is ready to accept requests. If it fails, the pod is removed from Service endpoints, ensuring traffic is not sent to an unprepared instance during an update.
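- Liveness Probe: Determines whether a container is still healthy. If it fails failureThreshold times in a row, the kubelet kills and restarts the container.
- Update Safety: A proper readiness probe ensures new pods are fully initialized before receiving traffic. Old pods are removed from Service endpoints once they begin terminating, and graceful shutdown handling (reacting to SIGTERM, an optional preStop hook, and terminationGracePeriodSeconds) lets them finish in-flight requests, preventing request failures.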
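On the old pods' side, draining depends on the application reacting to SIGTERM: the kubelet sends it when the pod starts terminating and follows with SIGKILL once terminationGracePeriodSeconds (30 seconds by default) runs out. A minimal sketch of that pattern, with hypothetical names (`Lifecycle`, `install_sigterm_handler`): refuse new work after SIGTERM, then wait for in-flight requests to finish within a drain timeout chosen to fit inside the grace period.

```python
import signal
import threading
import time


class Lifecycle:
    """Tracks in-flight requests and whether shutdown has begun (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight = 0
        self.shutting_down = threading.Event()

    def begin_request(self) -> bool:
        """Register a request; returns False once shutdown has started."""
        with self._lock:
            if self.shutting_down.is_set():
                return False
            self._inflight += 1
            return True

    def end_request(self) -> None:
        with self._lock:
            self._inflight -= 1

    def inflight(self) -> int:
        with self._lock:
            return self._inflight

    def drain(self, timeout_s: float, poll_s: float = 0.05) -> bool:
        """Wait until no requests are in flight; True if drained before the timeout."""
        deadline = time.monotonic() + timeout_s
        while self.inflight() > 0:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
        return True


def install_sigterm_handler(lifecycle: Lifecycle) -> None:
    """On SIGTERM, stop accepting new work; the main loop then calls drain()."""
    def _on_sigterm(signum, frame) -> None:
        lifecycle.shutting_down.set()
    signal.signal(signal.SIGTERM, _on_sigterm)  # must be called from the main thread
```

A server loop would call `begin_request()` before handling each request (rejecting the request when it returns False), call `end_request()` when done, and, once `shutting_down` is set, call `drain()` before exiting.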
Progressive Delivery
An overarching software delivery methodology that uses techniques like canary deployments, feature flags, and A/B testing to gradually roll out changes while continuously monitoring for issues. Rolling updates are a foundational technical mechanism within this philosophy.
- Core Principle: Decouple deployment (releasing code to infrastructure) from release (exposing functionality to users).
- Tools Used: Combines infrastructure patterns (rolling/canary) with application-level controls (feature flags) and rigorous observability.
- Goal: To reduce the risk of releases, increase release velocity, and make data-driven decisions about feature impact.
Traffic Splitting
The practice of routing a defined percentage of user requests to different versions of a service. This is the enabling mechanism for canary deployments and A/B testing, often implemented at the load balancer or service mesh layer.
- Implementation: Can be done via configuration in an API Gateway (e.g., 90% to v1, 10% to v2) or a Service Mesh (e.g., Istio VirtualService).
- Precision: Allows for very fine-grained control (e.g., 1%, 5%, 50%) compared to the instance-based granularity of a basic rolling update.
- Use Case: Enables gradual exposure, performance comparison between versions, and targeted feature releases to specific user segments.
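Gateways and service meshes usually apply weights per request; when a split must also be sticky per user (so one person does not bounce between versions), a common approach is to hash a stable user identifier into buckets. A minimal sketch of that technique, assuming a string user ID; the function names and the 100-bucket granularity are illustrative choices, not any gateway's API.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, v2_percent: int) -> str:
    """Send v2_percent% of users to v2; the same user always lands on the same side."""
    if not 0 <= v2_percent <= 100:
        raise ValueError("v2_percent must be between 0 and 100")
    return "v2" if bucket(user_id) < v2_percent else "v1"


share = sum(route(f"user-{i}", 10) == "v2" for i in range(10_000)) / 10_000
print(f"{share:.1%} of users routed to v2")  # close to 10%, not exact
```

Raising `v2_percent` only adds buckets, so users already on v2 stay there as the rollout widens.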

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on turning complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.