Glossary

Liveness Probe

A liveness probe is a Kubernetes mechanism that periodically checks if a container is running. If the probe fails, the kubelet restarts the container to maintain application health.

KUBERNETES DEPLOYMENT

What is a Liveness Probe?

A liveness probe is a Kubernetes health check that determines if a container is running. If it fails, the container is automatically restarted.

A liveness probe is a Kubernetes mechanism in which the kubelet periodically runs a diagnostic check (an HTTP request, a TCP socket connection, or a command execution) against a running container. Its purpose is to detect an application whose process is still running but has become stuck, for example in a deadlock or another unresponsive state. When the probe fails failureThreshold times in a row, the kubelet treats the container as unhealthy and kills it, and the container is then restarted according to the pod's restartPolicy. This is a core self-healing capability for maintaining application availability.

Liveness probes are distinct from readiness probes, which determine if a container is ready to accept traffic. A failed liveness probe triggers a restart, while a failed readiness probe only removes the pod from service endpoints. Correctly configuring liveness probes is critical: overly sensitive checks can cause unnecessary restart loops, while overly lenient ones may leave failed containers running. Routing user requests only to healthy instances is the job of the readiness probe; the liveness probe complements it by restarting instances that cannot recover on their own.

KUBERNETES MECHANISM

Key Features of Liveness Probes

Liveness probes are a core Kubernetes health check mechanism used to determine if a container is running. If a probe fails, the kubelet restarts the container to attempt recovery.

Core Purpose: Container Health Monitoring

A liveness probe continuously monitors the operational state of a container within a pod. Its primary function is to detect when an application has entered a deadlocked or unresponsive state—where it is running but cannot make progress. Unlike a readiness probe, which governs network traffic, a failed liveness probe triggers a container restart by the kubelet. This automatic recovery mechanism is essential for maintaining application availability without manual intervention.

Key Distinction: Liveness = "Is the app working?" (restart if not). Readiness = "Is the app ready for traffic?" (stop sending traffic if not).
Use Case: A web server whose process is running but which has entered a broken internal state and returns HTTP 500 for every request, including its health endpoint, is a candidate for a restart.

Probe Types & Configuration

Kubernetes supports three primary mechanisms for executing a liveness check, each defined in the container's specification within the pod manifest.

HTTP GET Probe: The kubelet sends an HTTP GET request to a specified path and port on the container. A response code between 200 and 399 indicates success. This is the most common probe for web services.
TCP Socket Probe: The kubelet attempts to open a TCP connection to a specified port on the container. Success is defined by establishing a connection. Ideal for non-HTTP services like databases.
Exec Probe: The kubelet executes a specified command inside the container. A zero exit code indicates success. Used for custom, application-specific health logic.

Probes are tuned with the parameters initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold. For liveness probes, successThreshold must be 1.
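
As a rough illustration of how the probe fields listed above fit together, the sketch below builds a pod manifest as a Python dictionary and prints it as JSON, a format kubectl accepts alongside YAML. The pod name, image, path, port, and timing values are placeholders assumed for this example, not recommendations.

```python
import json


def liveness_pod_manifest(name, image, path="/livez", port=8080):
    """Return a minimal Pod manifest with an HTTP liveness probe.

    Every value here is an illustrative placeholder, not a tuned setting.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                    "livenessProbe": {
                        "httpGet": {"path": path, "port": port},
                        "initialDelaySeconds": 10,  # wait before the first check
                        "periodSeconds": 10,        # interval between checks
                        "timeoutSeconds": 2,        # per-check timeout
                        "successThreshold": 1,      # must be 1 for liveness
                        "failureThreshold": 3,      # failures in a row before restart
                    },
                }
            ],
        },
    }


# Hypothetical pod name and image, used only to render the example.
manifest = liveness_pod_manifest("demo-app", "example.com/demo-app:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f, assuming the image and endpoint actually exist in your environment.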

Interaction with Pod Lifecycle

The liveness probe operates within the broader pod lifecycle. When a probe fails failureThreshold times in a row, the kubelet kills the container and applies the pod's restartPolicy. With restartPolicy: Always (the default, and the only value a Deployment's pod template accepts), the container is restarted in place, in the same pod on the same node. If the container keeps failing, the kubelet delays each new restart with an exponentially increasing back-off, and the pod reports CrashLoopBackOff while it waits.

Critical Consideration: A poorly configured probe (e.g., with an insufficient initialDelaySeconds) can cause a restart loop, where the container is killed repeatedly while it is still starting up, so the application never becomes ready. For slow-starting applications, a startup probe holds off liveness checks until startup has succeeded.
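
To make the exponentially increasing restart delay concrete, here is a small sketch of the back-off sequence. It assumes the commonly documented kubelet defaults of a 10-second base that doubles on each restart and is capped at 300 seconds; the function name is invented for this example, and real clusters may behave differently depending on version and configuration.

```python
def crashloop_backoff_delays(restarts, base=10, cap=300):
    """Approximate delay in seconds before each of the first `restarts` restarts.

    Assumes a doubling back-off starting at `base` and capped at `cap`.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(crashloop_backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions a container stuck in a probe-induced restart loop soon spends most of its time waiting rather than running, which is why a too-short initialDelaySeconds is so costly.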

Strategic Design & Anti-Patterns

Effective liveness probe design is crucial for system stability. The probe should check a lightweight, internal endpoint that tests the core functionality without dependencies on external services like databases or downstream APIs. A failure should indicate that the container itself is broken and requires a restart.

Common Anti-Patterns to Avoid:

Leaky Abstraction: Making the liveness probe dependent on external services. If a database fails, all application containers restart, amplifying the outage.
Overly Sensitive Checks: Using a check that fails for transient, self-correcting issues, causing unnecessary restarts and churn.
Resource-Intensive Checks: Using an exec probe that consumes significant CPU, affecting application performance.

The best practice is to implement a dedicated, simple /healthz or /livez endpoint for the liveness probe.
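
A dedicated endpoint of this kind can be very small. The following is a minimal sketch using only the Python standard library; the /livez path matches the text above, while the port and the handler and function names are assumptions made for illustration. The handler deliberately avoids calling any database or downstream API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def liveness_response(path):
    """Map a request path to (status, body) without touching external services."""
    if path == "/livez":
        return 200, b"ok"
    return 404, b"not found"


class LivenessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = liveness_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the application log


def serve(host="0.0.0.0", port=8080):
    """Run the health server until interrupted (blocks the calling thread)."""
    HTTPServer((host, port), LivenessHandler).serve_forever()
```

In a real service this handler would usually run on a background thread or be a route in the existing web framework, so that a hung request loop also stops the endpoint from answering.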

Differentiation from Readiness Probes

While both are health checks, liveness and readiness probes serve distinct purposes and trigger different Kubernetes actions. Understanding this distinction is fundamental for reliable deployments.

Aspect	Liveness Probe	Readiness Probe
Goal	Determine if container needs restart.	Determine if container can receive traffic.
Action on Failure	Kubelet restarts the container.	Pod is marked not ready and its IP is removed from the endpoints of matching Services.
Typical Use	Recover from deadlocks, unresponsive apps.	Handle slow startup, temporary unavailability (e.g., loading cache).
Impact	Affects container lifecycle on its node.	Affects network traffic routing via the Service.

A pod often uses both: a readiness probe to manage traffic during large startup routines, and a liveness probe to catch runtime hangs.

Related Observability & Deployment Concepts

Liveness probes are one component of a holistic strategy for application resilience and progressive delivery. They work in concert with other Kubernetes features and deployment patterns.

Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on metrics. A failed liveness probe restarts the container in place rather than replacing the pod, but the pod is unready while it restarts, which can affect the metrics the HPA uses.
Rolling Updates: During a deployment update, new pods must pass their readiness probes to be added to the Service. Liveness probes ensure the new versions remain healthy after the cutover.
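Pod Disruption Budgets (PDBs): Limit voluntary disruptions such as node drains. A container restart triggered by a failing liveness probe is not an eviction, so a PDB cannot prevent it. While the container restarts the pod is not ready, which reduces the number of healthy pods and therefore the voluntary disruptions the PDB will still allow.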
Service Mesh Health Checks: A service mesh (e.g., Istio) can implement its own application-layer health checks, but the kubelet's liveness probe remains the primary mechanism for container lifecycle management.

PROTOCOL COMPARISON

Liveness Probe Types: HTTP, TCP, and Exec

A comparison of the three primary mechanisms Kubernetes provides to check if a container is alive, detailing their operation, configuration, and typical use cases.

Feature / Characteristic	HTTP GET Probe	TCP Socket Probe	Exec Probe
Core Mechanism	Issues an HTTP GET request to a specified path and port.	Attempts to open a TCP connection to a specified container port.	Executes a specified command inside the container.
Success Criteria	HTTP status code between 200 and 399 inclusive.	TCP connection is successfully established.	Command exits with status code 0.
Primary Use Case	Web servers, REST APIs, HTTP-based services.	Non-HTTP services (databases, memcached, custom TCP servers).	Complex, custom health logic not covered by network checks.
Configuration Complexity	Low. Requires path, port, and optional HTTP headers.	Low. Requires only port number.	High. Requires a command that exists in the image and reliably reflects application health.
Resource Overhead	Low (network call).	Very Low (simple socket open).	High (spawns a new process per check).
Security Consideration	Exposes a health endpoint. Should be internal/unprivileged.	Exposes a port. Should be firewalled from public access.	Runs arbitrary commands. High risk if not carefully audited.
Recommended For	Standard web applications.	Stateful services and legacy protocols.	Last resort when no network endpoint can express the application's health.
Failure Action	Kubelet kills and restarts the container.	Kubelet kills and restarts the container.	Kubelet kills and restarts the container.
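
The success criteria in the table can be approximated in a few lines of standard-library Python. This is only a sketch for reasoning about what each check does and does not prove; it is not how the kubelet is implemented, the function names are invented here, and details such as redirect handling may differ from real probe behavior.

```python
import socket
import subprocess
import urllib.error
import urllib.request


def http_status_is_success(status):
    """HTTP probe rule from the table: 200 through 399 inclusive counts as success."""
    return 200 <= status < 400


def http_get_check(url, timeout=1.0):
    """Approximate an HTTP GET probe. Note that urllib follows redirects itself."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return http_status_is_success(response.status)
    except urllib.error.HTTPError as err:
        return http_status_is_success(err.code)
    except OSError:  # connection refused, DNS failure, timeout
        return False


def tcp_check(host, port, timeout=1.0):
    """Approximate a TCP socket probe: success means the connection was established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command, timeout=1.0):
    """Approximate an exec probe: success means the command exited with status 0."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

Note how tcp_check returns True as soon as something accepts the connection, which is why a TCP probe says little about whether the application behind the port is functional.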

LIVENESS PROBE

Critical Configuration Parameters

A Liveness Probe is a Kubernetes health check that determines if a container is running. If it fails, the kubelet restarts the container. Configuring it correctly is critical for application resilience.

Probe Type: HTTP GET

The most common probe type. Kubernetes sends an HTTP GET request to a specified path and port on the container. A successful response (HTTP status code between 200 and 399) indicates the container is alive.

Configuration Example: path: /healthz, port: 8080
Use Case: Ideal for web servers and REST APIs where a dedicated health endpoint can be implemented.
Failure Consequence: A status code outside the 200-399 range, or a timeout, counts as a failure; failureThreshold consecutive failures trigger a container restart.

Probe Type: TCP Socket

Kubernetes attempts to open a TCP connection to a specified port on the container. Success is based solely on whether a connection can be established.

Configuration Example: port: 9300
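Use Case: Useful for non-HTTP services like databases (e.g., PostgreSQL, Redis) or custom TCP-based protocols. gRPC servers can also be checked this way, although recent Kubernetes versions provide a dedicated grpc probe type for them.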
Key Consideration: Less granular than HTTP; confirms the process is listening but not necessarily functional.

Probe Type: Exec Command

Kubernetes executes a specified command inside the container. A zero exit code indicates liveness.

Configuration Example: command: ["pg_isready", "-U", "postgres"]
Use Case: For complex health checks that require custom logic, such as checking internal application state or verifying a dependent process.
Caution: Adds overhead. The command must be lightweight and available within the container image.
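
One lightweight pattern for an exec probe that checks internal application state is a heartbeat file: the application touches a file whenever its main loop makes progress, and the probe command fails if the file has gone stale. The sketch below assumes that pattern; the file path, the 30-second limit, and the function names are hypothetical.

```python
import os
import time

HEARTBEAT_PATH = "/tmp/app-heartbeat"  # hypothetical path assumed for this sketch


def heartbeat_is_fresh(path, max_age_seconds, now=None):
    """Return True if the file at `path` was modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return False  # a missing heartbeat file counts as unhealthy
    return age <= max_age_seconds


def main(path=HEARTBEAT_PATH, max_age_seconds=30):
    """Return the exit code an exec probe would see: 0 for alive, 1 for stale."""
    return 0 if heartbeat_is_fresh(path, max_age_seconds) else 1
```

A probe script would end with sys.exit(main()) and be referenced from the probe's command field, assuming a Python interpreter is available in the container image.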

Timing Parameters

Fine-tuning these parameters prevents unnecessary restarts and allows applications adequate time to start.

initialDelaySeconds: Wait time after container starts before initiating probes. Critical to avoid killing slow-starting apps.
periodSeconds: How often to perform the probe (e.g., every 10 seconds).
timeoutSeconds: Number of seconds after which a probe attempt with no response is counted as a failure (defaults to 1).
successThreshold: Consecutive successes required for the probe to be considered successful again after a failure. For liveness probes this must be 1.
failureThreshold: Number of consecutive probe failures before restarting the container.
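
The way failureThreshold and periodSeconds interact can be sketched as a small counter: failures must be consecutive, and a single success resets the count. The function names below are invented for this example, and the timing estimate is a rough figure that ignores timeoutSeconds and scheduling jitter.

```python
def first_restart_index(results, failure_threshold=3):
    """Return the index of the probe result that triggers a restart, or None.

    `results` is a sequence of booleans (True means the probe succeeded).
    A restart is triggered once `failure_threshold` failures occur in a row.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        if succeeded:
            consecutive_failures = 0  # any success resets the count
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


def approx_detection_seconds(period_seconds=10, failure_threshold=3):
    """Rough time from the start of a hang until the restart is triggered."""
    return period_seconds * failure_threshold


# Two failures, a success, then three failures: only the last run reaches the threshold.
print(first_restart_index([False, False, True, False, False, False]))  # 5
print(approx_detection_seconds(10, 3))  # 30
```

Raising failureThreshold or periodSeconds makes the probe more tolerant of transient blips at the cost of leaving a truly hung container running for longer.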

Liveness vs. Readiness Probe

These are distinct but complementary mechanisms.

Liveness Probe: Answers "Is the container running?" Failure results in a container restart.
Readiness Probe: Answers "Is the container ready to serve traffic?" Failure removes the pod from Service endpoints but does not restart it.

Key Difference: A pod can be alive (process running) but not ready (still loading data, warming up caches). Misconfiguring liveness to check readiness can cause restart loops.
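
The difference can be sketched as two separate checks over the same application state. The class, attribute names, and status codes below are assumptions chosen for illustration: readiness depends on a warm cache, while liveness only reports whether the application's worker has stalled.

```python
class AppHealth:
    """Toy application state exposing separate liveness and readiness answers."""

    def __init__(self):
        self.cache_loaded = False    # becomes True once warm-up has finished
        self.worker_stalled = False  # becomes True if the main loop stops progressing

    def livez(self):
        """Liveness: fail only when a restart is the right remedy."""
        return 503 if self.worker_stalled else 200

    def readyz(self):
        """Readiness: fail whenever the app should not receive traffic yet."""
        return 200 if self.cache_loaded and not self.worker_stalled else 503


app = AppHealth()
print(app.livez(), app.readyz())  # 200 503: alive, but still warming up
app.cache_loaded = True
print(app.livez(), app.readyz())  # 200 200: alive and ready
```

If the liveness check were pointed at readyz instead, the container would be restarted during every warm-up, which is the restart loop described above.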

Anti-Patterns & Best Practices

Poor configuration can cause instability.

Anti-Patterns to Avoid:

Using the same check for liveness and readiness.
Setting a liveness check on an endpoint with external dependencies (e.g., database). A downstream failure shouldn't crash your app.
Too-short initialDelaySeconds or timeoutSeconds.

Best Practices:

Liveness checks should be low-cost, self-contained, and idempotent.
Always implement a dedicated, internal health endpoint for HTTP checks.
Use a startup probe for slow or complex startup sequences; liveness checks begin only after the startup probe has succeeded.

KUBERNETES

Frequently Asked Questions

A liveness probe is a core Kubernetes health-check mechanism. These questions address its purpose, configuration, and role in ensuring application resilience.

A liveness probe is a Kubernetes health-check mechanism that determines if a container is running. If the probe fails, the kubelet terminates the container and restarts it according to the pod's restartPolicy. Its primary function is to recover applications that have entered a broken state, such as a deadlock, where the process is running but unable to make progress.

Probes are defined in the container specification of a pod's manifest and can be one of three types:

HTTP GET probe: Succeeds if a request to a specified endpoint returns a success status code (2xx or 3xx).
TCP Socket probe: Succeeds if a TCP connection can be established on a specified port.
Exec probe: Succeeds if a command executed inside the container exits with status code 0.

Unlike a readiness probe, which controls traffic flow, a liveness probe governs container lifecycle, acting as a last-resort recovery mechanism for unrecoverable internal errors.

KUBERNETES & DEPLOYMENT

Related Terms

A liveness probe is a core component of a robust health monitoring system. Understanding these related concepts is essential for designing resilient, self-healing applications in Kubernetes and modern cloud-native architectures.

Readiness Probe

A Kubernetes mechanism that determines if a container is ready to accept network traffic. Unlike a liveness probe, which determines if a container is running, a readiness probe checks if the application is fully initialized and stable. If it fails, the pod is removed from Service endpoints, stopping traffic flow but not restarting the container. This allows slow-starting applications (e.g., those loading large caches) to become ready without being killed.

Key Difference: Liveness = "Is it alive?" (restarts container). Readiness = "Is it ready to work?" (controls traffic).
Use Case: A pod running a database that needs 30 seconds to load indices should have a readiness probe to prevent queries before it's fully operational.

Health Check

A generic term for any periodic test performed by an orchestrator or load balancer to verify an application instance's operational status. In Kubernetes, liveness and readiness probes are specific types of health checks. More broadly, health checks can be implemented at the load balancer (e.g., AWS Target Group) or service mesh level to route traffic only to healthy endpoints.

Components: Typically involves an HTTP endpoint (/health), TCP socket check, or command execution.
Outcome: Unhealthy instances are drained of traffic or terminated and replaced.

Horizontal Pod Autoscaler (HPA)

A Kubernetes controller that automatically scales the number of pods in a workload such as a Deployment based on observed CPU utilization, memory consumption, or custom metrics. HPA works alongside probes rather than reacting to them directly: a failed liveness probe restarts the container in place, and it is the Deployment's ReplicaSet, not the HPA, that keeps the desired number of pods running. Pods that are not ready (readiness probe failing) are set aside in parts of the HPA's metric calculation, which keeps it from scaling on data from pods that are not serving traffic.

Interaction: Unready pods do not count toward HPA's "ready pod" calculations.
Goal: Maintain application performance and availability by dynamically matching pod count to real-time demand.
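
For orientation, the core scaling rule documented for the HPA is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements only that ratio; it leaves out the tolerance band, min/max replica limits, and the special handling of unready pods, and the function name is invented for this example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Simplified HPA rule: scale the replica count by the metric ratio, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)


print(desired_replicas(4, 90, 60))  # 6: average load is 1.5x the target
print(desired_replicas(4, 30, 60))  # 2: average load is half the target
```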

Circuit Breaker Pattern

A resiliency design pattern used in distributed systems to prevent cascading failures. When a downstream service fails repeatedly (detected via failed health checks or timeouts), the circuit breaker "trips" and fails fast for subsequent requests, bypassing the unhealthy service. This allows the failing system time to recover, analogous to a liveness probe restarting a stuck container.

States: Closed (normal operation), Open (requests fail immediately), Half-Open (testing recovery).
Implementation: Often implemented within a Service Mesh (like Istio or Linkerd) or API gateway.
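
The three states can be sketched as a small in-process class. This is a minimal illustration under assumed names and thresholds, not the behavior of Istio, Linkerd, or any particular library; a clock function is injected so the timing can be tested deterministically.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call to test recovery
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures while closed, opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Unlike a liveness probe, which acts on the failing container itself, the breaker lives in the caller and only protects it from waiting on a dependency that is currently down.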

Rolling Update

The default Kubernetes deployment strategy where new pod versions are gradually rolled out while old versions are terminated. Liveness and readiness probes are critical for a successful rolling update. The controller waits for new pods to pass their readiness probe before counting them as available, and only then continues to terminate old pods. If a new pod never becomes ready, for example because its liveness probe keeps restarting the container, the rollout stalls instead of replacing the remaining old pods, which keeps a defective version from taking over the entire service.

Benefit: Enables zero-downtime deployments. A stalled rollout is not rolled back automatically; an operator or external tooling has to do that (e.g., with kubectl rollout undo).
Dependency: Relies on accurate probe configuration to determine pod health during the transition.

Service Mesh

A dedicated infrastructure layer (e.g., Istio, Linkerd) that manages service-to-service communication in a microservices architecture. It enhances the basic probe system by providing advanced traffic management, observability, and security. A service mesh can implement application-layer health checks and use that data for intelligent load balancing, automatically draining traffic from unhealthy pods (similar to a readiness probe) and retrying requests on healthy instances.

Capability: Provides a unified view of service health across the entire cluster.
Synergy: Works with Kubernetes probes but adds richer telemetry and control at the network level.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
