Inferensys

Liveness Probe

A liveness probe is a Kubernetes mechanism that periodically checks whether a running container is still healthy. If the probe fails repeatedly, the kubelet restarts the container to restore the application.
KUBERNETES DEPLOYMENT

What is a Liveness Probe?

A liveness probe is a Kubernetes health check that determines whether a running container is still healthy. If it keeps failing, the container is automatically restarted.

A liveness probe is a Kubernetes mechanism in which the kubelet periodically runs a diagnostic check (an HTTP request, a TCP socket connection, or a command execution) against a running container. Its purpose is to detect an application whose process is still running but is stuck in a deadlock or otherwise unresponsive. If the probe fails failureThreshold times in a row, Kubernetes treats the application as unhealthy and terminates the container, which is then restarted according to the pod's restartPolicy. This is a core self-healing capability for maintaining application availability.

Liveness probes are distinct from readiness probes, which determine whether a container is ready to accept traffic. A failed liveness probe triggers a restart, while a failed readiness probe only removes the pod from Service endpoints. Correct configuration is critical: overly sensitive checks cause unnecessary restart loops, while overly lenient ones leave failed containers running. Together with readiness probes, they underpin traffic and deployment strategies: readiness keeps requests away from instances that cannot serve them, and liveness restarts instances that have stopped working.

KUBERNETES MECHANISM

Key Features of Liveness Probes

Liveness probes are a core Kubernetes health check mechanism used to detect a container that is running but no longer healthy. If the probe keeps failing, the kubelet restarts the container to attempt recovery.

01

Core Purpose: Container Health Monitoring

A liveness probe continuously monitors the operational state of a container within a pod. Its primary function is to detect when an application has entered a deadlocked or unresponsive state—where it is running but cannot make progress. Unlike a readiness probe, which governs network traffic, a failed liveness probe triggers a container restart by the kubelet. This automatic recovery mechanism is essential for maintaining application availability without manual intervention.

  • Key Distinction: Liveness = "Is the app working?" (restart if not). Readiness = "Is the app ready for traffic?" (stop sending traffic if not).
  • Use Case: A web server that is running but returning HTTP 500 errors for all requests should be restarted.
02

Probe Types & Configuration

Kubernetes supports three primary mechanisms for executing a liveness check, each defined in the container's specification within the pod manifest.

  • HTTP GET Probe: The kubelet sends an HTTP GET request to a specified path and port on the container. A response code between 200 and 399 indicates success. This is the most common probe for web services.
  • TCP Socket Probe: The kubelet attempts to open a TCP connection to a specified port on the container. Success is defined by establishing a connection. Ideal for non-HTTP services like databases.
  • Exec Probe: The kubelet executes a specified command inside the container. A zero exit code indicates success. Used for custom, application-specific health logic.

Probes are configured with parameters like initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold to fine-tune sensitivity.
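To make the three success criteria concrete, the following is a minimal Python sketch of checks that behave like each probe type. It is not the kubelet's implementation; the function names are hypothetical, and only the success rules (status 200 to 399, connection established, exit code 0) come from the descriptions above.

```python
import socket
import subprocess


def http_status_is_success(status: int) -> bool:
    """HTTP probe rule: any status from 200 up to and including 399 passes."""
    return 200 <= status < 400


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP probe rule: the check passes if a connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both count as failures.
        return False


def exec_check(command: list[str], timeout: float = 1.0) -> bool:
    """Exec probe rule: the check passes if the command exits with code 0."""
    try:
        completed = subprocess.run(command, timeout=timeout, capture_output=True)
    except (subprocess.TimeoutExpired, OSError):
        # A command that hangs or cannot be started counts as a failure.
        return False
    return completed.returncode == 0
```

The `timeout` arguments play the role of timeoutSeconds: a check that does not answer in time is treated the same as one that answers with a failure.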

03

Interaction with Pod Lifecycle

The liveness probe operates within the broader pod lifecycle. When a probe fails failureThreshold times in a row, the kubelet kills the container and the pod's restartPolicy decides what happens next. With restartPolicy: Always (the default for pods and the only value a Deployment allows), the container is restarted in the same pod on the same node. If the container keeps failing after restarts, the pod reports CrashLoopBackOff while the kubelet waits out an exponentially increasing delay between restart attempts (10s, 20s, 40s, and so on, capped at five minutes).

Critical Consideration: A poorly configured probe (e.g., with an insufficient initialDelaySeconds) can cause a restart loop, where the container is killed repeatedly as it starts up, preventing the application from ever becoming ready.
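The exponentially increasing restart delay mentioned above can be illustrated with a short Python sketch. It assumes the back-off values given in the Kubernetes documentation (starting at 10 seconds, doubling each time, capped at 300 seconds) and models only the delayed restarts; the function name is hypothetical.

```python
def restart_backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Return the wait in seconds before each of the first `restarts` delayed restarts.

    Assumes the documented kubelet back-off: 10s, 20s, 40s, ... capped at 300s.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

A container caught in a restart loop therefore spends most of its time waiting rather than running, which is why a too-short initialDelaySeconds can keep an application unavailable for minutes at a time.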

04

Strategic Design & Anti-Patterns

Effective liveness probe design is crucial for system stability. The probe should check a lightweight, internal endpoint that tests the core functionality without dependencies on external services like databases or downstream APIs. A failure should indicate that the container itself is broken and requires a restart.

Common Anti-Patterns to Avoid:

  • Leaky Abstraction: Making the liveness probe dependent on external services. If a database fails, all application containers restart, amplifying the outage.
  • Overly Sensitive Checks: Using a check that fails for transient, self-correcting issues, causing unnecessary restarts and churn.
  • Resource-Intensive Checks: Using an exec probe that consumes significant CPU, affecting application performance.

The best practice is to implement a dedicated, simple /healthz or /livez endpoint for the liveness probe.
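As an illustration of such a dedicated endpoint, here is a minimal sketch using only the Python standard library. The handler answers /healthz without calling any database or downstream API, so a failure can only mean the process itself has stopped responding. The class name, helper name, and port 8080 are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /healthz without touching databases or downstream APIs."""

    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Probes arrive every few seconds; keep them out of the access log.
        pass


def make_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) the server; port 8080 is an illustrative choice."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_health_server().serve_forever()` starts the listener. In a real service the same route would normally be added to the application's existing HTTP server rather than run separately, so that the probe exercises the process that actually serves requests.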

05

Differentiation from Readiness Probes

While both are health checks, liveness and readiness probes serve distinct purposes and trigger different Kubernetes actions. Understanding this distinction is fundamental for reliable deployments.

| Aspect | Liveness Probe | Readiness Probe |
| --- | --- | --- |
| Goal | Determine if container needs restart. | Determine if container can receive traffic. |
| Action on Failure | Kubelet restarts the container. | Pod is marked not ready and removed from the endpoints of all matching Services. |
| Typical Use | Recover from deadlocks, unresponsive apps. | Handle slow startup, temporary unavailability (e.g., loading cache). |
| Impact | Affects container lifecycle on its node. | Affects network traffic routing via the Service. |

A pod often uses both: a readiness probe to manage traffic during long startup routines, and a liveness probe to catch runtime hangs.
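A pod that uses both probes might be declared as in the sketch below, which builds the manifest as a Python dictionary and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The field names are the real Kubernetes ones; the pod name, image, /livez and /readyz paths, port, and threshold values are illustrative assumptions.

```python
import json


def http_probe(path: str, port: int, period: int, failures: int) -> dict:
    """Build an httpGet probe stanza using the Kubernetes field names."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period,
        "failureThreshold": failures,
    }


def build_pod(name: str, image: str, port: int = 8080) -> dict:
    """Return a Pod manifest whose container has separate readiness and liveness checks."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # Readiness reacts quickly: it only stops traffic.
                    "readinessProbe": http_probe("/readyz", port, period=5, failures=2),
                    # Liveness is more tolerant: it restarts the container.
                    "livenessProbe": http_probe("/livez", port, period=10, failures=3),
                }
            ]
        },
    }


# Hypothetical name and image, used only for illustration.
manifest = build_pod("demo-api", "example.com/demo-api:1.0")
print(json.dumps(manifest, indent=2))
```

Giving the liveness probe a longer period and a higher failure threshold than the readiness probe means a struggling container is taken out of rotation first and restarted only if it stays unresponsive.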

06

Related Observability & Deployment Concepts

Liveness probes are one component of a holistic strategy for application resilience and progressive delivery. They work in concert with other Kubernetes features and deployment patterns.

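  • Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on metrics. A container that fails its liveness probe is restarted in place rather than replaced by a new pod, although repeated restarts can distort the metrics that drive scaling decisions.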
  • Rolling Updates: During a deployment update, new pods must pass their readiness probes to be added to the Service. Liveness probes ensure the new versions remain healthy after the cutover.
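  • Pod Disruption Budgets (PDBs): Limit voluntary disruptions such as node drains. A restart triggered by a failing liveness probe is not a voluntary disruption, so a PDB cannot prevent it; a pod that becomes unready during the restart still reduces the number of healthy pods the budget counts.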
  • Service Mesh Health Checks: A service mesh (e.g., Istio) can implement its own application-layer health checks, but the kubelet's liveness probe remains the primary mechanism for container lifecycle management.
PROTOCOL COMPARISON

Liveness Probe Types: HTTP, TCP, and Exec

A comparison of the three primary mechanisms Kubernetes provides to check if a container is alive, detailing their operation, configuration, and typical use cases.

| Feature / Characteristic | HTTP GET Probe | TCP Socket Probe | Exec Probe |
| --- | --- | --- | --- |
| Core Mechanism | Issues an HTTP GET request to a specified path and port. | Attempts to open a TCP connection to a specified container port. | Executes a specified command inside the container. |
| Success Criteria | HTTP status code between 200 and 399 inclusive. | TCP connection is successfully established. | Command exits with status code 0. |
| Primary Use Case | Web servers, REST APIs, HTTP-based services. | Non-HTTP services (databases, memcached, custom TCP servers). | Complex, custom health logic not covered by network checks. |
| Configuration Complexity | Low. Requires path, port, and optional HTTP headers. | Low. Requires only port number. | High. Requires crafting a precise shell command. |
| Resource Overhead | Low (network call). | Very Low (simple socket open). | High (spawns a new process per check). |
| Security Consideration | Exposes a health endpoint. Should be internal/unprivileged. | Exposes a port. Should be firewalled from public access. | Runs arbitrary commands. High risk if not carefully audited. |
| Recommended For | Standard web applications. | Stateful services and legacy protocols. | Last resort for unique initialization sequences. |
| Failure Action | Kubelet kills and restarts the container. | Kubelet kills and restarts the container. | Kubelet kills and restarts the container. |

LIVENESS PROBE

Critical Configuration Parameters

A liveness probe is a Kubernetes health check that determines whether a running container is still healthy. If it keeps failing, the kubelet restarts the container. Configuring it correctly is critical for application resilience.

01

Probe Type: HTTP GET

The most common probe type. Kubernetes sends an HTTP GET request to a specified path and port on the container. A successful response (HTTP status code between 200 and 399) indicates the container is alive.

  • Configuration Example: path: /healthz, port: 8080
  • Use Case: Ideal for web servers and REST APIs where a dedicated health endpoint can be implemented.
  • Failure Consequence: A status code outside the 200-399 range or a timeout counts as a failed probe; failureThreshold consecutive failures trigger a container restart.
02

Probe Type: TCP Socket

Kubernetes attempts to open a TCP connection to a specified port on the container. Success is based solely on whether a connection can be established.

  • Configuration Example: port: 9300
  • Use Case: Suited to non-HTTP services such as databases (e.g., PostgreSQL, Redis), gRPC servers, or custom TCP-based protocols. Recent Kubernetes versions also provide a dedicated grpc probe type for gRPC services.
  • Key Consideration: Less granular than HTTP; confirms the process is listening but not necessarily functional.
03

Probe Type: Exec Command

Kubernetes executes a specified command inside the container. A zero exit code indicates liveness.

  • Configuration Example: command: ["pg_isready", "-U", "postgres"]
  • Use Case: For complex health checks that require custom logic, such as checking internal application state or verifying a dependent process.
  • Caution: Adds overhead. The command must be lightweight and available within the container image.
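One lightweight pattern for an exec check is a heartbeat file: the application touches a file whenever it makes progress, and the probe command fails if the file is missing or stale. The sketch below assumes that pattern; the default path /tmp/heartbeat, the 30-second limit, and the function names are hypothetical and not something Kubernetes prescribes.

```python
import os
import time


def heartbeat_is_fresh(path: str, max_age_seconds: float) -> bool:
    """True if `path` exists and was modified within the last `max_age_seconds`."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable heartbeat file counts as a failed check.
        return False
    return age <= max_age_seconds


def main(argv: list[str]) -> int:
    """Return the exit code an exec probe would see: 0 = alive, 1 = restart."""
    path = argv[1] if len(argv) > 1 else "/tmp/heartbeat"  # hypothetical default path
    return 0 if heartbeat_is_fresh(path, max_age_seconds=30) else 1
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, this could be referenced from the probe's command field. The check reads a single file timestamp, which keeps the per-probe cost low, although each run still starts a new interpreter process.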
04

Timing Parameters

Fine-tuning these parameters prevents unnecessary restarts and allows applications adequate time to start.

  • initialDelaySeconds: Wait time after container starts before initiating probes. Critical to avoid killing slow-starting apps.
  • periodSeconds: How often to perform the probe (e.g., every 10 seconds).
  • timeoutSeconds: Time after which a probe is considered failed if no response.
  • successThreshold: Consecutive successes required for the probe to count as passing again after a failure. Must be 1 for liveness probes.
  • failureThreshold: Number of consecutive probe failures before restarting the container.
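How these parameters combine can be sketched in a few lines of Python. The first function replays a sequence of probe results and reports when the failure streak reaches failureThreshold; the second gives a rough upper bound on how long a hung application can go unnoticed. Both are simplified models with hypothetical names, using the Kubernetes defaults (periodSeconds 10, timeoutSeconds 1, failureThreshold 3) as default arguments.

```python
from typing import Iterable, Optional


def restart_triggered_at(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the probe result that triggers a restart, if any.

    Each item is one probe outcome (True = success). A single success resets
    the failure streak, which matches successThreshold = 1.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        consecutive_failures = 0 if succeeded else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


def worst_case_detection_seconds(period: int = 10, timeout: int = 1,
                                 failure_threshold: int = 3) -> int:
    """Rough upper bound from the moment an app hangs to the restart decision.

    Assumes the hang begins just after a successful probe and every later
    probe runs until it times out.
    """
    return failure_threshold * period + timeout


# Two failures, a recovery, then three failures in a row: restart at index 6.
print(restart_triggered_at([True, False, False, True, False, False, False]))  # 6
print(worst_case_detection_seconds())  # 31
```

Lowering periodSeconds or failureThreshold shortens the detection window but also makes a restart more likely after a brief, self-correcting stall.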
05

Liveness vs. Readiness Probe

These are distinct but complementary mechanisms.

  • Liveness Probe: Answers "Is the application still working?" Repeated failure results in a container restart.
  • Readiness Probe: Answers "Is the container ready to serve traffic?" Failure removes the pod from Service endpoints but does not restart it.

Key Difference: A pod can be alive (process running) but not ready (still loading data, warming up caches). Misconfiguring liveness to check readiness can cause restart loops.

06

Anti-Patterns & Best Practices

Poor configuration can cause instability.

Anti-Patterns to Avoid:

  • Using the same check for liveness and readiness.
  • Setting a liveness check on an endpoint with external dependencies (e.g., database). A downstream failure shouldn't crash your app.
  • Too-short initialDelaySeconds or timeoutSeconds.

Best Practices:

  • Liveness checks should be low-cost, self-contained, and idempotent.
  • Always implement a dedicated, internal health endpoint for HTTP checks.
  • Use a startup probe for slow-starting containers: liveness and readiness probes do not run until the startup probe has succeeded.
KUBERNETES

Frequently Asked Questions

A liveness probe is a core Kubernetes health-check mechanism. These questions address its purpose, configuration, and role in ensuring application resilience.

A liveness probe is a Kubernetes health-check mechanism that determines whether a running container is still healthy. If the probe fails failureThreshold times in a row, the kubelet terminates the container and restarts it according to the pod's restartPolicy. Its primary function is to recover applications that have entered a broken state, such as a deadlock, where the process is running but unable to make progress.

Probes are defined in the container specification of a pod's manifest and can be one of three types:

  • HTTP GET probe: Succeeds if a request to a specified endpoint returns a success status code (2xx or 3xx).
  • TCP Socket probe: Succeeds if a TCP connection can be established on a specified port.
  • Exec probe: Succeeds if a command executed inside the container exits with status code 0.

Unlike a readiness probe, which controls traffic flow, a liveness probe governs container lifecycle, acting as a last-resort recovery mechanism for unrecoverable internal errors.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.