Glossary

Liveness Probe

A Kubernetes health check that determines if a container is running and responsive, triggering a restart if the probe fails.

AGENTIC HEALTH CHECKS

What is a Liveness Probe?

A core mechanism for ensuring containerized applications remain responsive and can self-heal from runtime failures.

A liveness probe is a Kubernetes health check mechanism that determines whether a container is running and responsive, triggering an automatic restart if the probe fails. It is a fundamental component of self-healing software systems, allowing a container orchestrator to detect and remediate a hung or dead process without human intervention. Probes are most commonly configured as HTTP requests, TCP socket checks, or commands executed inside the container; recent Kubernetes versions also support gRPC health checks.

The kubelet runs the probe periodically against the container. When the probe fails failureThreshold times in a row, the kubelet kills the container, and the Pod's restartPolicy determines whether it is restarted. This separates it from a readiness probe: a readiness probe controls whether the Pod receives traffic, while a liveness probe controls the container's lifecycle. Within the broader recursive error correction framework, it provides the most basic automated corrective action, restarting the failed process to keep the application available.

KUBERNETES HEALTH CHECK

Key Features of a Liveness Probe

Core Purpose: Detect Unresponsive Containers

The primary function of a liveness probe is to detect when an application inside a container has entered a broken state, such as a deadlock, an infinite loop, or a crashed internal worker thread, in which the process is still running but cannot make progress or serve requests. A readiness probe asks whether a container should receive traffic; a liveness probe asks whether it should be restarted. When the probe fails failureThreshold consecutive times, the kubelet kills the container, and the Pod's restartPolicy (Always by default) governs the restart, aiming to restore service automatically.

Probe Types & Configuration

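Liveness probes are configured with a handler in the container's spec within the Pod manifest. The three classic handlers are described below; Kubernetes also offers a grpc handler (stable since v1.27) for applications that implement the gRPC Health Checking Protocol.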

HTTP GET Probe: The kubelet sends an HTTP GET request to a specified path and port. A success is any HTTP status code between 200 and 399. This is ideal for web services and APIs.
TCP Socket Probe: The kubelet attempts to open a TCP connection to a specified port. Success is established if a connection can be made. Used for non-HTTP services like databases or custom TCP protocols.
Exec Probe: The kubelet executes a specified command inside the container. The probe succeeds if the command exits with status code 0. This allows for custom, application-specific health logic.
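To make the three success conditions concrete, the following is a minimal Python sketch of what each handler checks. It is not the kubelet's implementation, and the function names (http_get_probe, tcp_socket_probe, exec_probe) are hypothetical. One simplification to note: this HTTP check reports any 3xx status as success without following the redirect, whereas the kubelet can follow some redirects.

```python
import socket
import subprocess
import urllib.error
import urllib.request
from typing import Sequence


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(_NoRedirect)


def http_get_probe(url: str, timeout: float = 1.0) -> bool:
    """HTTP GET handler: success if the status code is >= 200 and < 400."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (and unfollowed 3xx) arrive here as exceptions.
        return 200 <= err.code < 400
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return False


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP handler: success if a connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_probe(argv: Sequence[str], timeout: float = 1.0) -> bool:
    """Exec handler: success if the command exits with status 0.

    The argv list is run directly, without a shell.
    """
    try:
        result = subprocess.run(list(argv), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```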

Key configuration parameters, with their defaults, include initialDelaySeconds (0), periodSeconds (10), timeoutSeconds (1), successThreshold (1; must be 1 for liveness probes), and failureThreshold (3).
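Probe timing follows directly from these fields. The sketch below is a hypothetical Python helper, not any Kubernetes API: it records the documented defaults (initialDelaySeconds 0, periodSeconds 10, timeoutSeconds 1, successThreshold 1, failureThreshold 3) and roughly estimates how long a hung container survives before the kubelet kills it, assuming the hang starts just after a successful probe and ignoring kubelet scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LivenessProbeTiming:
    """Timing fields of a liveness probe, defaulting to the documented Kubernetes defaults."""

    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    success_threshold: int = 1
    failure_threshold: int = 3

    def __post_init__(self) -> None:
        # Kubernetes validation rejects any other successThreshold for liveness probes.
        if self.success_threshold != 1:
            raise ValueError("successThreshold must be 1 for a liveness probe")

    def worst_case_detection_seconds(self) -> int:
        """Rough upper bound between a hang and the kubelet killing the container.

        The first failing probe runs one period after the last success, the
        failures are spaced one period apart, and the final one waits the full
        timeout before it is counted.
        """
        return self.failure_threshold * self.period_seconds + self.timeout_seconds
```

With the defaults the estimate is 31 seconds (3 × 10 + 1); raising failureThreshold to 6 roughly doubles it to 61 seconds, trading slower recovery for fewer false restarts.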

Integration with Pod Lifecycle

The liveness probe operates within the broader Pod lifecycle. Probing begins after initialDelaySeconds (0 by default), giving the application time to bootstrap; if a startup probe is configured, liveness checks wait until it succeeds. Once active, the probe runs every periodSeconds. A single failure does not restart the container; the probe must fail failureThreshold consecutive times, and any success resets the count. This prevents unnecessary restarts from transient issues. After failureThreshold consecutive failures, the kubelet kills the container, and the Pod's restartPolicy governs the restart. If the container keeps failing, the kubelet delays each restart with an exponential back-off (by default 10s, 20s, 40s, and so on, capped at five minutes), and the Pod reports a CrashLoopBackOff status.
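The counting rule and the restart back-off can be sketched in a few lines of Python. This is an illustrative model with hypothetical function names, not kubelet code. The back-off helper follows the default schedule of 10 seconds doubling up to a five-minute cap, and it does not model the reset that occurs once a container has run cleanly for ten minutes.

```python
from typing import Iterable, Optional


def restart_trigger_index(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the probe result that makes the kubelet kill the container.

    results holds probe outcomes in order (True = success). A success resets the
    consecutive-failure count, so only an unbroken run of failures triggers a restart.
    Returns None if the threshold is never reached.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return i
    return None


def crashloop_backoff_seconds(previous_restarts: int, initial: int = 10, cap: int = 300) -> int:
    """Approximate default restart delay: 10s, 20s, 40s, ... capped at five minutes."""
    return min(cap, initial * 2 ** previous_restarts)
```

For the probe results ok, fail, fail, ok, fail, fail, fail, the first two failures are cleared by the success that follows, so the restart is triggered by the seventh probe (index 6).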

Distinction from Readiness & Startup Probes

It is critical to distinguish liveness from other Kubernetes health checks:

vs. Readiness Probe: A readiness probe determines if a container is ready to accept traffic. A failed readiness probe removes the Pod's IP from Service endpoints but does not restart the container. Use it for slow startups or temporary dependencies.
vs. Startup Probe: Used for legacy applications with long initialization times. It disables liveness and readiness checks until it succeeds once. After that, liveness probes take over for the remainder of the container's lifecycle.

A common pattern: use a startup probe for initial boot, a readiness probe for traffic management, and a liveness probe for recovery from hangs.

Design Best Practices & Anti-Patterns

Effective liveness probe design is crucial for system stability.

Best Practices:

The check should be lightweight and fast, with a low timeoutSeconds.
The endpoint or command should be internal and not depend on external dependencies (e.g., databases, downstream APIs).
Use a dedicated, low-privilege health endpoint for HTTP probes.
Set initialDelaySeconds appropriately to avoid killing slow-starting apps, or use a startup probe when startup time is long or unpredictable.

Anti-Patterns to Avoid:

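Checking External Dependencies: A probe that fails during a downstream database outage restarts otherwise healthy application containers; the restarts cannot fix the database and can take every replica offline at once.
Overly Sensitive Probes: A low failureThreshold, a short periodSeconds, or a tight timeoutSeconds can cause restart storms, especially when the application is briefly slow under load.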
Heavy Computational Logic: An exec probe that runs a complex script can consume significant CPU, affecting application performance.

Role in Self-Healing Systems

The liveness probe is a foundational reactive mechanism for self-healing software. It lets an application recover automatically from internal faults that leave the process running but stuck, such as deadlocks and hung event loops, without operator intervention, which increases overall availability. This aligns with the Recursive Error Correction pillar by providing a basic, automated corrective action (restart) when a failure state is detected. For more complex autonomous agents, liveness probes serve as a container-level backstop: a single hung agent process is restarted rather than left to stall the wider system. They are a key primitive for fault-tolerant distributed systems in which manual recovery is impractical at scale.

How a Liveness Probe Works

A liveness probe is a periodic diagnostic executed by the kubelet, the agent that runs on each Kubernetes node. It performs a configurable check, such as an HTTP GET request, a TCP socket connection, or a command executed inside the container, to assess whether the application process is alive but stuck or unresponsive. If the probe fails failureThreshold times in a row, the kubelet kills the container and restarts it according to the Pod's restartPolicy. This is a core self-healing capability of container orchestration: faulty instances are recovered automatically.

Probes are defined in a container's specification within the pod manifest. Key parameters include initialDelaySeconds (wait time before starting probes), periodSeconds (time between probes), timeoutSeconds, successThreshold, and failureThreshold. Unlike a readiness probe, which controls traffic flow, a liveness probe governs container lifecycle. It is a foundational pattern for building resilient, fault-tolerant services, acting as an automated dead man's switch for containerized processes. Misconfiguration, such as overly sensitive checks, can cause unnecessary restart loops.

KUBERNETES

Liveness Probe Types: Comparison

A comparison of the three primary mechanisms for implementing a Kubernetes Liveness Probe, detailing their operation, configuration, and trade-offs.

Probe Type	HTTP GET	TCP Socket	Exec Command
Core Mechanism	Issues an HTTP request to a specified endpoint	Attempts to open a TCP connection to a specified port	Executes a command inside the container
Success Condition	HTTP status code between 200 and 399	TCP connection is successfully established	Command exits with status code 0
Primary Use Case	Web servers, REST APIs, HTTP services	Non-HTTP services (e.g., databases, custom TCP protocols)	Custom, complex health logic not expressible via HTTP/TCP
Configuration Complexity	Low (requires endpoint path/port)	Low (requires port number)	Higher (requires writing and maintaining a command; no shell runs unless the command invokes one)
Resource Overhead	Low (single HTTP request)	Very Low (port check)	Variable to High (depends on command; can be CPU/memory intensive)
Security Consideration	Endpoint should be internal/unprivileged	Port should be internal/firewalled	Higher risk: command runs with the container's privileges; avoid passing untrusted input to a shell
Failure Granularity	Specific HTTP error code may be returned	Binary (connection succeeds or fails)	Custom exit code and stderr output available for debugging
Recommended Initial Delay	5-30 seconds	5-30 seconds	30+ seconds (if command is resource-heavy)

KUBERNETES HEALTH CHECKS

Frequently Asked Questions

A liveness probe is a core Kubernetes health check mechanism that determines if a container is running and responsive. This section answers common technical questions about its configuration, behavior, and role in resilient system design.

A liveness probe is a Kubernetes health check that determines whether a container is running and responsive, triggering a restart if the probe fails. It periodically executes a diagnostic test, such as an HTTP GET request, a TCP socket connection, or a command inside the container, to assess the application's basic operational state. Unlike a readiness probe, which gates traffic, a liveness probe exists solely to identify and recover from a hung or unresponsive container: after failureThreshold consecutive failures, the kubelet kills the container, which is then restarted in place according to the Pod's restartPolicy. The Pod itself is not recreated. This automated recovery is a foundational pattern for building self-healing, resilient applications on a container orchestration platform.

Related Terms

A liveness probe is a foundational health check in container orchestration. The following terms represent related concepts in the broader ecosystem of automated diagnostics and resilient system design.

Readiness Probe

A Kubernetes health check that determines if a container is fully initialized and ready to accept network traffic. Unlike a liveness probe, which restarts a failing container, a readiness probe removes the container's Pod from Service load balancers until it passes. This prevents traffic from being sent to a container that is running but not yet ready to serve requests, such as one still loading a large dataset or establishing database connections.

Purpose: Ensure traffic is only routed to ready endpoints.
Action on Failure: Pod is marked 'Not Ready' and removed from service endpoints.
Common Checks: Application startup completion, dependency connectivity (e.g., database, cache).

Startup Probe

A Kubernetes health check used for applications with slow startup times, such as legacy applications. Until the startup probe succeeds, the kubelet does not run liveness or readiness probes, which prevents it from killing the container before it has a chance to become operational. This is crucial for applications that may take minutes to initialize, where standard periodic probes would incorrectly signal a failure.

Purpose: Protect slow-starting containers from premature restart.
Action on Failure: After failureThreshold consecutive failures, the container is killed and restarted according to the Pod's restartPolicy.
Typical Use Case: Java applications with long JVM warm-up periods or monolithic apps loading large configurations.
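The startup window is the product of two fields: the container has roughly failureThreshold × periodSeconds to start before the kubelet gives up (the Kubernetes documentation's example uses 30 × 10 = 300 seconds). The hypothetical helpers below compute that budget and pick a failureThreshold for a measured worst-case startup time; as a simplifying assumption they ignore initialDelaySeconds and per-probe timeouts.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time the startup probe allows before the kubelet kills the container."""
    return failure_threshold * period_seconds


def failure_threshold_for(max_startup_seconds: float, period_seconds: int = 10) -> int:
    """Smallest failureThreshold whose budget covers the given worst-case startup time."""
    return max(1, math.ceil(max_startup_seconds / period_seconds))
```

For a service that has been observed to need up to 301 seconds to boot, failure_threshold_for(301) returns 31, giving a 310-second budget at the default 10-second period.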

Circuit Breaker

A software design pattern that detects failures and prevents an application from repeatedly trying to execute an operation that's likely to fail. It acts as a proxy for operations that can fail, moving between Closed, Open, and Half-Open states. This pattern provides stability and prevents cascading failures in distributed systems, complementing health checks by offering fast failure rather than waiting for a timeout.

Closed State: Requests flow normally; failures are counted.
Open State: Requests fail immediately without attempting the operation.
Half-Open State: A limited number of test requests are allowed to see if the underlying fault is resolved.
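The three states can be illustrated with a minimal, single-threaded Python sketch. The class and parameter names are hypothetical; this breaker counts consecutive failures, and Half-Open admits one trial call at a time. A production implementation would typically add thread safety, sliding failure windows, and metrics. The injectable clock parameter makes the timing testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Minimal single-threaded circuit breaker with Closed, Open, and Half-Open states."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "half_open"  # timeout elapsed: let one trial request through
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # Any success, including a half-open trial, closes the circuit.
        self.state = "closed"
        self._failures = 0

    def _on_failure(self) -> None:
        self._failures += 1
        # A failed trial reopens immediately; in Closed, open once the threshold is hit.
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```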

Dead Man's Switch

A safety mechanism that requires a periodic signal or 'heartbeat' to confirm a system or process is operational. If the expected signal is not received within a defined timeout, the system assumes a failure and triggers a corrective action, such as a failover, shutdown, or alert. This is a broader conceptual analog to a liveness probe, often implemented at the application or infrastructure level rather than the container level.

Mechanism: Periodic 'I am alive' signals from the monitored entity.
Corrective Action: Executes a predefined safety procedure (e.g., restart, notify, switch to backup).
Example: A cloud VM sending heartbeats to a monitoring service; missing heartbeats trigger an auto-scaling group replacement.
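A dead man's switch reduces to tracking when each entity last reported in. The Python sketch below is a hypothetical heartbeat monitor, not the API of any particular monitoring product; the corrective action is passed in as a callback, and entity names such as vm-a are placeholders.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Records 'I am alive' signals and reports entities whose signal is overdue."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, entity: str) -> None:
        """Record a heartbeat from an entity at the current time."""
        self._last_seen[entity] = self._clock()

    def overdue(self) -> List[str]:
        """Return entities whose last heartbeat is older than the timeout, sorted by name."""
        now = self._clock()
        return sorted(e for e, seen in self._last_seen.items()
                      if now - seen > self.timeout_seconds)

    def check(self, on_missing: Callable[[str], None]) -> None:
        """Run the corrective action (restart, notify, fail over) for each overdue entity."""
        for entity in self.overdue():
            on_missing(entity)
```

Because check fires on every call while an entity stays silent, a real deployment would also deduplicate or rate-limit the corrective action.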

Health Endpoint

A dedicated URL (e.g., /health or /status) exposed by a service that returns a standardized HTTP status code and payload indicating its operational health. This endpoint is the target for probes from orchestrators like Kubernetes, load balancers, and monitoring tools. Endpoints used for readiness checks or monitoring often perform dependency checks (database, APIs) and return detailed component status; an endpoint targeted by a liveness probe should check only the process's own state, so that a dependency outage does not trigger needless restarts.

Standard Response: HTTP 200 OK for healthy, 5xx for unhealthy.
Payload: Often JSON detailing status of subcomponents (e.g., {"db": "ok", "cache": "degraded"}).
Implementation: Can check internal state, connection pools, and free disk space.
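Because a liveness probe should not fail when a dependency is down, one common arrangement is to expose two paths: a cheap liveness path that inspects only internal state and a readiness path that may include dependency checks. The sketch below uses only Python's standard library. The paths /livez and /readyz, the STATE dictionary, the default port 8080, and the 30-second stall limit are assumptions for illustration; a real service would update STATE from its own main loop and connection pool.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application state; a real service would update these fields itself.
STATE = {"last_loop_tick": time.monotonic(), "db_ok": True}
STALL_LIMIT_SECONDS = 30.0  # assumed: the main loop ticks at least this often


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/livez":
            # Liveness: internal state only. Is the main loop still making progress?
            alive = time.monotonic() - STATE["last_loop_tick"] < STALL_LIMIT_SECONDS
            self._reply(200 if alive else 503, {"loop": "ok" if alive else "stalled"})
        elif self.path == "/readyz":
            # Readiness: may depend on external systems such as the database.
            ready = STATE["db_ok"]
            self._reply(200 if ready else 503, {"db": "ok" if ready else "down"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe requests out of the logs


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Start the health server on a background thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this split, a database outage flips only /readyz to 503, so Kubernetes stops routing traffic to the Pod without restarting a process that is otherwise healthy.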

Watchdog Timer

A hardware or software timer that must be periodically reset by a main program to prove it is not stuck in a hang or infinite loop. If the timer expires (is not 'petted'), it triggers a system reset or a predefined recovery action. This is a low-level, time-based fault detection mechanism, analogous to a liveness probe but typically operating at the OS or firmware level to recover from catastrophic stalls.

Implementation: Can be a hardware chip or a kernel daemon.
Reset Action: Often called 'kicking' or 'petting' the watchdog.
Use Case: Critical embedded systems, IoT devices, and servers where unresponsive states must be automatically cleared.
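A software watchdog can be sketched with a resettable timer. In this hypothetical Python class, each kick cancels the pending countdown and starts a new one; if the monitored code stops kicking for longer than timeout seconds, on_expire runs on a timer thread. A hardware watchdog would instead reset the whole system, which a user-space sketch cannot show.

```python
import threading
from typing import Callable, Optional


class Watchdog:
    """Calls on_expire unless kick() is called again within timeout seconds."""

    def __init__(self, timeout: float, on_expire: Callable[[], None]) -> None:
        self.timeout = timeout
        self.on_expire = on_expire
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def kick(self) -> None:
        """'Pet' the watchdog: cancel the pending countdown and start a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.on_expire)
            self._timer.daemon = True
            self._timer.start()

    def stop(self) -> None:
        """Disarm the watchdog, for example during a clean shutdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```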

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. In more than five years of work across computer vision models, L5 autonomous vehicle systems, and LLM research, he has focused on turning complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
