Inferensys

Glossary

Health Probe

A health probe is a diagnostic check, such as a liveness or readiness check, used by an orchestrator to determine the operational status of a service or container.
SELF-HEALING SOFTWARE SYSTEMS

What is a Health Probe?

A health probe is a diagnostic mechanism, such as a liveness or readiness check, used by an orchestrator (e.g., Kubernetes) to determine the operational status of a service or container. It performs periodic requests to a defined endpoint, evaluating the response against success criteria to decide if an instance is healthy and capable of receiving traffic. This enables automatic failure detection and triggers recovery actions like restarting or draining unhealthy pods, forming the foundation for resilient, self-healing software systems.

In practice, a liveness probe determines if a container needs to be restarted, while a readiness probe controls its inclusion in a service's load balancer. These probes integrate with broader fault-tolerance patterns like circuit breakers and graceful degradation. For autonomous agents, analogous agentic health checks assess logical soundness and operational readiness, helping the system maintain service level objectives (SLOs) by isolating failures before they cascade.

Key Characteristics of Health Probes

Health probes are the foundational diagnostic mechanism for autonomous systems, enabling orchestrators to make deterministic decisions about service availability and lifecycle management without human intervention.

01

Probe Types: Liveness vs. Readiness

Health probes are categorized by their operational purpose. A liveness probe determines whether a running container is still functioning (for example, not deadlocked); a failure typically triggers a restart. A readiness probe determines whether a container is ready to accept traffic (e.g., dependencies initialized, caches warmed); a failure stops traffic from being sent to the pod without restarting it. A third type, the startup probe, is intended for slow-starting applications, often legacy ones: liveness and readiness checks are disabled until it succeeds.
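
To make the liveness/readiness distinction concrete, here is a minimal sketch of an application exposing one endpoint per probe type, using only the Python standard library. The paths /healthz and /ready follow the common convention; the READY flag, the ProbeHandler class, and the make_server helper are hypothetical names chosen for this illustration, not part of any orchestrator API.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: the application sets it once dependencies are
# initialized and caches are warm, and may clear it again under overload.
READY = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: answering at all shows the process is not wedged.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: succeed only while the app can take traffic.
            if READY.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep periodic probe traffic out of the access log


def make_server(host="0.0.0.0", port=8080):
    """Build the probe server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), ProbeHandler)
```

A real service would typically call READY.set() at the end of its startup sequence and run make_server().serve_forever() in a background thread, so that probe handling never competes with request handling on the main thread.
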

02

Probe Mechanisms & Execution

In Kubernetes, probes are executed by the kubelet (the agent running on each node) against a container according to a defined schedule. The three most common mechanisms are listed below; Kubernetes also supports a gRPC health check.

  • HTTP GET: The most common. The kubelet sends an HTTP request to a specified path and port. A success code (200-399) passes the probe.
  • TCP Socket: The kubelet attempts to open a TCP connection to a specified port. Success is established if a connection is made.
  • Exec Command: The kubelet executes a specified command inside the container. A zero exit code indicates success.
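
The three mechanisms above can be approximated from the prober's side in a few lines of Python. This is a sketch of the success criteria only, not the kubelet's implementation; the function names and the default one-second timeout are assumptions made for the example.

```python
import http.client
import socket
import subprocess


def http_get_probe(host, port, path="/", timeout=1.0):
    """Pass if the endpoint answers with a status code in the 200-399 range."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or malformed response: probe fails.
        return False
    finally:
        conn.close()


def tcp_socket_probe(host, port, timeout=1.0):
    """Pass if a TCP connection to the port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_probe(command, timeout=1.0):
    """Pass if the command (a list of arguments) exits with status 0."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
```

In every case a timeout counts as a failure, which is why a slow probe endpoint can get an otherwise healthy container restarted or removed from service.
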
03

Configuration Parameters for Resilience

Probe behavior is finely tuned via parameters to balance responsiveness with stability, preventing flapping (rapid, cyclical failures). Key parameters include:

  • initialDelaySeconds: Wait time after container start before initiating probes.
  • periodSeconds: How often to perform the probe.
  • timeoutSeconds: Number of seconds after which the probe times out.
  • successThreshold: Minimum consecutive successes for the probe to be considered successful after a failure. In Kubernetes this must be 1 for liveness and startup probes.
  • failureThreshold: Number of consecutive failures required for the probe to be considered failed.
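
How the two thresholds damp flapping is easier to see in code. The sketch below models the consecutive-result counting described above; ProbeTracker and approx_detection_seconds are hypothetical names, and the detection-time formula is a rough estimate under a simplified model (probes start exactly every periodSeconds and each failing probe runs to its timeout), not a guarantee about any particular orchestrator.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTracker:
    """Tracks consecutive probe results and applies both thresholds."""
    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True
    _failures: int = field(default=0, init=False, repr=False)
    _successes: int = field(default=0, init=False, repr=False)

    def record(self, passed: bool) -> bool:
        """Record one probe result and return the current health verdict."""
        if passed:
            self._failures = 0  # any success breaks a failure streak
            self._successes += 1
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._successes = 0  # any failure breaks a success streak
            self._failures += 1
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def approx_detection_seconds(period_seconds, failure_threshold, timeout_seconds):
    """Rough worst-case time from a fault to a 'failed' verdict (simplified model)."""
    return period_seconds * failure_threshold + timeout_seconds
```

With a failure threshold of 3, a single failed probe between successes never changes the verdict, which is the anti-flapping behavior the parameters are meant to provide. The trade-off is slower detection: with a 10-second period, a threshold of 3, and a 1-second timeout, the estimate above comes to about 31 seconds.
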
04

Integration with Orchestrator Lifecycle

Probes are integral to the container orchestrator's control loops. In Kubernetes, probe results directly inform the decisions of several components:

  • The kubelet uses liveness probes to decide when to restart a container.
  • The kubelet reports readiness probe results in the pod's status; the endpoints controller then adds or removes the pod's IP from the endpoints of each matching Service.
  • The Deployment controller considers pod readiness during rolling updates, ensuring new pods are ready before scaling down old ones. This creates a deterministic, self-healing feedback loop.
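
The three decisions above can be pictured as pure functions over the latest probe verdicts. The following is a toy model for illustration only: PodStatus and the function names are hypothetical, and the rolling-update gate uses an absolute max_unavailable count and conservatively assumes the pod being removed is currently ready.

```python
from dataclasses import dataclass


@dataclass
class PodStatus:
    name: str
    ip: str
    live: bool   # latest liveness verdict
    ready: bool  # latest readiness verdict


def containers_to_restart(pods):
    """Liveness decision: names of pods whose container should be restarted."""
    return [p.name for p in pods if not p.live]


def service_endpoints(pods):
    """Readiness decision: IPs that should receive traffic for a matching Service."""
    return [p.ip for p in pods if p.ready]


def may_remove_old_pod(pods, desired_replicas, max_unavailable=0):
    """Rolling-update gate: removing one ready pod must leave enough ready pods."""
    ready_now = sum(1 for p in pods if p.ready)
    return ready_now - 1 >= desired_replicas - max_unavailable
```

Because each function depends only on the reported status, the same inputs always yield the same action, which is what makes the feedback loop deterministic.
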
05

Designing Effective Probe Endpoints

A well-designed probe endpoint is lightweight, stateless, and checks critical internal dependencies. Best practices include:

  • Checking internal in-memory state or a local cache.
  • Performing a shallow check on a crucial downstream dependency (e.g., database connection pool).
  • Avoiding deep dependency checks that cascade failures or heavy computational logic that consumes significant resources. The endpoint should return quickly to avoid blocking the orchestrator's control loop.
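
One way to keep a readiness endpoint cheap while still covering a downstream dependency is to cache the result of a shallow check for a few seconds, so that frequent probes do not each hit the database. The sketch below assumes hypothetical names (CachedCheck, readiness_status) and a caller-supplied check function; it is not thread-safe as written and is meant only to illustrate the pattern.

```python
import time


class CachedCheck:
    """Runs a dependency check at most once per ttl_seconds and caches the result."""

    def __init__(self, check, ttl_seconds=5.0, clock=time.monotonic):
        self._check = check          # callable returning a truthy value when healthy
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._expires_at = None
        self._last = False

    def __call__(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            try:
                self._last = bool(self._check())
            except Exception:
                # A failing check means "not ready"; it must not crash the endpoint.
                self._last = False
            self._expires_at = now + self._ttl
        return self._last


def readiness_status(checks):
    """Return (http_status, per-check results) for a readiness endpoint."""
    results = {name: check() for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, results
```

A handler for the readiness path could call readiness_status with, for example, a check that borrows a connection from the pool and returns it immediately. The cache lifetime should stay shorter than periodSeconds multiplied by failureThreshold, otherwise a stale "healthy" result can delay failure detection.
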
06

Relation to Circuit Breakers & Observability

Health probes operate at the infrastructure layer, while patterns like the Circuit Breaker operate at the application layer. A circuit breaker trips based on the failure rate the application observes when calling a dependency, while a readiness probe fails on a technical health check run by the orchestrator. Together, they provide layered fault tolerance. Probe metrics (success/failure counts, latency) are critical observability signals, feeding into dashboards and alerts to provide a real-time view of system resilience and the effectiveness of self-healing mechanisms.
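
For contrast with the orchestrator-driven probes, here is a minimal application-layer circuit breaker. It is a deliberately simplified sketch: it trips on consecutive failures rather than a failure rate, allows a single trial call after a cool-down, and is not thread-safe. CircuitBreaker, CircuitOpenError, and the parameter names are hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_seconds=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_seconds:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._max_failures:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure streak.
        self._failures = 0
        self._opened_at = None
        return result
```

The breaker reacts within the request path of a single caller, whereas a readiness probe changes routing for every caller. A service might reasonably report itself as not ready while a breaker guarding a critical dependency is open.
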

How Health Probes Work

A health probe is a diagnostic check used by an orchestrator to determine the operational status of a service or container, enabling autonomous failure detection and recovery.

A health probe functions as the primary feedback mechanism for self-healing software systems, allowing platforms like Kubernetes to automatically restart, terminate, or route traffic away from unhealthy instances. This creates a closed-loop system where the platform's state is continuously reconciled with a declared desired state.

Probes execute by periodically making a request (an HTTP call, a TCP socket connection, or a command execution) against the application or its container. Based on the result (success, failure, or timeout), the orchestrator takes corrective action. For example, a failed liveness probe triggers a container restart, while a failed readiness probe removes the pod from service load balancers, which supports graceful degradation and helps prevent cascading failures.

KUBERNETES HEALTH CHECKS

Liveness vs. Readiness Probes: A Comparison

A detailed comparison of the two primary health probe types used by container orchestrators like Kubernetes to manage container lifecycle and traffic routing.

| Probe Feature | Liveness Probe | Readiness Probe |
| --- | --- | --- |
| Primary Purpose | Determine if the container process is alive and functioning. A failure triggers a container restart. | Determine if the container is ready to accept network traffic (e.g., HTTP requests). A failure removes the pod from service endpoints. |
| Failure Action | The kubelet kills the container and restarts it according to the pod's restartPolicy. | The pod is marked not ready and its IP address is removed from the endpoints of all matching Services, so traffic is no longer routed to it. |
| Typical Check Logic | A simple check that the main process is responsive (e.g., a basic TCP connection, HTTP request to a non-critical endpoint). | A check that all dependencies are initialized and ready (e.g., database connections are live, cache is warmed, large files are loaded). |
| Probe Timing | Starts after initialDelaySeconds. Runs continuously for the container's lifetime. | Starts after initialDelaySeconds. Runs continuously for the container's lifetime. |
| Configuration Parameters (e.g., in Kubernetes) | initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold (must be 1), failureThreshold | initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, failureThreshold |
| Impact on System State | Disruptive. A restart resets in-memory state and terminates existing connections. | Non-disruptive. No container restart; existing in-flight requests may complete if the pod is not terminated. |
| Common Implementation | HTTP GET request to a /healthz endpoint, TCP socket check, or Exec command (e.g., cat /tmp/healthy). | HTTP GET request to a /ready endpoint, often with deeper dependency validation than the liveness endpoint. |
| Design Principle | Follows the "Let-it-Crash" philosophy. If unhealthy, restart to reach a clean state. | Enables graceful degradation and load shedding. Protects the service from traffic it cannot handle. |

ARCHITECTURAL PATTERNS

Where Health Probes Are Used

Health probes are a fundamental mechanism for building resilient, self-healing systems. They are implemented across the entire software stack, from container orchestration to application logic.

HEALTH PROBE

Frequently Asked Questions

A health probe is a diagnostic mechanism used by orchestrators like Kubernetes to assess the operational status of a service instance. This glossary addresses common technical questions about their implementation, purpose, and role in self-healing architectures.

A health probe is a diagnostic check, such as a liveness or readiness probe, used by an orchestrator to determine the operational status of a service or container. It works by periodically sending a request—typically an HTTP GET, a TCP socket connection, or an executed command—to a predefined endpoint within the application. The orchestrator evaluates the response (or timeout) against configured success criteria to decide if the instance is healthy and capable of receiving traffic, or if it requires restarting or removal from the service pool.

Key Mechanism:

  • Orchestrator Initiated: The platform, not the application, executes the probe (in Kubernetes, the kubelet agent running on each node).
  • Defined Endpoint: The application must expose a specific path (e.g., /health) or port for the check.
  • Configurable Parameters: Critical settings include initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold.
  • Deterministic Action: Each probe result is a binary pass or fail. Based on it, the orchestrator takes a deterministic action: keep the pod in service, restart it, or mark it as not ready.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.