Glossary

Health Check

A health check is a periodic test performed by an orchestrator or load balancer to verify that an application instance is running correctly and ready to accept traffic.


TRAFFIC AND DEPLOYMENT STRATEGIES

What is a Health Check?


In modern distributed systems and microservices architectures, a health check is a fundamental mechanism for ensuring service reliability and high availability. It is a simple request, often an HTTP GET to a dedicated /health endpoint, sent at regular intervals by an orchestrator like Kubernetes or a load balancer. The instance must respond within a timeout period with a successful status code (e.g., HTTP 200) to be considered healthy. Failure to respond correctly triggers automated remediation, such as restarting the container or removing the instance from the traffic pool, preventing user requests from being routed to a faulty backend.
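Here is a minimal sketch of the server side of this contract. It assumes a Python service built only on the standard library's `http.server`. The `/health` path and the `{"status": "UP"}` body are illustrative choices rather than a standard, and production services usually expose the endpoint through their web framework instead.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with HTTP 200 and a small JSON body (illustrative path/payload)."""

    def do_GET(self):
        if self.path == "/health":
            status, payload = 200, {"status": "UP"}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging; probes arrive every few seconds.
        pass


def start_health_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start_health_server(port=8080)` would serve the endpoint on port 8080. The load balancer or orchestrator then needs only the path, the port, and a timeout.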

Health checks are categorized by their purpose. A liveness probe determines whether a container is still functioning; if it fails, the container is restarted. A readiness probe assesses whether the instance is fully initialized and ready to serve requests; if it fails, the instance is removed from service endpoints. For services backed by stateful dependencies such as databases, a health check may also verify that the connection pool can supply a working connection. This proactive monitoring is a cornerstone of resilient system design, enabling auto-scaling, zero-downtime deployments, and self-healing infrastructure without manual intervention.


Key Characteristics of Health Checks

The design and configuration of health checks are critical for system resilience and automated operations.

Probe Types and Mechanisms

Health checks are executed via different probe mechanisms, each serving a distinct operational purpose.

HTTP/HTTPS GET Probe: The most common type. The orchestrator sends an HTTP request to a specified endpoint (e.g., /health). A response with a 2xx or 3xx status code indicates health; 4xx/5xx codes indicate failure.
TCP Socket Probe: The orchestrator attempts to open a TCP connection to a specified port. Success is based on establishing a connection, regardless of application-layer logic.
Command Execution Probe: The orchestrator executes a command inside the container (e.g., a script). A zero exit code indicates success. This is highly flexible but adds complexity.
gRPC Health Checking Protocol: A standard for gRPC services where the server implements a defined Health service, allowing for precise, service-specific health reporting.
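The HTTP rule above can be sketched as a small client-side probe: a 2xx or 3xx status counts as healthy, and anything else, or no answer within the timeout, counts as a failure. This is an illustrative function with a hypothetical name, not the code any particular orchestrator runs. It uses `http.client` so that redirects are not followed.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/health", timeout: float = 1.0) -> bool:
    """Return True if GET <path> answers with a status in [200, 400) within the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response all count as failure.
        return False
    finally:
        conn.close()
```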

Liveness vs. Readiness

In platforms like Kubernetes, health checks fall into two fundamental types that control different lifecycle events, plus a third type for slow-starting containers.

Liveness Probe: Answers "Is the process alive?" If this probe fails, the orchestrator assumes the application is in a broken state (e.g., deadlocked) and restarts the container to try to recover it.
Readiness Probe: Answers "Is the application ready to serve traffic?" If this probe fails, the orchestrator removes the pod's IP address from service endpoints, stopping new traffic from being sent to it. This is used during startup (e.g., waiting for a database connection) or during temporary overload.
Startup Probe: A third type for applications with slow startup times, often legacy applications. When configured, it disables liveness and readiness checks until it succeeds, preventing premature restarts.
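The gating behaviour of a startup probe can be modelled as a small state machine. This is a simplified sketch that assumes each probe is a zero-argument callable returning a bool. The `ProbeGate` name is hypothetical, and the sketch ignores timing, thresholds, and the fact that a startup probe which keeps failing eventually gets the container killed.

```python
from typing import Callable, Dict, Optional

Probe = Callable[[], bool]


class ProbeGate:
    """Runs liveness/readiness only after the startup probe has succeeded once."""

    def __init__(self, startup: Probe, liveness: Probe, readiness: Probe):
        self.startup = startup
        self.liveness = liveness
        self.readiness = readiness
        self.started = False

    def tick(self) -> Dict[str, Optional[bool]]:
        if not self.started:
            # Until startup succeeds, the other probes are not evaluated at all,
            # so a slow boot cannot trigger a liveness restart.
            self.started = self.startup()
            return {"started": self.started, "live": None, "ready": False}
        return {"started": True, "live": self.liveness(), "ready": self.readiness()}
```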

Configuration Parameters

The behavior of a health check is finely tuned through a set of timing parameters that balance responsiveness with stability.

Initial Delay Seconds: The wait time after the container starts before probes are initiated. Crucial for allowing application bootstrap.
Period Seconds: How often (e.g., every 10 seconds) the probe is executed.
Timeout Seconds: The time after which a probe is considered failed if no response is received. It should be shorter than the period so that one probe finishes before the next one starts.
Success/Failure Threshold: The number of consecutive successes or failures required for the probe to flip its verdict. A failure threshold greater than 1 prevents transient network blips from causing unnecessary restarts or traffic removal.
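The threshold rule can be sketched as a counter that flips the verdict only after enough consecutive disagreeing results. The class name is hypothetical. The default values mirror Kubernetes' documented defaults (`failureThreshold: 3`, `successThreshold: 1`). Pass `healthy=False` to model a readiness probe, because a pod is not Ready until its readiness probe first succeeds.

```python
class ThresholdTracker:
    """Flips between healthy and unhealthy only after N consecutive results."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 1, healthy: bool = True):
        if failure_threshold < 1 or success_threshold < 1:
            raise ValueError("thresholds must be >= 1")
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that disagree with the current verdict

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the (possibly updated) verdict."""
        if ok == self.healthy:
            self._streak = 0  # result agrees with the current verdict; reset
            return self.healthy
        self._streak += 1
        needed = self.success_threshold if ok else self.failure_threshold
        if self._streak >= needed:
            self.healthy = ok
            self._streak = 0
        return self.healthy
```

With `failure_threshold=3`, two isolated failures separated by a success never change the verdict. This is exactly the protection against transient blips described above.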

Integration with Load Balancers

Beyond container orchestrators, health checks are a foundational feature of load balancers (Application, Network, Gateway) and service meshes.

Purpose: To dynamically update the pool of healthy backend targets. An instance failing its health check is automatically drained of new connections and removed from the rotation.
Advanced Checks: Cloud load balancers often support more sophisticated checks, including validating response body content, checking TLS certificates, or monitoring for specific HTTP headers.
Connection Draining: When an instance is marked unhealthy, existing connections may be allowed to complete during a grace period before termination, enabling graceful shutdowns during deployments or scale-in events.
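A load balancer's use of health results can be sketched as a pool that sends new connections only to healthy targets. In-flight connections on unhealthy targets are allowed to finish within a drain window. This is a simplified illustration that assumes round-robin selection and a fixed drain deadline. The class and method names are hypothetical, and real load balancers track many more states.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


class TargetPool:
    def __init__(self, targets: List[str], drain_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.healthy: Dict[str, bool] = {t: True for t in targets}
        self.active: Dict[str, int] = {t: 0 for t in targets}  # open connections per target
        self.drain_deadline: Dict[str, float] = {}
        self.drain_seconds = drain_seconds
        self.clock = clock
        self._rr = itertools.cycle(targets)

    def set_health(self, target: str, ok: bool) -> None:
        self.healthy[target] = ok
        if ok:
            self.drain_deadline.pop(target, None)
        else:
            # Stop new traffic immediately; existing connections get a grace period.
            self.drain_deadline.setdefault(target, self.clock() + self.drain_seconds)

    def pick(self) -> Optional[str]:
        """Round-robin over targets currently passing health checks."""
        for _ in range(len(self.healthy)):
            t = next(self._rr)
            if self.healthy[t]:
                self.active[t] += 1
                return t
        return None  # no healthy targets left

    def release(self, target: str) -> None:
        self.active[target] -= 1

    def drain_complete(self, target: str) -> bool:
        """An unhealthy target is done draining when idle or past its deadline."""
        deadline = self.drain_deadline.get(target)
        if deadline is None:
            return False  # healthy targets are not draining
        return self.active[target] == 0 or self.clock() >= deadline
```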

Designing the Health Endpoint

The endpoint that services the health check request (e.g., /health or /ready) must be carefully engineered.

Depth of Check: Ranges from shallow (process is up) to deep (all critical downstream dependencies like databases, caches, and internal services are responsive). Deep checks must be fast and have their own timeouts to avoid cascading failures.
Security: The endpoint should typically be exposed on an internal port or network, not to the public internet, to prevent denial-of-service or information disclosure.
Statelessness: The health check should not modify application state or trigger significant side effects.
Performance Impact: Frequent, deep health checks can add load. Caching results internally for a short period (e.g., 1 second) is a common optimization.
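The caching optimization can be sketched as a wrapper that reuses the last result for a short TTL. This way, frequent probes from several sources do not each hit the dependencies. The 1-second default follows the example above. The class name and the injectable `clock` are assumptions made for testability. Because the lock is held while the wrapped check runs, that check should enforce its own timeout, as recommended under Depth of Check.

```python
import threading
import time
from typing import Callable


class CachedCheck:
    """Wrap an expensive health check and reuse its result for `ttl` seconds."""

    def __init__(self, check: Callable[[], bool], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.check = check
        self.ttl = ttl
        self.clock = clock
        self._lock = threading.Lock()
        self._result = False
        self._expires = float("-inf")  # forces a real check on first call

    def __call__(self) -> bool:
        with self._lock:  # concurrent probes share a single refresh
            now = self.clock()
            if now >= self._expires:
                try:
                    self._result = bool(self.check())
                except Exception:
                    self._result = False  # a crashing check is an unhealthy check
                self._expires = now + self.ttl
            return self._result
```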

Failure Modes and Cascading Risks

Misconfigured health checks can themselves become a source of system instability.

Noisy Neighbor: A deep health check that calls an overloaded downstream service adds to its load, which can trigger a cascading failure.
Flapping: An instance repeatedly passing and failing health checks due to aggressive timeouts or threshold settings, causing constant traffic churn and restarts.
Slow Startup Catastrophe: If the initial delay is too short, a slow-starting application will be killed by its liveness probe before it can become ready, entering a crash loop.
Zombie Instances: An instance that passes a shallow health check but is functionally broken (e.g., it can no longer reach a critical dependency, or a defect makes every real request fail) will continue to receive traffic and serve errors. This highlights the need for appropriate check depth and complementary external synthetic monitoring.
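The slow-startup failure mode comes down to simple arithmetic. Here is a rough approximation that ignores kubelet scheduling jitter. A container that never answers its liveness probe is restarted about `initialDelaySeconds + (failureThreshold - 1) × periodSeconds + timeoutSeconds` seconds after it starts, so an application that needs longer than that to boot will crash-loop. The helpers below are a hypothetical sketch of that check. For comparison, Kubernetes documents the maximum startup allowance of a startup probe as `failureThreshold × periodSeconds`.

```python
def liveness_restart_after(initial_delay: float, period: float, timeout: float,
                           failure_threshold: int) -> float:
    """Approximate seconds from container start until a never-passing liveness
    probe triggers a restart (first probe at initial_delay, then every period)."""
    last_probe_start = initial_delay + (failure_threshold - 1) * period
    return last_probe_start + timeout


def will_crash_loop(boot_seconds: float, **probe) -> bool:
    """True if the app would be restarted before it finishes booting."""
    return boot_seconds >= liveness_restart_after(**probe)
```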


How Health Checks Work


Health checks are automated, periodic probes sent from a load balancer or orchestrator (like Kubernetes) to an application endpoint. These probes verify the instance's operational status. A successful response, typically an HTTP 200 status code, signals that the instance is healthy and can receive user traffic. If an instance fails a configured number of consecutive checks, it is automatically marked unhealthy and removed from the service pool, preventing requests from being routed to a faulty component. This mechanism is fundamental to maintaining high availability and enabling zero-downtime deployments.

In Kubernetes, health checks are implemented via liveness and readiness probes, plus startup probes for slow-starting containers. A liveness probe determines whether a container needs restarting, while a readiness probe checks whether a pod can accept traffic. Probes can send HTTP requests, run commands inside the container, open TCP sockets, or, in recent Kubernetes versions, call a gRPC health service. For LLM endpoints, a health check might verify that the model is loaded and the inference server is responsive. This continuous validation allows systems to perform auto-scaling, rolling updates, and failover automatically, ensuring resilient application delivery without manual intervention.
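For the LLM case, a readiness check might confirm two things: that the weights are loaded, and that a tiny inference call returns within a deadline. The sketch below assumes a hypothetical `model` object with a `loaded` attribute and a `generate(prompt, max_tokens)` method. It is not the API of any particular inference server.

```python
import concurrent.futures as cf

# One background worker: if a previous call is still hung, later checks time out too,
# which correctly keeps the instance out of rotation.
_executor = cf.ThreadPoolExecutor(max_workers=1)


def llm_ready(model, timeout: float = 2.0) -> bool:
    """Ready only if weights are loaded and a 1-token generation finishes in time."""
    if not getattr(model, "loaded", False):
        return False
    future = _executor.submit(model.generate, "ping", max_tokens=1)
    try:
        return bool(future.result(timeout=timeout))
    except cf.TimeoutError:
        return False  # server is up but too slow to take traffic
    except Exception:
        return False  # inference raised: treat as not ready
```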

KUBERNETES HEALTH CHECKS

Liveness vs. Readiness Probes

A comparison of the two primary health check mechanisms used by Kubernetes to manage container lifecycle and traffic routing.

Feature	Liveness Probe	Readiness Probe
Primary Purpose	Determines whether the container is still functioning. A failure triggers a container restart.	Determines whether the container is ready to accept traffic. A failure removes the pod from Service endpoints.
Failure Action	The kubelet kills and restarts the container.	The kubelet marks the pod NotReady and the pod is removed from Service endpoints, so no new traffic reaches it; the container is not restarted.
Typical Use Case	Detect a deadlock or stalled application where the process is running but unresponsive.	Wait for dependencies (databases, caches, APIs) to become available during startup or after a transient failure.
Common Check Types	HTTP GET, TCP Socket, Exec command	HTTP GET, TCP Socket, Exec command
Initial Delay	Often configured (e.g., 30 seconds) to allow for application boot time.	Often configured (e.g., 5 seconds) but may be longer for slow-starting apps.
Periodicity	Continuously runs after the initial delay (e.g., every 10 seconds).	Continuously runs after the initial delay (e.g., every 5 seconds).
Impact on Rollouts	A failing liveness probe during an update can cause repeated restarts, potentially stalling the rollout.	A failing readiness probe prevents new pods from receiving traffic, allowing a slow-starting version to initialize without user errors.
Configuration Example	`failureThreshold: 3` (3 consecutive failures to restart)	`successThreshold: 1` (1 success to mark ready after failure)
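The parameters in the table map onto fields of a Kubernetes probe spec. The sketch below fills in the Kubernetes defaults (`initialDelaySeconds: 0`, `periodSeconds: 10`, `timeoutSeconds: 1`, `successThreshold: 1`, `failureThreshold: 3`). It then checks one documented rule: liveness and startup probes must use `successThreshold: 1`. The helper functions are hypothetical; only the field names follow the Pod spec. The timeout-versus-period comparison is reported as a soft warning because it reflects guidance rather than API validation.

```python
from typing import Dict, List

K8S_PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def build_probe(kind: str, path: str, port: int, **overrides) -> Dict:
    """Return an httpGet probe dict with defaults applied; raise on invalid values."""
    if kind not in ("liveness", "readiness", "startup"):
        raise ValueError(f"unknown probe kind: {kind}")
    probe = {"httpGet": {"path": path, "port": port}, **K8S_PROBE_DEFAULTS, **overrides}
    if kind in ("liveness", "startup") and probe["successThreshold"] != 1:
        raise ValueError(f"{kind} probes require successThreshold == 1")
    for name in ("periodSeconds", "timeoutSeconds", "successThreshold", "failureThreshold"):
        if probe[name] < 1:
            raise ValueError(f"{name} must be >= 1")
    return probe


def probe_warnings(probe: Dict) -> List[str]:
    """Soft checks that reflect guidance rather than API validation."""
    warnings = []
    if probe["timeoutSeconds"] >= probe["periodSeconds"]:
        warnings.append("timeoutSeconds should be shorter than periodSeconds")
    return warnings
```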

IMPLEMENTATION PATTERNS

Common Health Check Implementations

Health checks are implemented through various probes and endpoints that test different aspects of an application's operational state. The following patterns are foundational for ensuring service reliability in modern, distributed architectures.

HTTP Endpoint Probe

The most common implementation, where a load balancer or orchestrator makes a periodic HTTP GET request to a designated endpoint (e.g., /health or /status) on the application. A successful response (typically HTTP 200 OK) indicates the instance is healthy.

Key Components: Lightweight endpoint, fast response time (< 100ms), minimal dependencies.
Best Practice: The endpoint should perform a shallow check of critical internal dependencies (e.g., database connection pool, in-memory cache) but avoid deep, expensive queries.
Example: A web service's /health endpoint checks if its connection to a primary database is alive and returns {"status": "UP"}.
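The example above can be sketched as a framework-agnostic handler function. In this sketch, an in-memory SQLite database stands in for the primary database, and `health_status` is a hypothetical name. A real service would borrow a connection from its pool and bound the query with a driver-level timeout.

```python
import json
import sqlite3
from typing import Tuple


def health_status(conn: sqlite3.Connection) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) after a cheap round-trip to the database."""
    try:
        conn.execute("SELECT 1").fetchone()  # trivial query: proves the connection works
    except sqlite3.Error:
        # Report failure without leaking driver details to the caller.
        return 503, json.dumps({"status": "DOWN"})
    return 200, json.dumps({"status": "UP"})
```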

TCP Socket Probe

A lower-level check where the orchestrator attempts to open a TCP connection to the application's designated port (e.g., 8080). Success is determined by the ability to establish a connection, not by application logic.

Use Case: Ideal for non-HTTP services (e.g., databases, gRPC servers, custom TCP protocols) or for verifying the network stack and process are listening before application initialization is complete.
Limitation: Only confirms the process is listening on a port, not that the application logic is functioning correctly. Often used alongside a readiness probe.
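A TCP socket probe reduces to one question: can a connection be opened before the timeout? Here is a minimal sketch using the standard `socket` module; the function name is hypothetical. As the limitation above notes, success only shows that something is listening on the port.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # handshake completed; the socket is closed on exit
    except OSError:
        return False  # refused, unreachable, or timed out
```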

Command Execution Probe

The orchestrator executes a specified command inside the application container or host. A zero exit code indicates success. This offers maximum flexibility for custom health logic.

Kubernetes Example: A livenessProbe using exec could run a script that validates internal file locks or process states.
Consideration: The command must be lightweight; a hanging or resource-intensive command can cause false failure signals. It is commonly used for legacy applications without built-in HTTP endpoints.
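The concern about hanging commands can be addressed by giving the command a hard deadline, the role `timeoutSeconds` plays for an orchestrator. Here is a minimal sketch with `subprocess.run`. The function name is hypothetical, and `cmd` stands for whatever health script the container provides.

```python
import subprocess
from typing import Sequence


def exec_probe(cmd: Sequence[str], timeout: float = 1.0) -> bool:
    """Run cmd; healthy only if it exits with code 0 before the timeout."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                                timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return False  # run() kills the child on timeout; a hang counts as failure
    except OSError:
        return False  # command missing or not executable
    return result.returncode == 0
```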

Readiness vs. Liveness Probes

A critical distinction in platforms like Kubernetes that defines an application's lifecycle stage.

Liveness Probe: Answers "Is the application running?" A failed liveness probe causes the kubelet to restart the container. It diagnoses a deadlocked or otherwise broken application.
Readiness Probe: Answers "Is the application ready to serve traffic?" A failed readiness probe removes the pod from Service endpoints, and therefore from load balancing. It handles startup delays, temporary dependency unavailability, or maintenance modes.

Best Practice: Use both. A service might be live (process running) but not ready (waiting for cache warm-up).
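The "live but not ready" case can be sketched as two handlers that read different state. The `AppState` fields and handler names are assumptions for illustration. The liveness handler deliberately ignores dependencies: restarting a container does not fix an unreachable database, so dependency checks belong on the readiness side.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AppState:
    """Hypothetical process-wide flags updated by startup and shutdown code."""
    cache_warmed: bool = False
    draining: bool = False  # e.g. set at the start of a graceful shutdown


def livez(state: AppState) -> Tuple[int, str]:
    # If this handler can run and respond, the process and its server loop are alive.
    return 200, "alive"


def readyz(state: AppState) -> Tuple[int, str]:
    # Readiness reflects whether this instance should receive new traffic right now.
    if state.draining:
        return 503, "draining"
    if not state.cache_warmed:
        return 503, "warming cache"
    return 200, "ready"
```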

Dependency Health Aggregation

An advanced pattern where a service's health endpoint aggregates the status of its downstream dependencies (databases, APIs, caches). This provides a holistic view of the service's operational capacity.

Implementation: The /health endpoint performs concurrent, time-bound checks to each critical dependency, categorizing results (e.g., UP, DOWN, DEGRADED).
Output: A composite status, often following a fail-fast principle: if a primary database is down, overall status is DOWN. For secondary caches, status might be DEGRADED.
Tooling: Frameworks such as Spring Boot Actuator (through its health indicators), or custom libraries, standardize this aggregation.
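The aggregation rule can be sketched with a thread pool that runs every dependency check concurrently under one shared deadline. The check callables and the critical/non-critical flags are assumptions supplied by the caller. The status names follow the UP/DOWN/DEGRADED scheme described above.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def aggregate_health(checks: Dict[str, Tuple[Check, bool]], timeout: float = 0.5) -> Dict:
    """checks maps name -> (check_fn, is_critical); returns overall and per-dependency status."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn) for name, (fn, _) in checks.items()}
    done, _ = cf.wait(futures.values(), timeout=timeout)  # one shared deadline
    pool.shutdown(wait=False)  # do not block the response on checks that overran
    results = {}
    for name, fut in futures.items():
        try:
            ok = fut in done and bool(fut.result())
        except Exception:
            ok = False  # a check that raises counts as DOWN
        results[name] = "UP" if ok else "DOWN"
    critical_down = any(results[n] == "DOWN" and checks[n][1] for n in checks)
    any_down = "DOWN" in results.values()
    overall = "DOWN" if critical_down else ("DEGRADED" if any_down else "UP")
    return {"status": overall, "components": results}
```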

Synthetic Transaction Probe

A proactive check that executes a simplified but representative user transaction to validate full business logic pathways. This goes beyond connectivity to test functional correctness.

Example: A health check for a payment service might create a test cart, call a payment API with a test credit card, and validate a mock successful response, then clean up all test data.
Complexity: These checks are more resource-intensive and slower, so they are run less frequently (e.g., every 30 seconds vs. every 5 seconds for an endpoint probe).
Benefit: Catches logical failures, such as misconfigured API keys, broken code deploys, or corrupted data schemas, that shallow checks miss.
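The create, exercise, verify, and clean-up sequence can be sketched as follows. The `client` object and its `create_cart`, `charge`, and `delete_cart` methods are hypothetical stand-ins for a payment service's test-mode API. The essential pattern is that cleanup runs in a `finally` block, so a failing probe does not leave test data behind.

```python
import time
from typing import Dict


def synthetic_payment_check(client, timeout: float = 10.0) -> Dict:
    """Run one end-to-end test transaction and always remove the test data."""
    started = time.monotonic()
    cart_id = None
    try:
        cart_id = client.create_cart(items=[{"sku": "TEST-SKU", "qty": 1}])  # hypothetical test SKU
        receipt = client.charge(cart_id, card="test-card")  # hypothetical test card token
        ok = receipt.get("status") == "succeeded"
        elapsed = time.monotonic() - started
        return {"ok": ok and elapsed <= timeout, "seconds": round(elapsed, 3)}
    except Exception as exc:
        return {"ok": False, "error": type(exc).__name__}
    finally:
        if cart_id is not None:
            client.delete_cart(cart_id)  # clean up even when the check fails
```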

HEALTH CHECK

Frequently Asked Questions

This glossary entry answers common questions about how health checks are implemented, their types, and their role in modern deployment strategies.

What is a health check?

A health check is a periodic, automated request sent by an orchestrator (like Kubernetes) or load balancer to an application instance to verify its operational status and readiness to serve traffic. The instance responds with a success or failure code; a failure typically triggers automatic remediation, such as restarting the instance or removing it from the traffic pool. This mechanism is foundational for maintaining high availability and enabling zero-downtime deployments by ensuring only healthy pods or containers receive user requests.


About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
