In modern distributed systems and microservices architectures, a health check is a fundamental mechanism for ensuring service reliability and high availability. It is a simple request, often an HTTP GET to a dedicated /health endpoint, sent at regular intervals by an orchestrator like Kubernetes or a load balancer. The instance must respond within a timeout period with a successful status code (e.g., HTTP 200) to be considered healthy. Failure to respond correctly triggers automated remediation, such as restarting the container or removing the instance from the traffic pool, preventing user requests from being routed to a faulty backend.
Health Check

What is a Health Check?
A health check is a periodic diagnostic test performed by an orchestrator or load balancer to verify that an application instance is running correctly and ready to accept user traffic.
Health checks are categorized by their purpose. A liveness probe determines if a container is running; if it fails, the instance is restarted. A readiness probe assesses if the instance is fully initialized and ready to serve requests; failure removes it from service endpoints. For stateful services like databases, a health check may verify connection pools. This proactive monitoring is a cornerstone of resilient system design, enabling auto-scaling, zero-downtime deployments, and self-healing infrastructure without manual intervention.
Key Characteristics of Health Checks
A health check is a periodic test performed by an orchestrator or load balancer to verify that an application instance is running correctly and ready to accept traffic. Its design and configuration are critical for system resilience and automated operations.
Probe Types and Mechanisms
Health checks are executed via different probe mechanisms, each serving a distinct operational purpose.
- HTTP/HTTPS GET Probe: The most common type. The orchestrator sends an HTTP request to a specified endpoint (e.g., /health). A response with a 2xx or 3xx status code indicates health; 4xx/5xx codes indicate failure.
- TCP Socket Probe: The orchestrator attempts to open a TCP connection to a specified port. Success is based on establishing a connection, regardless of application-layer logic.
- Command Execution Probe: The orchestrator executes a command inside the container (e.g., a script). A zero exit code indicates success. This is highly flexible but adds complexity.
- gRPC Health Checking Protocol: A standard for gRPC services where the server implements a defined Health service, allowing for precise, service-specific health reporting.
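The HTTP GET probe is the easiest of these mechanisms to reproduce outside an orchestrator. The following is a minimal sketch of the prober side, assuming a plain-HTTP target; the function name `http_probe` and its default values are illustrative and not taken from any orchestrator's API.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/health", timeout: float = 1.0) -> bool:
    """Return True if GET `path` answers with a 2xx or 3xx status within `timeout`."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and malformed responses all count as failures.
        return False
    finally:
        conn.close()
```

A call such as `http_probe("127.0.0.1", 8080)` returns a single boolean verdict; how many of those verdicts are needed before acting is a separate configuration decision.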
Liveness vs. Readiness
In platforms like Kubernetes, health checks are categorized by the lifecycle event they control. Liveness and readiness probes are the two fundamental types; a startup probe is available as a third.
- Liveness Probe: Answers "Is the process alive?" If this probe fails, the orchestrator assumes the application is in a broken state (e.g., deadlocked) and restarts the container to try to recover it.
- Readiness Probe: Answers "Is the application ready to serve traffic?" If this probe fails, the orchestrator removes the pod's IP address from service endpoints, stopping new traffic from being sent to it. This is used during startup (e.g., waiting for a database connection) or during temporary overload.
- Startup Probe: A third type used for legacy applications with slow startup times. It disables liveness and readiness checks until it succeeds, preventing premature restarts.
Configuration Parameters
The behavior of a health check is finely tuned through a set of timing parameters that balance responsiveness with stability.
- Initial Delay Seconds: The wait time after the container starts before probes are initiated. Crucial for allowing application bootstrap.
- Period Seconds: How often (e.g., every 10 seconds) the probe is executed.
- Timeout Seconds: The time after which a probe is considered failed if no response is received. It should generally be shorter than the period so that consecutive probes do not overlap.
- Success/Failure Threshold: The number of consecutive successes or failures required for the probe to flip its verdict. A failure threshold greater than 1 prevents transient network blips from causing unnecessary restarts or traffic removal.
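The threshold behavior can be sketched as a small state machine that only flips its verdict after enough consecutive contrary results. The class name `ProbeState` and its defaults are hypothetical; this illustrates the idea and is not the kubelet's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True
    _streak: int = 0  # consecutive results that disagree with the current verdict

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the (possibly updated) verdict."""
        if ok == self.healthy:
            # The result agrees with the current verdict, so any contrary streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.success_threshold if ok else self.failure_threshold
        if self._streak >= needed:
            self.healthy = ok
            self._streak = 0
        return self.healthy
```

With `failure_threshold=3`, two failed probes followed by a success leave the instance healthy, which is how a transient network blip is absorbed.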
Integration with Load Balancers
Beyond container orchestrators, health checks are a foundational feature of load balancers (Application, Network, Gateway) and service meshes.
- Purpose: To dynamically update the pool of healthy backend targets. An instance failing its health check is automatically drained of new connections and removed from the rotation.
- Advanced Checks: Cloud load balancers often support more sophisticated checks, including validating response body content, checking TLS certificates, or monitoring for specific HTTP headers.
- Draining/Connection Draining: When an instance is marked unhealthy, existing connections may be allowed to complete during a grace period before termination, enabling graceful shutdowns during deployments or scaling-in events.
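As a rough illustration of how a balancer keeps unhealthy targets out of rotation, the sketch below round-robins over only those targets currently marked healthy. `TargetPool` is a hypothetical name, and connection draining is deliberately left out to keep the example short.

```python
class TargetPool:
    def __init__(self, targets):
        self._targets = list(targets)
        self._healthy = set(self._targets)
        self._next = 0

    def mark(self, target, healthy: bool) -> None:
        """Record the latest health verdict for a target."""
        if healthy:
            self._healthy.add(target)
        else:
            self._healthy.discard(target)

    def pick(self):
        """Return the next healthy target in round-robin order, or None if there is none."""
        n = len(self._targets)
        for _ in range(n):
            target = self._targets[self._next % n]
            self._next += 1
            if target in self._healthy:
                return target
        return None
```

Calling `mark("b", False)` after a failed check means later `pick()` calls skip `"b"` until it is marked healthy again.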
Designing the Health Endpoint
The endpoint that services the health check request (e.g., /health or /ready) must be carefully engineered.
- Depth of Check: Ranges from shallow (process is up) to deep (all critical downstream dependencies like databases, caches, and internal services are responsive). Deep checks must be fast and have their own timeouts to avoid cascading failures.
- Security: The endpoint should typically be exposed on an internal port or network, not to the public internet, to prevent denial-of-service or information disclosure.
- Statelessness: The health check should not modify application state or trigger significant side effects.
- Performance Impact: Frequent, deep health checks can add load. Caching results internally for a short period (e.g., 1 second) is a common optimization.
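The caching optimization mentioned above can be sketched as a thin wrapper that re-runs the underlying check only after a short time-to-live has expired. `CachedCheck` and its `clock` parameter are illustrative assumptions; the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class CachedCheck:
    def __init__(self, check: Callable[[], bool], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._ttl = ttl
        self._clock = clock
        self._expires: Optional[float] = None
        self._value = False

    def __call__(self) -> bool:
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache miss or expired entry: run the real (possibly expensive) check.
            self._value = self._check()
            self._expires = now + self._ttl
        return self._value
```

The trade-off is staleness: a failure can go unreported for up to `ttl` seconds, so the TTL should stay well below the probe period.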
Failure Modes and Cascading Risks
Misconfigured health checks can themselves become a source of system instability.
- Noisy Neighbor: A deep health check that calls an overloaded downstream service can exacerbate its load, causing a cascading failure.
- Flapping: An instance repeatedly passing and failing health checks due to aggressive timeouts or threshold settings, causing constant traffic churn and restarts.
- Slow Startup Catastrophe: If the initial delay is too short, a slow-starting application will be killed by its liveness probe before it can become ready, entering a crash loop.
- Zombie Instances: An instance that passes a shallow health check but is functionally broken (e.g., its logic is corrupted) will continue to receive traffic and serve errors. This highlights the need for appropriate check depth and complementary external synthetic monitoring.
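The slow-startup case can be checked with simple arithmetic before deploying. The sketch below uses a rough estimate of the time a never-healthy container survives, namely the initial delay plus the failure threshold times the period; it ignores probe timeouts and scheduling jitter, and both function names are hypothetical.

```python
def restart_budget_seconds(initial_delay: float, period: float, failure_threshold: int) -> float:
    """Approximate time a container has to become healthy before a liveness restart."""
    return initial_delay + period * failure_threshold


def risks_crash_loop(startup_seconds: float, initial_delay: float,
                     period: float, failure_threshold: int) -> bool:
    """True if the application's startup time exceeds the estimated budget."""
    return startup_seconds > restart_budget_seconds(initial_delay, period, failure_threshold)
```

For example, an application that needs 45 seconds to start, probed every 10 seconds with a failure threshold of 3 and no initial delay, has a budget of about 30 seconds and is likely to be restarted before it ever becomes ready.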
How Health Checks Work
Health checks are automated, periodic probes sent from a load balancer or orchestrator (like Kubernetes) to an application endpoint. These probes verify the instance's operational status. A successful response, typically an HTTP 200 status code, signals the instance is healthy and can receive user traffic. If an instance fails consecutive checks, it is automatically marked unhealthy and removed from the service pool, preventing requests from being routed to a faulty component. This mechanism is fundamental to maintaining high availability and enabling zero-downtime deployments.
In Kubernetes, health checks are implemented via liveness and readiness probes. A liveness probe determines if a container needs restarting, while a readiness probe checks if a pod can accept traffic. Probes can execute HTTP requests, run shell commands, or open TCP sockets. For LLM endpoints, a health check might verify the model is loaded and the inference server is responsive. This continuous validation allows systems to perform auto-scaling, rolling updates, and failover automatically, ensuring resilient application delivery without manual intervention.
Liveness vs. Readiness Probes
A comparison of the two primary health check mechanisms used by Kubernetes to manage container lifecycle and traffic routing.
| Feature | Liveness Probe | Readiness Probe |
|---|---|---|
| Primary Purpose | Determines if the container is running. A failure triggers a container restart. | Determines if the container is ready to accept traffic. A failure removes the pod from service endpoints. |
| Failure Action | The kubelet kills and restarts the container. | The pod is marked not ready and removed from Service endpoints; the container is not restarted. |
| Typical Use Case | Detect a deadlock or stalled application where the process is running but unresponsive. | Wait for dependencies (databases, caches, APIs) to become available during startup or after a transient failure. |
| Common Check Types | HTTP GET, TCP Socket, Exec command | HTTP GET, TCP Socket, Exec command |
| Initial Delay | Often configured (e.g., 30 seconds) to allow for application boot time. | Often configured (e.g., 5 seconds) but may be longer for slow-starting apps. |
| Periodicity | Continuously runs after the initial delay (e.g., every 10 seconds). | Continuously runs after the initial delay (e.g., every 5 seconds). |
| Impact on Rollouts | A failing liveness probe during an update can cause repeated restarts, potentially stalling the rollout. | A failing readiness probe prevents new pods from receiving traffic, allowing a slow-starting version to initialize without user errors. |
Common Health Check Implementations
Health checks are implemented through various probes and endpoints that test different aspects of an application's operational state. The following patterns are foundational for ensuring service reliability in modern, distributed architectures.
HTTP Endpoint Probe
The most common implementation, where a load balancer or orchestrator makes a periodic HTTP GET request to a designated endpoint (e.g., /health or /status) on the application. A successful response (typically HTTP 200 OK) indicates the instance is healthy.
- Key Components: Lightweight endpoint, fast response time (< 100ms), minimal dependencies.
- Best Practice: The endpoint should perform a shallow check of critical internal dependencies (e.g., database connection pool, in-memory cache) but avoid deep, expensive queries.
- Example: A web service's /health endpoint checks if its connection to a primary database is alive and returns {"status": "UP"}.
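A minimal sketch of such an endpoint, using only Python's standard library, might look like the following. The `database_is_alive` function is a hypothetical placeholder for a cheap ping against the real connection pool, and returning 503 with a "DOWN" body on failure is an assumption rather than something the text above prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Tuple


def database_is_alive() -> bool:
    """Hypothetical placeholder: replace with a cheap ping against the real pool."""
    return True


def health_response(db_ok: bool) -> Tuple[int, bytes]:
    """Map the dependency state to an HTTP status code and a JSON body."""
    code = 200 if db_ok else 503
    body = json.dumps({"status": "UP" if db_ok else "DOWN"}).encode()
    return code, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_response(database_is_alive())
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve it (port 8080 is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Keeping the status logic in `health_response` separate from the handler makes it testable without opening a socket.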
TCP Socket Probe
A lower-level check where the orchestrator attempts to open a TCP connection to the application's designated port (e.g., 8080). Success is determined by the ability to establish a connection, not by application logic.
- Use Case: Ideal for non-HTTP services (e.g., databases, gRPC servers, custom TCP protocols) or for verifying the network stack and process are listening before application initialization is complete.
- Limitation: Only confirms the process is listening on a port, not that the application logic is functioning correctly. Often used alongside a readiness probe.
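A TCP socket probe reduces to a single connection attempt. The sketch below shows the idea from the prober's side; `tcp_probe` is an illustrative name and the one-second default timeout is an assumption.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established in time."""
    try:
        # The connection is closed immediately; no application data is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Because nothing is sent after the handshake, the probe passes as soon as the port is listening, which is exactly the limitation noted above.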
Command Execution Probe
The orchestrator executes a specified command inside the application container or host. A zero exit code indicates success. This offers maximum flexibility for custom health logic.
- Kubernetes Example: A livenessProbe using exec could run a script that validates internal file locks or process states.
- Consideration: The command must be lightweight; a hanging or resource-intensive command can cause false failure signals. It is commonly used for legacy applications without built-in HTTP endpoints.
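The exit-code contract of a command probe can be sketched with a subprocess call that treats a zero exit code as success and a hang as failure. This runs the command locally rather than inside a container, and `exec_probe` is a hypothetical helper, not an orchestrator API.

```python
import subprocess
from typing import List


def exec_probe(command: List[str], timeout: float = 1.0) -> bool:
    """Return True only if `command` exits with code 0 within `timeout` seconds."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hanging command or a missing executable is reported as a failed probe.
        return False
    return result.returncode == 0
```

The explicit timeout matters: without it, a stuck health script would stall the prober instead of producing a failure signal.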
Readiness vs. Liveness Probes
A critical distinction in platforms like Kubernetes that defines an application's lifecycle stage.
- Liveness Probe: Answers "Is the application running?" A failed liveness probe causes the kubelet to restart the container. It diagnoses a deadlocked or otherwise broken application.
- Readiness Probe: Answers "Is the application ready to serve traffic?" A failed readiness probe removes the pod from service load balancers. It handles startup delays, temporary dependency unavailability, or maintenance modes.
Best Practice: Use both. A service might be live (process running) but not ready (waiting for cache warm-up).
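The "live but not ready" situation can be made concrete with two separate status functions backed by the same application state. The field names below (`worker_responsive`, `cache_warm`) are hypothetical stand-ins for whatever signals a real service tracks.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    worker_responsive: bool = True  # False would indicate a deadlock or stall
    cache_warm: bool = False        # becomes True once warm-up has finished

    def liveness_status(self) -> int:
        """HTTP status for the liveness endpoint: only a broken process fails it."""
        return 200 if self.worker_responsive else 503

    def readiness_status(self) -> int:
        """HTTP status for the readiness endpoint: also requires a warm cache."""
        return 200 if self.worker_responsive and self.cache_warm else 503
```

A freshly started instance reports 200 for liveness and 503 for readiness, so it is left running but receives no traffic until warm-up completes.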
Dependency Health Aggregation
An advanced pattern where a service's health endpoint aggregates the status of its downstream dependencies (databases, APIs, caches). This provides a holistic view of the service's operational capacity.
- Implementation: The /health endpoint performs concurrent, time-bound checks to each critical dependency, categorizing results (e.g., UP, DOWN, DEGRADED).
- Output: A composite status, often following a fail-fast principle: if a primary database is down, overall status is DOWN. For secondary caches, status might be DEGRADED.
- Tooling: Frameworks like Spring Boot Actuator, Micrometer, or custom libraries standardize this aggregation.
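The aggregation pattern can be sketched with a thread pool and a shared deadline. The function name `aggregate_health`, the split into `critical` and `optional` checks, and the shape of the returned dictionary are all assumptions made for illustration; frameworks such as Spring Boot Actuator define their own formats.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Check = Callable[[], bool]


def aggregate_health(critical: Dict[str, Check], optional: Dict[str, Check],
                     timeout: float = 0.5) -> dict:
    """Run all checks concurrently and fold them into UP, DEGRADED, or DOWN."""
    checks = {**critical, **optional}
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    deadline = time.monotonic() + timeout
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            ok = bool(future.result(timeout=remaining))
        except Exception:
            # A timeout or an exception inside the check both count as DOWN.
            ok = False
        results[name] = "UP" if ok else "DOWN"
    pool.shutdown(wait=False)  # do not block the response on slow checks

    if any(results[name] == "DOWN" for name in critical):
        status = "DOWN"
    elif any(results[name] == "DOWN" for name in optional):
        status = "DEGRADED"
    else:
        status = "UP"
    return {"status": status, "checks": results}
```

Using one deadline for the whole batch, rather than a timeout per dependency, keeps the endpoint's worst-case latency bounded regardless of how many dependencies are checked.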
Synthetic Transaction Probe
A proactive check that executes a simplified but representative user transaction to validate full business logic pathways. This goes beyond connectivity to test functional correctness.
- Example: A health check for a payment service might create a test cart, call a payment API with a test credit card, and validate a mock successful response, then clean up all test data.
- Complexity: These checks are more resource-intensive and slower, so they are run less frequently (e.g., every 30 seconds vs. every 5 seconds for an endpoint probe).
- Benefit: Catches logical failures, such as misconfigured API keys, broken code deploys, or corrupted data schemas, that shallow checks miss.
Frequently Asked Questions
This section answers common questions about the implementation, types, and role of health checks in modern deployment strategies.
A health check is a periodic, automated request sent by an orchestrator (like Kubernetes) or load balancer to an application instance to verify its operational status and readiness to serve traffic. The instance responds with a success or failure code; a failure typically triggers automatic remediation, such as restarting the instance or removing it from the traffic pool. This mechanism is foundational for maintaining high availability and enabling zero-downtime deployments by ensuring only healthy pods or containers receive user requests.
Related Terms
A health check is a fundamental component of a resilient deployment architecture. These related concepts define the broader ecosystem of strategies and tools used to manage traffic, ensure availability, and validate releases.
Auto-Scaling
A cloud capability that automatically adjusts the number of compute resources (e.g., virtual machines, containers) based on observed metrics like CPU utilization or request queue length. Health checks are critical for scaling decisions.
- Scale-In: Unhealthy instances are terminated first during a scale-down event.
- Integration: Auto Scaling groups on cloud providers such as AWS, and the Kubernetes Horizontal Pod Autoscaler (HPA), take health status into account so that unhealthy instances or not-ready pods are not treated as serving capacity.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.