Inferensys

Golden Signals

Golden Signals are four key metrics—latency, traffic, errors, and saturation—used to monitor and assess the health and performance of a distributed service or application at a high level.
ORCHESTRATION OBSERVABILITY

What are the Golden Signals?

The Golden Signals are a foundational set of four high-level metrics used to monitor the health and performance of any distributed service or system, including multi-agent orchestrations.

Popularized by Google's Site Reliability Engineering (SRE) practices, the four signals (latency, traffic, errors, and saturation) answer four essential questions: how fast is the system, how much demand is it handling, how often does it fail, and how full are its resources? In multi-agent system orchestration, these signals are critical for observing the collective behavior of interacting autonomous agents, moving beyond individual agent metrics to the health of the entire workflow.

Applying the Golden Signals to an agentic system involves specific interpretations: latency measures the time for an agent to complete a task or for a full orchestrated workflow; traffic quantifies the rate of inter-agent messages or task requests; errors track failed agent executions, API calls, or invalid outputs; and saturation monitors the utilization of shared resources like vector database throughput or GPU memory. This framework enables platform engineers and DevOps teams to set meaningful Service Level Objectives (SLOs) and implement effective alerting rules to ensure system reliability and performance.

The Four Golden Signals Explained

In multi-agent orchestration, the four signals provide a holistic view of system behavior, agent workload, and potential bottlenecks.

01

Latency

Latency measures the time required to service a request, typically expressed in milliseconds. It is the primary indicator of user-perceived performance.

  • Types: Distinguish between success latency (time for successful requests) and failure latency (time for failed requests). Failed requests can return very quickly, so folding them into a single figure can make the service look faster than it is.
  • Monitoring: Track percentiles (p50, p95, p99) rather than averages to understand tail latency and outliers.
  • Agent Context: In a multi-agent system, measure the end-to-end latency of a user query as it flows through the orchestration workflow, as well as the internal latency of individual agent reasoning and tool calls.
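
The percentile guidance above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes latencies arrive as (duration_ms, succeeded) pairs, uses the nearest-rank percentile definition, and the names percentile and latency_summary are hypothetical.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. p is an integer from 1 to 100."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, which avoids float rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_summary(requests):
    """Summarize (duration_ms, succeeded) pairs, keeping successful and
    failed requests apart so fast failures do not flatter the numbers."""
    groups = {
        "success": [d for d, ok in requests if ok],
        "failure": [d for d, ok in requests if not ok],
    }
    return {
        label: {f"p{p}": percentile(durations, p) for p in (50, 95, 99)}
        for label, durations in groups.items()
        if durations  # skip a group that has no samples
    }
```

Calling latency_summary on the durations of one workflow step returns p50, p95, and p99 for each group. In production these values usually come from histogram metrics in a monitoring backend rather than from raw sample lists.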
02

Traffic

Traffic quantifies the demand placed on a system, measuring how much work it is doing. It is a direct indicator of usage and load.

  • Metrics: Common measures include requests per second (RPS), transactions per second (TPS), or the volume of data processed.
  • Agent Context: For orchestration, traffic can be measured as the number of tasks dispatched per second, messages exchanged between agents, or the rate of agent invocations. A sudden drop in traffic can indicate a failure in upstream systems or agent registration/discovery issues.
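
As a rough sketch of the traffic measurement described above, the following Python counts task dispatches in a sliding time window and flags a sudden drop against a baseline. The class TrafficMeter, the function is_sudden_drop, and the 50% drop threshold are illustrative assumptions. Timestamps are assumed to be recorded in non-decreasing order.

```python
from collections import deque


class TrafficMeter:
    """Tracks events (e.g. task dispatches) in a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Register one event at the given time (seconds)."""
        self._timestamps.append(timestamp)

    def rate(self, now):
        """Return events per second over the window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop events that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds


def is_sudden_drop(current_rate, baseline_rate, drop_fraction=0.5):
    """True when traffic has fallen below `drop_fraction` of the baseline.
    The 0.5 default is an arbitrary placeholder, not a recommendation."""
    return baseline_rate > 0 and current_rate < baseline_rate * drop_fraction
```

Passing the timestamp in explicitly, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test. A real deployment would typically rely on a counter metric and a rate query in the monitoring system instead.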
03

Errors

Errors track the rate of requests that fail, either explicitly (e.g., HTTP 5xx status codes) or implicitly (e.g., returning incorrect or partial results).

  • Explicit vs. Implicit: Monitor both hard failures (timeouts, crashes) and soft failures (business logic errors, incorrect agent outputs).
  • Agent Context: In agentic systems, errors include failed tool executions, LLM provider API errors, agent reasoning failures, protocol violations, and deadlock scenarios in coordination. Errors should be tracked per agent type and per workflow step.
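
One way to track errors per agent type and per workflow step is to aggregate execution records by that pair. The sketch below assumes each record is a dict with agent_type, step, and status keys, where status is one of "ok", "hard_failure", or "soft_failure". That record shape and the name error_rates are hypothetical.

```python
from collections import Counter, defaultdict


def error_rates(executions):
    """Return hard and soft failure rates keyed by (agent_type, step).

    Each execution is a dict with 'agent_type', 'step', and 'status',
    where status is 'ok', 'hard_failure', or 'soft_failure'.
    """
    counts = defaultdict(Counter)
    for run in executions:
        counts[(run["agent_type"], run["step"])][run["status"]] += 1

    rates = {}
    for key, by_status in counts.items():
        total = sum(by_status.values())
        rates[key] = {
            # Explicit failures: timeouts, crashes, provider API errors.
            "hard": by_status["hard_failure"] / total,
            # Implicit failures: invalid or incorrect agent outputs.
            "soft": by_status["soft_failure"] / total,
        }
    return rates
```

Keeping hard and soft failures as separate rates makes it easier to tell an infrastructure problem from an output quality problem. Detecting a soft failure still requires some validation step upstream that assigns the status.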
04

Saturation

Saturation measures how "full" a system resource is, indicating the pressure on a constrained resource before performance degrades. It is a forward-looking signal of impending failure.

  • Resources: Common saturation points include CPU load, memory utilization, I/O throughput, and queue depths.
  • Agent Context: For orchestration, monitor the saturation of shared resources like vector database connections, LLM API rate limits, message broker queues, and the concurrent task capacity of individual agents. High saturation often precedes increased latency and errors.
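
Saturation is commonly expressed as the ratio of current usage to capacity. The following sketch assumes usage is reported as (used, capacity) pairs per resource. The resource names in the usage note and the 0.8 threshold are illustrative assumptions, not recommended values.

```python
def saturation(used, capacity):
    """Fraction of a resource in use; 1.0 means completely full."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def saturated_resources(usage, threshold=0.8):
    """Return the names of resources at or above `threshold`,
    most saturated first.

    `usage` maps a resource name to a (used, capacity) pair.
    """
    ratios = {name: saturation(used, cap) for name, (used, cap) in usage.items()}
    hot = [name for name, ratio in ratios.items() if ratio >= threshold]
    return sorted(hot, key=lambda name: ratios[name], reverse=True)
```

For example, with 45 of 50 vector database connections in use, 300 of 1000 LLM requests per minute consumed, and a broker queue holding 980 of 1000 messages, the function returns the broker queue first and the connection pool second. Alerting below 100% leaves time to react before latency and errors rise.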
05

Applying Signals to Agent Orchestration

In a multi-agent system, the Golden Signals must be interpreted through the lens of distributed, concurrent workflows.

  • Latency: Trace a single user request across the agent call graph to identify slow agents or communication bottlenecks.
  • Traffic: Correlate spikes in task initiation with downstream saturation of specialized agents (e.g., a research agent).
  • Errors: Use structured logging to tag errors with the specific agent, tool, and workflow ID for root cause analysis.
  • Saturation: Monitor the queue depth in the orchestration workflow engine; a growing backlog indicates the system cannot keep up with incoming traffic.
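
The structured logging point above could look like the following sketch, which uses Python's standard logging module to emit one JSON object per log line. The field names agent, tool, and workflow_id, and the helper log_agent_error, are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed through `extra=` become record attributes.
            "agent": getattr(record, "agent", None),
            "tool": getattr(record, "tool", None),
            "workflow_id": getattr(record, "workflow_id", None),
        }
        return json.dumps(payload)


def log_agent_error(logger, message, agent, tool, workflow_id):
    """Log an error tagged with the agent, tool, and workflow it came from."""
    logger.error(
        message,
        extra={"agent": agent, "tool": tool, "workflow_id": workflow_id},
    )
```

Attaching JsonFormatter to a handler makes every error line filterable by workflow ID or agent name in a log aggregation tool, which shortens root cause analysis when a single request crosses several agents.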
06

Beyond the Basics: SLOs & Error Budgets

The Golden Signals are most powerful when used to define and track Service Level Objectives (SLOs).

  • SLO Definition: An SLO might state "99% of agent workflow completions shall have a latency under 2 seconds."
  • Error Budget: The error budget is the allowable amount of SLO violation (e.g., 1% of requests can be slow). Exhausting this budget should trigger a focus on stability over new features.
  • Proactive Management: By monitoring these signals against SLOs, teams can perform canary analysis on new agent deployments or conduct chaos engineering experiments to test system resilience before incidents occur.
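
The error budget arithmetic can be made concrete with a short sketch. It assumes a request-based SLO such as the 99% latency example above, where a "bad" request is one that misses the latency target. The function name error_budget_report and its return fields are hypothetical.

```python
def error_budget_report(total_requests, bad_requests, slo_target=0.99):
    """Compare observed compliance with an SLO target.

    `bad_requests` counts requests that violated the objective,
    e.g. workflow completions slower than 2 seconds.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")

    compliance = (total_requests - bad_requests) / total_requests
    # The budget is the number of bad requests the SLO tolerates.
    allowed_bad = (1.0 - slo_target) * total_requests
    return {
        "compliance": compliance,
        "allowed_bad": allowed_bad,
        "budget_consumed": bad_requests / allowed_bad,
        "slo_met": compliance >= slo_target,
    }
```

With 10,000 workflow completions and a 99% target, the budget is about 100 slow requests. If 40 were slow, roughly 40% of the budget is consumed and the SLO is still met. A budget_consumed value above 1.0 means the budget is exhausted, which is the signal to prioritize stability work.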
GOLDEN SIGNALS

Frequently Asked Questions

The Golden Signals are four key metrics used to monitor the health and performance of distributed systems, including multi-agent orchestration platforms. This FAQ addresses common questions about their definition, application, and implementation.

The Golden Signals are four high-level metrics (latency, traffic, errors, and saturation) used to monitor the health and performance of a distributed service or application. Originally popularized by Google's Site Reliability Engineering (SRE) team, they provide a comprehensive yet concise view of a system's operational state from the user's perspective. In the context of multi-agent system orchestration, these signals translate to monitoring the time agents take to complete tasks (latency), the volume of messages or tasks processed (traffic), the rate of failed agent interactions or tool executions (errors), and the utilization of critical resources like CPU, memory, or agent pool concurrency (saturation). They serve as a first-line diagnostic tool, enabling engineers to quickly identify which broad category a performance issue belongs to before diving into deeper distributed tracing or log analysis.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.