Inferensys

Glossary

Service Level Indicator (SLI)

AGENTIC OBSERVABILITY AND TELEMETRY

What is a Service Level Indicator (SLI)?

A quantitative measure of a service's performance or behavior from the user's perspective, used to define reliability objectives.

A Service Level Indicator (SLI) is a quantitative measure of a service's behavior from the user's perspective, such as tool call latency or success rate, used to define reliability objectives for agentic systems. It is a direct, measurable key performance indicator (KPI) for a specific aspect of service quality, like availability, latency, throughput, or error rate. For autonomous agents, common SLIs include tool call success rate, end-to-end task completion latency, and planning loop accuracy.

SLIs are foundational to Service Level Objectives (SLOs) and error budgets, forming the empirical basis for reliability contracts: an SLO sets a target for an SLI over a time window, and the error budget is the shortfall from that target the SLO tolerates. In tool call instrumentation, SLIs are derived from telemetry such as distributed traces and metrics, enabling teams to monitor external dependencies and detect when tool calls degrade. Selecting the right SLI requires focusing on user-visible outcomes rather than internal system metrics, so that the SLI accurately represents the service's health and guides engineering priorities.
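To make the SLI, SLO and error budget relationship concrete, here is a minimal Python sketch for a success-rate SLI. It assumes call counts have already been aggregated for one SLO window, and it treats the error budget as the number of calls the SLO allows to fail, (1 - SLO) * total. The `error_budget` function and `BudgetReport` type are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    sli: float               # measured success ratio in [0, 1]
    slo: float               # target success ratio in [0, 1]
    allowed_failures: float  # failures the SLO tolerates at this call volume
    actual_failures: int     # failures observed in the window
    budget_remaining: float  # fraction of the error budget still unspent (negative if overspent)


def error_budget(successful: int, total: int, slo: float) -> BudgetReport:
    """Compare a success-rate SLI against an SLO and report error-budget usage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    sli = successful / total
    allowed = (1.0 - slo) * total  # the error budget, expressed in calls
    actual = total - successful
    remaining = 1.0 - actual / allowed
    return BudgetReport(sli, slo, allowed, actual, remaining)


report = error_budget(successful=9_970, total=10_000, slo=0.995)
print(f"SLI={report.sli:.3%}, budget remaining={report.budget_remaining:.0%}")
```

With 10,000 calls and a 99.5% SLO, 50 failures are allowed; 30 observed failures leave roughly 40% of the budget for the rest of the window.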

TOOL CALL INSTRUMENTATION

Core Characteristics of an SLI

A Service Level Indicator (SLI) is a quantitative measure of a service's behavior from the user's perspective. In agentic systems, SLIs are critical for defining reliability objectives for external tool and API calls.

01

Quantitative and Measurable

An SLI must be a numerical value derived from observable data, not a subjective opinion. It is calculated from raw telemetry signals like latency histograms, HTTP status codes, or error logs.

Examples include:

  • Tool Call Latency: Measured in milliseconds from request initiation to final byte received.
  • Success Rate: Calculated as (Successful Calls / Total Calls) * 100.
  • Error Rate: The complement of success rate (100% minus the success rate), counting 4xx/5xx HTTP responses or thrown exceptions as errors.

Without a precise, automated measurement, you cannot define a meaningful Service Level Objective (SLO).
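The three measurements above can be derived from raw per-call records, as in the following minimal sketch. It assumes each call has already been captured with its latency and HTTP status. `ToolCall`, `is_error`, and `basic_slis` are hypothetical names, and mean latency stands in for whichever latency aggregation you actually choose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    latency_ms: float           # request initiation to final byte received
    http_status: Optional[int]  # None if the call raised before any response
    raised: bool = False        # True if the client threw an exception


def is_error(call: ToolCall) -> bool:
    # 4xx/5xx responses and thrown exceptions both count as errors
    return call.raised or call.http_status is None or call.http_status >= 400


def basic_slis(calls: list[ToolCall]) -> dict[str, float]:
    """Success rate, error rate, and mean latency over one batch of calls."""
    if not calls:
        raise ValueError("no calls to aggregate")
    total = len(calls)
    errors = sum(1 for c in calls if is_error(c))
    success_rate = (total - errors) / total * 100  # (Successful / Total) * 100
    return {
        "success_rate_pct": success_rate,
        "error_rate_pct": 100 - success_rate,  # complement of success rate
        "mean_latency_ms": sum(c.latency_ms for c in calls) / total,
    }
```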

02

User-Centric Perspective

An effective SLI measures what the end-user (or the agent acting on the user's behalf) actually experiences. It focuses on the external behavior of the service, not its internal health.

For tool calls, this means:

  • Measuring latency from the agent's point of view, including network time.
  • Defining 'success' based on the agent receiving a usable, correct response, not just a TCP handshake.
  • Avoiding internal metrics like CPU utilization or queue depth, which are leading indicators but not direct measures of user experience.

The core question is: 'Was the tool call fast and successful for the agent executing the task?'
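The two tool-call rules above (time the call from the agent's side, and count success only when the response is usable) can be sketched as follows. This assumes the tool returns JSON over HTTP and that "usable" means a 200 response whose JSON object contains every key the agent needs. `timed_agent_call`, `send`, and `required_keys` are hypothetical; a real agent would plug in its own client and response validation.

```python
import json
import time
from typing import Callable


def timed_agent_call(send: Callable[[], tuple[int, str]],
                     required_keys: set[str]) -> tuple[float, bool]:
    """Return (latency_ms, usable) for one tool call, as seen by the agent.

    `send` performs the whole request, network time included, and returns
    (http_status, body_text). Success means the agent got a usable response.
    """
    start = time.perf_counter()
    try:
        status, body = send()
    except Exception:
        # Timeouts, connection resets, etc. are failures the agent experiences
        return (time.perf_counter() - start) * 1000, False
    latency_ms = (time.perf_counter() - start) * 1000
    if status != 200:
        return latency_ms, False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return latency_ms, False  # 200 OK but unparseable: not usable
    # Usable only if the agent can find every field it needs
    usable = isinstance(payload, dict) and required_keys.issubset(payload)
    return latency_ms, usable
```

Timing wraps the entire `send` call, so network time and client-side overhead are included, matching what the agent actually waits for.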

03

Directly Relevant to Business Value

The chosen SLI should correlate with user satisfaction and business outcomes. Monitoring an irrelevant metric provides no actionable signal for reliability engineering.

Key considerations:

  • Latency SLIs directly impact agent task completion time and user-perceived performance.
  • Success Rate SLIs determine whether an agent can complete its intended function or fails mid-execution.
  • Poor SLI selection example: Measuring 'API calls per second' when what matters is whether those calls succeed and return correct data.

SLIs should answer the question: 'What matters most to the users of this agentic system?'

04

Defined Over a Well-Understood Aggregation

An SLI is not a single measurement but an aggregated value over a specific population and time window. The aggregation method must be explicit to avoid ambiguity.

Critical aggregation parameters:

  • Time Window: 'Over the last 5 minutes', 'Daily', 'Weekly'.
  • Population: 'All POST requests to the /execute endpoint', 'Tool calls from the DataAnalysisAgent'.
  • Aggregation Function: 'Average latency', '95th percentile (P95) latency', 'Proportion of successful requests'.

For example: 'The 95th percentile latency for all get_weather tool calls measured over a 1-hour rolling window.'
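The example SLI above names all three aggregation parameters: a population (`get_weather` calls), a function (P95), and a window (1 hour, rolling). The minimal sketch below makes each one explicit. It uses the nearest-rank percentile method, which is one choice among several; metric backends that estimate percentiles from histogram buckets (for example, Prometheus `histogram_quantile`) can report slightly different values. `Sample` and `p95_latency` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    tool: str         # population key, e.g. the tool name
    timestamp: float  # seconds since epoch when the call completed
    latency_ms: float


def p95_latency(samples: list[Sample], tool: str, now: float,
                window_s: float = 3600.0) -> float:
    """Nearest-rank P95 latency for one tool over the window (now - window_s, now]."""
    in_window = sorted(
        s.latency_ms for s in samples
        if s.tool == tool and now - window_s < s.timestamp <= now
    )
    if not in_window:
        raise ValueError("no samples for this population and window")
    n = len(in_window)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return in_window[rank - 1]
```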

05

Tied to a Specific Service Operation

An SLI should be scoped to a discrete, logical service operation that a user or agent triggers. In tool call instrumentation, this typically maps to a single API endpoint or tool function.

Implementation guidance:

  • One SLI per logical operation: calculate_invoice, fetch_customer_record, submit_order.
  • Avoid overly broad SLIs: 'Database latency' is too vague; 'Query latency for the transactions table' is actionable.
  • Use span names and attributes from distributed tracing (e.g., OpenTelemetry) to naturally define these operational boundaries.

This scoping allows for precise alerting and debugging when the SLI breaches its target.
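One SLI per logical operation falls out naturally when finished spans are grouped by name. The sketch below assumes spans have been flattened into plain dicts with hypothetical `name` and `status` fields. Its status values mirror OpenTelemetry's UNSET/OK/ERROR status codes, and it treats anything other than ERROR as success, since instrumentation often leaves successful spans UNSET.

```python
from collections import defaultdict


def success_rate_by_operation(spans: list[dict]) -> dict[str, float]:
    """Compute one success-rate SLI per logical operation, keyed by span name."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for span in spans:
        name = span["name"]  # e.g. "fetch_customer_record"
        totals[name] += 1
        if span["status"] != "ERROR":  # UNSET and OK both count as success
            successes[name] += 1
    return {name: successes[name] / totals[name] for name in totals}
```

A breach then points directly at one operation, such as `submit_order`, instead of at a vague aggregate.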

06

Instrumentable and Collectable

The data required to compute the SLI must be technically feasible to collect with high fidelity and minimal performance overhead. If you cannot measure it, it cannot be an SLI.

Requirements for tool calls:

  • Automatic Instrumentation: Using frameworks like OpenTelemetry to decorate tool calls with start/end timestamps and result status.
  • Low Overhead: Collection must not significantly impact the performance it's trying to measure.
  • Reliable Export: Telemetry data must be reliably shipped to a backend system (e.g., Prometheus, Datadog) for aggregation.

Common collection methods include client-side SDKs, service mesh sidecars, or API gateway logs.
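Automatic instrumentation usually means wrapping each tool once rather than timing calls by hand. The following is a stdlib-only sketch of that idea: a decorator that records a start timestamp, a duration, and a result status for every call. The `RECORDS` list is a hypothetical in-memory stand-in for a real exporter. In production you would typically create OpenTelemetry spans instead and ship them to a backend. `instrument_tool` and `get_weather` are illustrative names.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory stand-in for a telemetry exporter
RECORDS: list[dict[str, Any]] = []


def instrument_tool(name: str) -> Callable:
    """Decorator that records start time, duration, and result status for a tool."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.time()        # wall-clock start timestamp
            t0 = time.perf_counter()   # monotonic clock for the duration
            status = "OK"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise                  # never swallow the tool's error
            finally:
                RECORDS.append({
                    "tool": name,
                    "start": start,
                    "duration_ms": (time.perf_counter() - t0) * 1000,
                    "status": status,
                })
        return wrapper
    return decorator


@instrument_tool("get_weather")
def get_weather(city: str) -> dict:
    # Placeholder tool body for illustration only
    if not city:
        raise ValueError("city is required")
    return {"city": city, "temp_c": 21}
```

The per-call overhead is two clock reads and one list append, which keeps collection cheap relative to a network-bound tool call.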


Common SLI Examples for Agentic Systems

Quantitative measures of service behavior from the agent's perspective, used to define reliability objectives for autonomous systems.

| SLI Metric | Definition & Measurement | Typical Target (SLO) | Why It Matters for Agents |
|---|---|---|---|
| Tool Call Latency | Time from the agent initiating a request to receiving the complete response from an external API or tool. | P95 < 500 ms | Directly impacts the agent's task completion time and user-perceived responsiveness. High latency can stall reasoning loops. |
| Tool Call Success Rate | Percentage of tool/API invocations that return a successful (non-error) result, measured as (Successful Calls / Total Calls) * 100. | 99.5% | Fundamental to agent reliability. A low success rate indicates brittle dependencies, causing agent tasks to fail or requiring complex error handling. |
| Planning Success Rate | Percentage of agent tasks where the initial plan or decomposition was executable without fatal logical errors. Requires semantic analysis of plans vs. outcomes. | 98% | Measures the quality of the agent's high-level reasoning. A low rate indicates poor task understanding or planning capability. |
| Step Completion Rate | Percentage of individual steps (e.g., tool calls, reasoning cycles) within a task that complete successfully, regardless of final task outcome. | 99% | Provides granular insight into where multi-step processes break down, useful for debugging complex agent workflows. |
| Context Window Saturation | Average percentage of the agent's available context (e.g., token limit) consumed per task or session. | < 80% | Keeping saturation low prevents truncation of critical history or instructions. High saturation can lead to degraded performance or lost context. |
| Hallucination Rate (Tool Use) | Percentage of tool calls made with parameters that are invalid, non-existent, or semantically incorrect based on the tool's specification. | < 1% | Indicates the agent's accuracy in interpreting instructions and grounding its actions in reality. High rates waste resources and cause errors. |
| Cost per Successful Task | Average computational cost (e.g., LLM token cost, API call cost) attributed to tasks that reached a successful, validated conclusion. | Varies by business case | Essential for economic viability. Links agent performance directly to operational expenditure (FinOps). |
| Retry Rate | Percentage of tool calls that required one or more automatic retries before succeeding or finally failing. | < 5% | High retry rates signal flaky dependencies or poorly configured timeouts/backoff, increasing latency and resource consumption. |
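Several rows in the table reduce to simple ratios over per-task records. The minimal sketch below computes three of them over one aggregation window. It assumes each task has been summarized into a hypothetical `TaskRecord` with cost, token, and retry counts. Cost per successful task follows the table's definition, averaging only over successful tasks. A stricter variant divides all spend, including failed tasks, by the number of successes.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    succeeded: bool     # task reached a successful, validated conclusion
    cost_usd: float     # LLM token cost plus tool/API call cost for the task
    tokens_used: int    # context tokens consumed by the task
    context_limit: int  # the model's context window size in tokens
    tool_calls: int     # tool calls issued during the task
    retried_calls: int  # tool calls that needed one or more retries


def agent_slis(tasks: list[TaskRecord]) -> dict[str, float]:
    """Compute retry rate, context saturation, and cost per successful task."""
    total_calls = sum(t.tool_calls for t in tasks)
    successful = [t for t in tasks if t.succeeded]
    if total_calls == 0 or not successful:
        raise ValueError("need at least one tool call and one successful task")
    retry_rate = 100 * sum(t.retried_calls for t in tasks) / total_calls
    saturation = 100 * sum(t.tokens_used / t.context_limit for t in tasks) / len(tasks)
    cost_per_success = sum(t.cost_usd for t in successful) / len(successful)
    return {
        "retry_rate_pct": retry_rate,
        "context_saturation_pct": saturation,
        "cost_per_successful_task_usd": cost_per_success,
    }
```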


Frequently Asked Questions

A Service Level Indicator (SLI) is a core metric for quantifying the reliability of external tool and API calls from an autonomous agent's perspective. These FAQs define SLIs, their role in observability, and how to implement them for agentic systems.

A Service Level Indicator (SLI) is a quantitative, user-centric measure of a specific aspect of a service's performance or reliability. In the context of agentic observability, an SLI measures the behavior of external tool and API calls from the agent's perspective, such as latency, success rate, or availability. It is the raw measurement used to define reliability targets.

For example, a foundational SLI for tool calling is Tool Call Success Rate, calculated as (Successful Tool Calls / Total Tool Calls) * 100. This directly measures how often an agent's attempts to use an external service succeed.


About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.