Inferensys

Composite SLI

A Composite SLI is a Service Level Indicator derived from mathematically combining two or more underlying Agentic SLIs to provide a unified score for complex agent performance.
AGENTIC OBSERVABILITY AND TELEMETRY

What is a Composite SLI?

A Composite SLI is a Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs, providing a unified score for a complex aspect of agent performance, such as overall efficiency or safety. It synthesizes granular metrics—like Planning Success Rate, End-to-End Task Latency, and Cost Per Successful Task—into a single, holistic indicator. This enables engineering leaders to assess high-level system health and trade-offs without monitoring a dashboard of disparate signals.

Common formulas include weighted averages or more sophisticated functions that model interactions between SLIs, such as efficiency (tasks/cost) or a Resiliency Score. Defining a Composite SLI is a key step in Agentic SLO definition, as it creates a target for complex, business-aligned outcomes rather than isolated technical behaviors. It is a cornerstone of Evaluation-Driven Development for autonomous systems, providing a quantifiable benchmark for performance and guiding optimization efforts.
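The two formula families named above can be written compactly. This is a notational sketch only: the symbols s_i (a constituent SLI normalized to the range 0 to 1), w_i (its weight), and n (the number of constituents) are assumptions introduced here, not definitions from the source.

```latex
% Weighted average of n constituent SLIs
C_{\text{weighted}} = \sum_{i=1}^{n} w_i \, s_i,
\qquad \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0

% Efficiency as a ratio of outcomes to spend
C_{\text{efficiency}} = \frac{\text{successful tasks}}{\text{total cost}}
```

Under this reading, the efficiency ratio is the reciprocal of Cost Per Successful Task, so a higher value is better.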

AGENTIC OBSERVABILITY

Key Characteristics of a Composite SLI

A Composite SLI is a unified Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs. It provides a holistic score for complex aspects of agent performance, such as overall efficiency, safety, or reliability.

01

Mathematical Aggregation

A Composite SLI is not limited to a simple unweighted average; it is a weighted or formulaic combination of constituent SLIs. Common aggregation functions include:

  • Weighted sums (e.g., 0.6 * Planning Success Rate + 0.4 * Action Success Ratio)
  • Minimum or maximum functions to track the worst/best-performing component
  • Geometric means for rate-based metrics
  • Conditional logic (e.g., a composite fails if any critical guardrail SLI is breached).

This allows engineering teams to define a single score that reflects the multi-dimensional nature of agent health.

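The four aggregation styles listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming every SLI is already expressed on a 0 to 1 scale where higher is better; the function names, the sample values, and the guardrail floor are hypothetical.

```python
import math
from typing import Mapping


def weighted_sum(slis: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of SLIs; weights are expected to sum to 1."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * slis[name] for name in weights)


def worst_component(slis: Mapping[str, float]) -> float:
    """Minimum function: the composite is only as good as its weakest SLI."""
    return min(slis.values())


def geometric_mean(slis: Mapping[str, float]) -> float:
    """Geometric mean: a near-zero component pulls the score down sharply."""
    values = list(slis.values())
    return math.prod(values) ** (1.0 / len(values))


def gated(score: float, guardrail_sli: float, guardrail_floor: float) -> float:
    """Conditional logic: the composite fails (0.0) if the guardrail SLI is breached."""
    return score if guardrail_sli >= guardrail_floor else 0.0


# Hypothetical sample values for the weighted-sum example in the list above.
slis = {"planning_success_rate": 0.95, "action_success_ratio": 0.90}
weights = {"planning_success_rate": 0.6, "action_success_ratio": 0.4}
score = weighted_sum(slis, weights)  # 0.6 * 0.95 + 0.4 * 0.90
```

Which function fits depends on how much one weak component should be allowed to hide behind strong ones: a weighted sum averages it away, while the minimum and the gate do not.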
02

Holistic Performance Scoring

Its primary purpose is to provide a unified view of a complex capability by combining related but distinct operational signals. For example:

  • An Overall Efficiency Score could combine Task Completion Rate, End-to-End Task Latency, and Redundant Action Ratio.
  • A Safety & Compliance Score might aggregate Guardrail Compliance Rate, Hallucination Rate, and Fallback Success Rate.
  • A Resiliency Score could be derived from Self-Correction Success Rate, Retry Success Rate, and Health Check Success Rate.

This moves monitoring from isolated metrics to actionable, business-aligned scores.

03

Derived from Atomic SLIs

A Composite SLI is built upon well-defined, atomic Agentic SLIs. It does not measure raw telemetry but combines higher-level indicators. Prerequisites include:

  • Established baselines for each constituent SLI (e.g., Planning Success Rate, Action Success Ratio).
  • Clear understanding of the relationships and trade-offs between the underlying metrics.
  • Reliable data pipelines for each component SLI.

This derivation ensures the composite is traceable and debuggable; a drop in the composite score can be investigated by drilling into its atomic components.

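One way to make that drill-down concrete, assuming the composite is a weighted sum, is to attribute the change in the composite to each constituent. The sketch below uses hypothetical weights and weekly values; for other aggregation functions the attribution would need a different approach.

```python
from typing import Dict, Mapping


def contributions(
    before: Mapping[str, float],
    after: Mapping[str, float],
    weights: Mapping[str, float],
) -> Dict[str, float]:
    """Per-SLI contribution to the change in a weighted-sum composite."""
    return {name: weights[name] * (after[name] - before[name]) for name in weights}


# Hypothetical weights and observations for two consecutive windows.
weights = {"planning_success_rate": 0.6, "action_success_ratio": 0.4}
last_week = {"planning_success_rate": 0.95, "action_success_ratio": 0.90}
this_week = {"planning_success_rate": 0.94, "action_success_ratio": 0.80}

delta = contributions(last_week, this_week, weights)
# Most negative contribution first: the top entry is the first place to look.
suspects = sorted(delta, key=delta.get)
```

Because the contributions sum to the total change in the composite, the ranking shows how much of the drop each atomic SLI explains.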
04

SLO Target Definition

Composite SLIs enable the definition of sophisticated Service Level Objectives (SLOs) for entire agent capabilities. Instead of managing dozens of individual SLOs, teams can set a target for the composite. For instance:

  • "Our agent's Safety & Compliance Score must be ≥ 99.5% over a 30-day window."
  • "The Overall Efficiency Score for the procurement agent must not drop below 0.85."

This simplifies error budget calculation and consumption tracking for complex systems, as there is one primary budget for the composite behavior.

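As a rough sketch of how a single budget could be tracked, the snippet below treats the composite as a score between 0 and 1 and counts the shortfall from a perfect score against the budget implied by the SLO target. That interpretation, the function name, and the sample values are assumptions made for illustration; a team may define budget consumption differently.

```python
def error_budget_consumed(composite_score: float, slo_target: float) -> float:
    """Fraction of the error budget used, treating (1 - score) as the 'bad' share."""
    if not 0.0 <= slo_target < 1.0:
        raise ValueError("slo_target must be in [0, 1)")
    budget = 1.0 - slo_target          # allowed shortfall over the window
    shortfall = 1.0 - composite_score  # observed shortfall over the window
    return shortfall / budget


# Hypothetical 30-day observation against the 99.5% target quoted above.
consumed = error_budget_consumed(composite_score=0.997, slo_target=0.995)
```

A value above 1.0 would mean the budget for the window is exhausted.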
05

Prioritization & Triage Signal

A shifting Composite SLI value serves as a high-level alert that a complex aspect of agent performance is degrading. It directs engineering attention before individual component SLIs breach their thresholds. The composite's structure informs triage:

  • A drop in an Efficiency Score immediately points to planning, execution, or latency sub-systems.
  • A decline in a Safety Score prioritizes checks on guardrails, fact-checking, and fallback mechanisms.

This transforms observability from reactive monitoring to proactive system management.

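The early-warning behavior described above can be illustrated with a case where no component has breached its own threshold, yet the composite is already below target. The sketch assumes a weighted-sum Efficiency Score over higher-is-better SLIs; the names, weights, thresholds, and target are all hypothetical.

```python
from typing import Dict, List, Mapping


def triage(
    slis: Mapping[str, float],
    thresholds: Mapping[str, float],
    weights: Mapping[str, float],
    composite_target: float,
) -> Dict[str, object]:
    """Flag a composite shortfall and order components by remaining headroom."""
    composite = sum(weights[name] * slis[name] for name in weights)
    breached: List[str] = [n for n in slis if slis[n] < thresholds[n]]
    # Smallest margin above its own threshold first: closest to breaching.
    by_headroom = sorted(slis, key=lambda n: slis[n] - thresholds[n])
    return {
        "composite": composite,
        "composite_alert": composite < composite_target,
        "breached_components": breached,
        "investigate_first": by_headroom,
    }


result = triage(
    slis={"task_completion_rate": 0.91, "latency_score": 0.88,
          "non_redundant_action_ratio": 0.87},
    thresholds={"task_completion_rate": 0.90, "latency_score": 0.85,
                "non_redundant_action_ratio": 0.85},
    weights={"task_completion_rate": 0.4, "latency_score": 0.3,
             "non_redundant_action_ratio": 0.3},
    composite_target=0.90,
)
```

Here the composite alert fires while the list of breached components is still empty, and the headroom ordering suggests which sub-system to inspect first.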
06

Example: Agent Reliability Index

A practical example is an Agent Reliability Index for a customer support agent, defined as:

Agent Reliability Index = (0.3 * Task Completion Rate) + (0.3 * Result Accuracy) + (0.2 * Guardrail Compliance Rate) + (0.2 * (1 - Normalized Latency))

  • Task Completion Rate: Ensures the agent finishes the job.
  • Result Accuracy: Ensures the answer is correct.
  • Guardrail Compliance Rate: Ensures the agent stays within policy.
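  • Normalized Latency: Penalizes slow performance. Latency is scaled to the range 0 to 1 so that (1 - Normalized Latency) is higher for faster responses.

This single index, tracked over time, gives a CTO a clear, quantitative measure of the agent's end-to-end service quality.
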
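The index above can be computed as follows. The weights come from the formula in the text; the normalization of latency against a worst acceptable value (`worst_case_s`) and the sample inputs are assumptions, since the source does not say how latency is normalized.

```python
def normalized_latency(latency_s: float, worst_case_s: float) -> float:
    """Scale latency to [0, 1]; 1 means at or beyond the worst acceptable latency."""
    return min(max(latency_s / worst_case_s, 0.0), 1.0)


def agent_reliability_index(
    task_completion_rate: float,
    result_accuracy: float,
    guardrail_compliance_rate: float,
    norm_latency: float,
) -> float:
    """Weighted sum using the 0.3 / 0.3 / 0.2 / 0.2 weights from the text."""
    return (
        0.3 * task_completion_rate
        + 0.3 * result_accuracy
        + 0.2 * guardrail_compliance_rate
        + 0.2 * (1.0 - norm_latency)
    )


# Hypothetical window: 2 s observed latency against a 10 s worst case.
latency = normalized_latency(latency_s=2.0, worst_case_s=10.0)
index = agent_reliability_index(0.96, 0.92, 0.99, latency)
```

Clamping the normalized latency keeps the index within 0 to 1 even when a window contains outliers slower than the assumed worst case.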
METRIC COMPARISON

Composite SLI vs. Related Metrics

A comparison of the Composite SLI with other key performance and observability metrics used in agentic systems, highlighting its distinct purpose and construction.

| Metric / Feature | Composite SLI | Agentic SLI | Business KPI | Automated Evaluation Score |
| --- | --- | --- | --- | --- |
| Primary Purpose | Unified score for a complex, multi-faceted aspect of agent performance (e.g., overall efficiency, safety). | Quantitative measure of a single, specific performance dimension (e.g., latency, success rate). | High-level business outcome metric measuring value delivery (e.g., user satisfaction, cost savings). | Programmatic assessment of a single agent output's quality (e.g., correctness, completeness). |
| Construction Method | Mathematical combination (e.g., weighted average, harmonic mean) of two or more underlying Agentic SLIs. | Direct measurement from system telemetry (e.g., timer for latency, counter for successes/failures). | Often derived from business data, sometimes informed by trends in underlying SLIs. | Generated by a rule-based or model-based evaluator analyzing an agent's output against criteria. |
| Granularity | Aggregate, holistic view of system behavior over a time window. | Specific, atomic view of a single operational characteristic. | Broad, strategic view of system impact. | Per-task or per-output evaluation. |
| Example Metrics Combined | Efficiency Score = f(End-to-End Task Latency, Redundant Action Ratio, Cost Per Successful Task). | Planning Success Rate (95%), End-to-End Task Latency (< 2 sec). | Agent-Driven Operational Cost Reduction (15%), Customer Resolution Rate. | Factual Consistency Score (0.92), Instruction Adherence Score (0.87). |
| Used For SLO Definition? | | | | |
| Triggers Alerts Directly? | | | | |
| Indicates Root Cause? | | | | |
| Primary Audience | Engineering Leaders, CTOs (system health overview). | SREs, DevOps Engineers (operational debugging). | Business Stakeholders, CTOs (value reporting). | ML Engineers, QA (output validation). |
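The Construction Method distinction in the comparison can be sketched end to end: atomic SLIs are measured directly from per-task telemetry, and the composite is then computed from those SLIs over the same window. The record fields, the 2-second latency limit, and the equal weights below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    succeeded: bool
    latency_s: float


def task_completion_rate(records: List[TaskRecord]) -> float:
    """Atomic SLI: counter of successes over total tasks in the window."""
    return sum(r.succeeded for r in records) / len(records)


def latency_within(records: List[TaskRecord], limit_s: float) -> float:
    """Atomic SLI: share of tasks that finished within the latency limit."""
    return sum(r.latency_s <= limit_s for r in records) / len(records)


# Hypothetical telemetry for one time window.
window = [
    TaskRecord(True, 1.2),
    TaskRecord(True, 2.5),
    TaskRecord(False, 4.0),
    TaskRecord(True, 1.8),
]

completion = task_completion_rate(window)
fast = latency_within(window, limit_s=2.0)
# Composite SLI: built from the atomic SLIs, not from the raw records.
composite = 0.5 * completion + 0.5 * fast
```

Keeping the two layers separate is what allows a change in the composite to be traced back to a specific atomic SLI.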

COMPOSITE SLI

Frequently Asked Questions

A Composite SLI is a unified Service Level Indicator derived from multiple underlying metrics, providing a holistic score for complex aspects of autonomous agent performance, such as overall efficiency or safety.

A Composite SLI is a Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs, providing a unified score for a complex aspect of agent performance, such as overall efficiency or safety. Unlike a single SLI that measures a discrete metric like latency or success rate, a Composite SLI synthesizes multiple dimensions into a single, actionable value. For example, an "Agent Efficiency Score" might combine Task Completion Rate, End-to-End Task Latency, and Cost Per Successful Task using a weighted formula. This allows engineering leaders and CTOs to track high-level system health without monitoring a dashboard of dozens of individual metrics, simplifying operational oversight and aligning technical performance with business objectives.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. For more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.