A Composite SLI is a Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs, providing a unified score for a complex aspect of agent performance, such as overall efficiency or safety. It synthesizes granular metrics—like Planning Success Rate, End-to-End Task Latency, and Cost Per Successful Task—into a single, holistic indicator. This enables engineering leaders to assess high-level system health and trade-offs without monitoring a dashboard of disparate signals.

What is a Composite SLI?
A Composite SLI is a unified Service Level Indicator derived from combining multiple underlying Agentic SLIs to measure complex aspects of autonomous agent performance.
Common formulas include weighted averages or more sophisticated functions that model interactions between SLIs, such as efficiency (tasks/cost) or a Resiliency Score. Defining a Composite SLI is a key step in Agentic SLO definition, as it creates a target for complex, business-aligned outcomes rather than isolated technical behaviors. It is a cornerstone of Evaluation-Driven Development for autonomous systems, providing a quantifiable benchmark for performance and guiding optimization efforts.
Key Characteristics of a Composite SLI
A Composite SLI is a unified Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs. It provides a holistic score for complex aspects of agent performance, such as overall efficiency, safety, or reliability.
Mathematical Aggregation
A Composite SLI is usually more than a simple unweighted average: it is a weighted or formulaic combination of constituent SLIs. Common aggregation functions include:
- Weighted sums (e.g., 0.6 * Planning Success Rate + 0.4 * Action Success Ratio)
- Minimum or maximum functions to track the worst/best-performing component
- Geometric means for rate-based metrics
- Conditional logic (e.g., a composite fails if any critical guardrail SLI is breached).

This allows engineering teams to define a single score that reflects the multi-dimensional nature of agent health.
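The aggregation functions listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the SLI keys, and the choice to gate the composite to 0.0 on a guardrail breach are all assumptions made for the example.

```python
import math

def weighted_sum(values, weights):
    """Weighted sum of SLI values; weights are expected to sum to 1."""
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same SLIs")
    return sum(values[name] * weights[name] for name in values)

def worst_component(values):
    """Minimum aggregation: the composite equals the weakest SLI."""
    return min(values.values())

def geometric_mean(values):
    """Geometric mean; a near-zero component drags the whole score down."""
    return math.prod(values.values()) ** (1 / len(values))

def gated(score, guardrails, thresholds):
    """Conditional logic: return 0.0 if any guardrail SLI is below its threshold."""
    breached = any(guardrails[name] < thresholds[name] for name in thresholds)
    return 0.0 if breached else score

# Hypothetical SLI readings, each already expressed as a ratio in [0, 1].
slis = {"planning_success_rate": 0.95, "action_success_ratio": 0.90}
weights = {"planning_success_rate": 0.6, "action_success_ratio": 0.4}

composite = weighted_sum(slis, weights)  # 0.6 * 0.95 + 0.4 * 0.90
```

The weighted sum lets strong components compensate for weak ones, while the minimum and the gate do not. Which behavior is appropriate depends on whether a failure in one component should be allowed to hide behind the others.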
Holistic Performance Scoring
Its primary purpose is to provide a unified view of a complex capability by combining related but distinct operational signals. For example:
- An Overall Efficiency Score could combine Task Completion Rate, End-to-End Task Latency, and Redundant Action Ratio.
- A Safety & Compliance Score might aggregate Guardrail Compliance Rate, Hallucination Rate, and Fallback Success Rate.
- A Resiliency Score could be derived from Self-Correction Success Rate, Retry Success Rate, and Health Check Success Rate.

This moves monitoring from isolated metrics to actionable, business-aligned scores.
Derived from Atomic SLIs
A Composite SLI is built upon well-defined, atomic Agentic SLIs. It does not measure raw telemetry but combines higher-level indicators. Prerequisites include:
- Established baselines for each constituent SLI (e.g., Planning Success Rate, Action Success Ratio).
- Clear understanding of the relationships and trade-offs between the underlying metrics.
- Reliable data pipelines for each component SLI.

This derivation ensures the composite is traceable and debuggable; a drop in the composite score can be investigated by drilling into its atomic components.
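One way to drill into a composite is to attribute its change to each component. The sketch below assumes a weighted-sum composite, for which the attribution is exact because the formula is linear. The function name, SLI keys, and sample values are hypothetical.

```python
def contribution_to_change(baseline, current, weights):
    """Per-SLI contribution to the change in a weighted-sum composite.

    Because the composite is linear, the contributions sum to the total change.
    """
    return {
        name: weights[name] * (current[name] - baseline[name])
        for name in weights
    }

weights = {"planning_success_rate": 0.6, "action_success_ratio": 0.4}
baseline = {"planning_success_rate": 0.96, "action_success_ratio": 0.92}
current = {"planning_success_rate": 0.95, "action_success_ratio": 0.82}

deltas = contribution_to_change(baseline, current, weights)

# Most negative contribution first: the component to investigate first.
ranked = sorted(deltas.items(), key=lambda item: item[1])
for name, delta in ranked:
    print(f"{name}: {delta:+.3f}")
```

For non-linear composites such as a minimum or a geometric mean, this simple subtraction no longer decomposes the change exactly, so a different attribution method would be needed.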
SLO Target Definition
Composite SLIs enable the definition of sophisticated Service Level Objectives (SLOs) for entire agent capabilities. Instead of managing dozens of individual SLOs, teams can set a target for the composite. For instance:
- "Our agent's Safety & Compliance Score must be ≥ 99.5% over a 30-day window."
- "The Overall Efficiency Score for the procurement agent must not drop below 0.85."

This simplifies error budget calculation and consumption tracking for complex systems, as there is one primary budget for the composite behavior.
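A composite SLO check with a single error budget might look like the following sketch. It assumes the windowed value is the plain mean of daily composite scores and that the budget is the allowed shortfall from a perfect score of 1.0. The text does not specify either convention, and the function name and sample data are hypothetical.

```python
from statistics import mean

def slo_status(daily_scores, target):
    """Compare the windowed composite against its SLO target.

    Assumes the windowed value is the plain mean of daily composite scores.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    observed = mean(daily_scores)
    budget = 1.0 - target                 # allowed shortfall from a perfect score
    consumed = (1.0 - observed) / budget  # fraction of the budget used so far
    return {"observed": observed, "met": observed >= target, "budget_consumed": consumed}

# Hypothetical 30-day window: 28 good days and 2 degraded days.
scores = [0.998] * 28 + [0.97] * 2
status = slo_status(scores, target=0.995)
```

In this sample the SLO is still met, but roughly three quarters of the budget is already spent, which is the kind of signal a single composite budget is meant to surface.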
Prioritization & Triage Signal
A shifting Composite SLI value serves as a high-level alert that a complex aspect of agent performance is degrading. It can direct engineering attention before any individual component SLI breaches its threshold. The composite's structure informs triage:
- A drop in an Efficiency Score immediately points to planning, execution, or latency sub-systems.
- A decline in a Safety Score prioritizes checks on guardrails, fact-checking, and fallback mechanisms.

This transforms observability from reactive monitoring to proactive system management.
Example: Agent Reliability Index
A practical example is an Agent Reliability Index for a customer support agent, defined as:
(0.3 * Task Completion Rate) + (0.3 * Result Accuracy) + (0.2 * Guardrail Compliance Rate) + (0.2 * (1 - Normalized Latency))
- Task Completion Rate: Ensures the agent finishes the job.
- Result Accuracy: Ensures the answer is correct.
- Guardrail Compliance Rate: Ensures the agent stays within policy.
- Normalized Latency: Penalizes slow performance.

This single index, tracked over time, gives a CTO a clear, quantitative measure of the agent's end-to-end service quality.
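The index formula above can be sketched in Python as follows. The weights come from the formula. The latency normalization (dividing by a ceiling and clamping to the range 0 to 1) is an assumption, since the text does not say how Normalized Latency is computed, and the input values are hypothetical.

```python
def normalize_latency(latency_s, ceiling_s):
    """Map latency to [0, 1]; anything at or above the ceiling counts as 1.0."""
    return min(max(latency_s / ceiling_s, 0.0), 1.0)

def agent_reliability_index(task_completion_rate, result_accuracy,
                            guardrail_compliance_rate, normalized_latency):
    """Weighted sum using the weights from the formula above."""
    return (0.3 * task_completion_rate
            + 0.3 * result_accuracy
            + 0.2 * guardrail_compliance_rate
            + 0.2 * (1 - normalized_latency))

index = agent_reliability_index(
    task_completion_rate=0.96,
    result_accuracy=0.92,
    guardrail_compliance_rate=0.99,
    normalized_latency=normalize_latency(4.0, ceiling_s=10.0),
)
print(round(index, 3))  # prints 0.882
```

Because every input lies between 0 and 1 and the weights sum to 1, the index also stays between 0 and 1, which makes it straightforward to attach an SLO target to it.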
Composite SLI vs. Related Metrics
The following table compares the Composite SLI with other key performance and observability metrics used in agentic systems, highlighting its distinct purpose and construction.
| Metric / Feature | Composite SLI | Agentic SLI | Business KPI | Automated Evaluation Score |
|---|---|---|---|---|
| Primary Purpose | Unified score for a complex, multi-faceted aspect of agent performance (e.g., overall efficiency, safety). | Quantitative measure of a single, specific performance dimension (e.g., latency, success rate). | High-level business outcome metric measuring value delivery (e.g., user satisfaction, cost savings). | Programmatic assessment of a single agent output's quality (e.g., correctness, completeness). |
| Construction Method | Mathematical combination (e.g., weighted average, harmonic mean) of two or more underlying Agentic SLIs. | Direct measurement from system telemetry (e.g., timer for latency, counter for successes/failures). | Often derived from business data, sometimes informed by trends in underlying SLIs. | Generated by a rule-based or model-based evaluator analyzing an agent's output against criteria. |
| Granularity | Aggregate, holistic view of system behavior over a time window. | Specific, atomic view of a single operational characteristic. | Broad, strategic view of system impact. | Per-task or per-output evaluation. |
| Example Metrics Combined | Efficiency Score = f(End-to-End Task Latency, Redundant Action Ratio, Cost Per Successful Task). | Planning Success Rate (95%), End-to-End Task Latency (< 2 sec). | Agent-Driven Operational Cost Reduction (15%), Customer Resolution Rate. | Factual Consistency Score (0.92), Instruction Adherence Score (0.87). |
| Used For SLO Definition? | | | | |
| Triggers Alerts Directly? | | | | |
| Indicates Root Cause? | | | | |
| Primary Audience | Engineering Leaders, CTOs (system health overview). | SREs, DevOps Engineers (operational debugging). | Business Stakeholders, CTOs (value reporting). | ML Engineers, QA (output validation). |
Frequently Asked Questions
A Composite SLI is a unified Service Level Indicator derived from multiple underlying metrics, providing a holistic score for complex aspects of autonomous agent performance, such as overall efficiency or safety.
A Composite SLI is a Service Level Indicator derived from the mathematical combination of two or more underlying Agentic SLIs, providing a unified score for a complex aspect of agent performance, such as overall efficiency or safety. Unlike a single SLI that measures a discrete metric like latency or success rate, a Composite SLI synthesizes multiple dimensions into a single, actionable value. For example, an "Agent Efficiency Score" might combine Task Completion Rate, End-to-End Task Latency, and Cost Per Successful Task using a weighted formula. This allows engineering leaders and CTOs to track high-level system health without monitoring a dashboard of dozens of individual metrics, simplifying operational oversight and aligning technical performance with business objectives.
Related Terms
A Composite SLI is built from foundational metrics. These related terms define the specific performance indicators and operational concepts used to measure and manage autonomous agents.
Agentic SLI (Service Level Indicator)
An Agentic SLI is a quantitative measure of a specific aspect of an autonomous agent's performance. It is the fundamental building block for observability.
- Examples: Planning Success Rate, End-to-End Task Latency, Action Success Ratio.
- Purpose: Provides a precise, numerical signal of health for a single dimension of agent behavior.
- Usage: Tracked in real-time and aggregated over time to compute SLO compliance.
Agentic SLO (Service Level Objective)
An Agentic SLO is a target value or range for an Agentic SLI, defining the acceptable level of performance over a specified period.
- Structure: Often expressed as a percentage (e.g., "Planning Success Rate ≥ 99.5% over 30 days").
- Function: Creates a formal reliability contract for the agent system.
- Error Budget: The allowable deviation from the SLO, calculated as (100% - SLO%) * time_window. This budget manages the trade-off between innovation velocity and system stability.
Error Budget
An Error Budget is the allowable amount of time an autonomous agent system can fail to meet its SLOs within a defined compliance period.
- Calculation: Derived directly from the SLO. For a 99.9% monthly SLO, the error budget is 0.1% of the month (about 43.2 minutes in a 30-day month).
- Operational Use: Serves as a shared resource between development and operations teams. Consuming the budget triggers a focus on reliability; preserving it allows for riskier feature deployments.
- Burn Rate: Measures how quickly the error budget is being consumed, a key signal for prioritizing incident response.
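The error budget and burn rate described above can be computed as in the following sketch. The burn rate here is defined relative to an even spend across the window, which is a common convention but an assumption with respect to this text. The function names and sample figures are hypothetical.

```python
def error_budget_minutes(slo, window_days):
    """Allowed minutes of SLO violation: (1 - SLO) * window length."""
    return (1.0 - slo) * window_days * 24 * 60

def burn_rate(bad_minutes, elapsed_days, slo, window_days):
    """Budget consumption speed relative to an even spend over the window.

    A value of 1.0 means the budget runs out exactly at the end of the window.
    """
    budget = error_budget_minutes(slo, window_days)
    consumed_fraction = bad_minutes / budget
    elapsed_fraction = elapsed_days / window_days
    return consumed_fraction / elapsed_fraction

# 99.9% SLO over a 30-day window: about 43.2 minutes of budget.
budget = error_budget_minutes(slo=0.999, window_days=30)

# Half the budget spent after 3 of 30 days: burning about 5x faster than sustainable.
rate = burn_rate(bad_minutes=21.6, elapsed_days=3, slo=0.999, window_days=30)
```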
Planning Success Rate
Planning Success Rate is a core Agentic SLI that measures the percentage of times an agent successfully decomposes a high-level goal into a valid, executable sequence of sub-tasks.
- Measurement Point: Evaluated after the agent's initial reasoning/planning phase, before execution begins.
- Importance: A low rate indicates fundamental flaws in the agent's reasoning capabilities or understanding of its action space.
- Example: An agent with a 98% Planning Success Rate fails to create a coherent plan for 2 out of every 100 tasks received.
End-to-End Task Latency
End-to-End Task Latency is an Agentic SLI that measures the total wall-clock time from when an agent receives a task to when it delivers a final, validated result.
- Scope: Includes all phases: planning, tool execution, waiting for external APIs, reflection, and final output generation.
- Key Consideration: For composite tasks, this is the primary user-facing latency metric, more critical than the latency of individual sub-steps.
- SLO Example: "95% of agent tasks complete within 10 seconds."
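A latency SLO of this form is checked by measuring the share of tasks that finish within the threshold. The sketch below is a minimal illustration with hypothetical sample data and a hypothetical function name.

```python
def fraction_within(latencies_s, threshold_s):
    """Share of tasks whose end-to-end latency is at or below the threshold."""
    if not latencies_s:
        raise ValueError("no latency samples")
    within = sum(1 for value in latencies_s if value <= threshold_s)
    return within / len(latencies_s)

# Hypothetical sample: 20 tasks, one of which exceeded 10 seconds.
latencies = [2.1, 3.4, 1.8, 9.7, 4.2, 5.0, 2.9, 3.3, 6.1, 12.5,
             2.2, 4.8, 3.9, 7.4, 1.5, 8.8, 2.7, 3.1, 5.6, 4.4]

good_fraction = fraction_within(latencies, threshold_s=10.0)
slo_met = good_fraction >= 0.95
```

Expressing latency as a fraction of tasks within a threshold also turns it into a ratio between 0 and 1, which makes it easier to combine with success-rate SLIs in a composite.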
Self-Correction Success Rate
Self-Correction Success Rate is an Agentic SLI that measures the effectiveness of an agent's recursive error correction loops in identifying and remediating its own failures without human intervention.
- Mechanism: Tracks attempts where an agent, upon detecting a failure (e.g., a tool error, an invalid result), successfully adjusts its plan and retries to achieve the goal.
- Value: A high rate indicates robust resilience and reduces operational burden. It is a key component of a Resiliency Score.
- Failure Mode: A low rate may necessitate more aggressive fallback mechanisms or human-in-the-loop escalation.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.