Inferensys

Glossary

Service Level Indicator (SLI)

A Service Level Indicator (SLI) is a quantitative measure of a specific aspect of a service's performance, such as request latency, error rate, or throughput, used to calculate compliance with a Service Level Objective (SLO).
PRODUCTION CANARY ANALYSIS

What is a Service Level Indicator (SLI)?

An SLI is a direct, measurable signal of a service's health from the user's perspective. Common examples include the proportion of successful HTTP requests (availability), the time taken to serve a request (latency), or the rate of valid outputs from an AI model (quality). In production canary analysis, SLIs are the primary metrics compared between the stable control group and the new canary deployment to detect performance regressions before a full rollout.

Defining precise SLIs is foundational to Evaluation-Driven Development. For AI services, SLIs extend beyond infrastructure to include model-specific metrics like inference latency, prediction accuracy, or hallucination rate. These indicators feed into Automated Canary Analysis (ACA) systems, which statistically evaluate SLI differences to generate a deployment verdict, ensuring releases meet predefined Service Level Objectives (SLOs) without degrading the user experience.

DEFINITION

Key Characteristics of an SLI

01

Quantitative and Measurable

An SLI must be a quantifiable metric derived from observable system data, not a subjective opinion. It is calculated from raw telemetry like request counts, error logs, or latency measurements. Examples include:

  • Request latency: The time taken to successfully process a request, usually reported as a percentile (e.g., the 95th percentile latency, which an SLO might require to stay below 200ms).
  • Error rate: The proportion of requests that result in a failure (e.g., (failed requests / total requests) * 100).
  • Throughput: The number of requests a system handles per second.
  • Availability: The proportion of time a service is operational and responding.
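
As a rough illustration, the four example SLIs above could be computed from raw request telemetry along the following lines. This is a minimal Python sketch, not a reference implementation: the `Request` record and its fields are hypothetical, availability is approximated from health-check probe results (an assumption), and the percentile uses the nearest-rank definition, one of several in common use.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Hypothetical telemetry record for a single request.
    latency_ms: float
    status: int  # HTTP status code


def nearest_rank_percentile(values: List[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) of a non-empty list, nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def compute_slis(requests: List[Request], window_s: float, probes_ok: List[bool]) -> dict:
    """Compute example SLIs for one window. Inputs are assumed to be non-empty."""
    total = len(requests)
    failed = sum(1 for r in requests if r.status >= 500)
    # The latency SLI is computed over successful requests only.
    ok_latencies = [r.latency_ms for r in requests if r.status < 500]
    return {
        "p95_latency_ms": nearest_rank_percentile(ok_latencies, 95),
        "error_rate_pct": failed * 100 / total,
        "throughput_rps": total / window_s,
        # Availability approximated as the share of health-check probes that succeeded.
        "availability_pct": sum(probes_ok) * 100 / len(probes_ok),
    }
```

For example, nine successful requests at 10 to 90 ms plus one 503 over a 10-second window give a p95 latency of 90 ms, a 10% error rate, and a throughput of 1 request per second.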
02

Directly Tied to User Experience

Effective SLIs measure aspects of the service that end-users directly perceive as quality. They should answer the question: "What does a good experience look like for our users?"

  • User-facing latency is a better SLI than internal CPU utilization.
  • HTTP 5xx error rate is more relevant than low-level disk I/O errors, unless those errors cause user-visible failures.
  • The definition of a 'successful' request must align with the user's goal (e.g., a search returning relevant results, not just a 200 OK).
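
The "good event" predicate is where the user's goal gets encoded. The sketch below assumes a hypothetical `SearchResponse` with a `result_count` field and treats a non-empty result set as a crude stand-in for relevance; a real service would substitute its own notion of success.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResponse:
    # Hypothetical fields for a search API response.
    status: int        # HTTP status code
    result_count: int  # number of results returned to the user


def is_good_event(resp: SearchResponse) -> bool:
    """Good only if the user got what they came for, not merely a 2xx status.

    A non-empty result set is used here as a crude proxy for relevance.
    """
    return 200 <= resp.status < 300 and resp.result_count > 0


def good_event_ratio(responses: List[SearchResponse]) -> Optional[float]:
    """Share of responses that count as good; None when there is no traffic."""
    if not responses:
        return None
    return sum(1 for r in responses if is_good_event(r)) / len(responses)
```

Under this definition an HTTP 200 with zero results counts against the SLI, which a status-code-only definition would miss.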
03

Defined Over a Specific Aggregation

An SLI is not a single data point but a statistical aggregation over a defined time window and request population. Aggregating smooths out momentary noise that would otherwise trigger unnecessary alerts.

  • Time Window: SLIs are evaluated over periods like 1 minute, 5 minutes, or 28 days (rolling).
  • Aggregation Method: Common methods include:
    • Ratio: (Good events / Total eligible events) over the window.
    • Distribution: Percentiles (p50, p95, p99) of a measurement like latency.
    • Threshold: Percentage of time a metric is below/above a target.
  • Example: "The proportion of HTTP requests that succeeded over the last 5 minutes."
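
The three aggregation methods can be sketched in Python as follows. `RollingRatioSLI`, `p95`, and `fraction_of_time_below` are illustrative names, not a library API. The rolling window assumes events are recorded in timestamp order, and `p95` relies on the standard library's `statistics.quantiles` with the `inclusive` method so that results stay within the observed range.

```python
import statistics
from collections import deque
from typing import Deque, List, Optional, Tuple


class RollingRatioSLI:
    """Ratio SLI: good events / total eligible events over the last `window_s` seconds.

    Assumes events are recorded in non-decreasing timestamp order.
    """

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp_s: float, is_good: bool) -> None:
        self._events.append((timestamp_s, is_good))

    def value(self, now_s: float) -> Optional[float]:
        # Evict events that fell out of the window (now_s - window_s, now_s].
        while self._events and self._events[0][0] <= now_s - self.window_s:
            self._events.popleft()
        if not self._events:
            return None
        good = sum(1 for _, ok in self._events if ok)
        return good / len(self._events)


def p95(latencies_ms: List[float]) -> float:
    """Distribution SLI: 95th percentile (needs at least two samples)."""
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]


def fraction_of_time_below(samples: List[float], target: float) -> Optional[float]:
    """Threshold SLI: share of evenly spaced samples in which the metric is below `target`."""
    if not samples:
        return None
    return sum(1 for s in samples if s < target) / len(samples)
```

With `RollingRatioSLI(window_s=300)`, calling `value(now_s)` answers the example question directly: the share of requests recorded in the last 5 minutes that succeeded.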
04

Aligned with a Service Level Objective (SLO)

An SLI becomes actionable only when it is paired with a target threshold defined in an SLO. The SLO sets the acceptable performance level for the SLI.

  • SLI: The measurement itself (e.g., an error rate measured at 0.5%).
  • SLO: The target for that measurement (e.g., error rate ≤ 0.1%).
  • The error budget is then derived from this pairing: it is the amount of failure the SLO tolerates over its window (here, 0.1% of requests). A measured error rate of 0.5% therefore overspends the budget five times over. This creates a clear, data-driven framework for deciding when to halt deployments or prioritize reliability work.
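
The arithmetic behind an error budget is small enough to write out. In this sketch (the function names are illustrative), the budget is the number of failures the SLO tolerates over its window. With a target of error rate ≤ 0.1% and a measured 0.5%, the budget is consumed five times over. Burn rate applies the same ratio to a short lookback window, which is how burn-rate alerts typically detect fast budget consumption.

```python
def budget_consumed(total_requests: int, failed_requests: int,
                    slo_max_error_rate: float) -> float:
    """Fraction of the error budget spent over the SLO window (1.0 means exhausted)."""
    allowed_failures = total_requests * slo_max_error_rate
    return failed_requests / allowed_failures


def burn_rate(window_total: int, window_failed: int, slo_max_error_rate: float) -> float:
    """Observed error rate in a short lookback window relative to the SLO's allowed rate.

    A value of 1.0 spends the budget exactly over the full SLO period;
    higher values exhaust it early.
    """
    return (window_failed / window_total) / slo_max_error_rate
```

For instance, `budget_consumed(100_000, 500, 0.001)` evaluates to approximately 5.0, while a one-hour window with 20 failures in 10,000 requests gives a burn rate of about 2.0, meaning the budget would run out in half the SLO period if that rate persisted.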
05

Implementation via Reliable Telemetry

SLIs must be computed from high-fidelity, production-grade observability data. The measurement system must be more reliable than the service it monitors.

  • Data Sources: Application logs, structured metrics from exporters (Prometheus), distributed traces (OpenTelemetry), or load balancer access logs.
  • Instrumentation Points: SLIs should be measured as close to the user as possible, often at the service entry point (e.g., API gateway, load balancer).
  • Avoiding Bias: The measurement must cover all relevant traffic. Sampling can introduce bias and invalidate SLI calculations for low-volume services.
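
Measuring at the entry point and counting every request can be illustrated with a small WSGI middleware. This is a hypothetical, standard-library-only sketch. In practice the counters would usually be exported through a metrics client (for example Prometheus or OpenTelemetry) rather than kept in memory, and the latency measured here stops when the wrapped app returns its response iterable.

```python
import threading
import time


class SLIMiddleware:
    """WSGI middleware that records every request (no sampling) at the entry point.

    Assumes the wrapped app calls start_response before returning, which holds
    for simple apps but not for generator-based ones.
    """

    def __init__(self, app):
        self.app = app
        self._lock = threading.Lock()
        self.total = 0
        self.failed = 0
        self.latencies_ms = []

    def _record(self, status_code: int, started: float) -> None:
        elapsed_ms = (time.perf_counter() - started) * 1000
        with self._lock:
            self.total += 1
            if status_code >= 500:
                self.failed += 1
            self.latencies_ms.append(elapsed_ms)

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            # WSGI status strings look like "503 Service Unavailable".
            seen["code"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        try:
            body = self.app(environ, recording_start_response)
        except Exception:
            # An unhandled exception reaches the user as a failure.
            self._record(500, started)
            raise
        # If no status was seen, conservatively count the request as failed.
        self._record(seen.get("code", 500), started)
        return body
```

Because the middleware sits in front of the application logic, it sees the same traffic the user does, including requests that fail before any handler-level instrumentation runs.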
06

SLI Examples in AI/ML Services

For AI-powered services, SLIs must capture both infrastructure health and model quality. Key examples include:

  • Inference Latency: p95 latency for model prediction requests.
  • Model Throughput: Predictions per second the endpoint can handle.
  • Inference Error Rate: Percentage of prediction requests returning a 5xx error or a system-level failure.
  • Model Quality Drift: Percentage of predictions where confidence scores diverge significantly from a baseline, indicating potential performance degradation.
  • Hallucination Rate (for LLMs): Proportion of generated outputs flagged as factually incorrect or unsupported by the provided context.
  • Data Freshness: Age of the most recent data used for a prediction in a real-time system.
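
Several of these AI-specific SLIs reduce to simple ratios once the raw signals exist. The sketch below assumes a validator has already produced one boolean flag per generated output, and it defines confidence drift as predictions lying more than `k` baseline standard deviations from the baseline mean. That threshold rule and all function names are assumptions chosen for illustration.

```python
from typing import List, Optional


def hallucination_rate(flags: List[bool]) -> Optional[float]:
    """Share of generated outputs that a validator flagged as unsupported (True = flagged)."""
    if not flags:
        return None
    return sum(flags) / len(flags)


def confidence_drift_pct(confidences: List[float], baseline_mean: float,
                         baseline_stdev: float, k: float = 3.0) -> Optional[float]:
    """Percentage of predictions whose confidence lies more than k baseline
    standard deviations from the baseline mean (an assumed drift rule)."""
    if not confidences:
        return None
    limit = k * baseline_stdev
    drifted = sum(1 for c in confidences if abs(c - baseline_mean) > limit)
    return drifted * 100 / len(confidences)


def data_freshness_s(now_s: float, newest_input_ts_s: float) -> float:
    """Age, in seconds, of the most recent data used for a prediction."""
    return now_s - newest_input_ts_s
```

The baseline mean and standard deviation could be computed once from a reference period with `statistics.fmean` and `statistics.stdev`.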
EVALUATION-DRIVEN DEVELOPMENT

How to Define and Implement an SLI

A Service Level Indicator (SLI) is the foundational, quantitative measurement for evaluating an AI service's performance against its reliability targets.

In AI systems, SLIs extend beyond infrastructure to measure model-specific behavior such as prediction accuracy, inference latency, and hallucination rate. Defining a precise SLI involves selecting a measurable event, a method of aggregation (e.g., a percentile or an average), and a relevant time window for evaluation.
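
One way to make those three choices explicit is a small declarative spec. In this hypothetical `SLISpec`, `measure` turns one event into a value (or `None` when the event is not eligible), `aggregate` reduces the values in the window, and `window_s` bounds the window. Events are assumed to be dicts with a `ts` field.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class SLISpec:
    """Hypothetical SLI definition: measurable event + aggregation + time window."""
    name: str
    measure: Callable[[dict], Optional[float]]  # per-event value, None if not eligible
    aggregate: Callable[[List[float]], float]   # e.g. mean of 0/1 values, or a percentile
    window_s: float

    def evaluate(self, events: Sequence[dict], now_s: float) -> Optional[float]:
        values = []
        for event in events:
            # Keep only events inside the window (now_s - window_s, now_s].
            if now_s - self.window_s < event["ts"] <= now_s:
                value = self.measure(event)
                if value is not None:
                    values.append(value)
        return self.aggregate(values) if values else None


# Ratio SLI: every request is eligible; 1.0 = good (non-5xx), 0.0 = bad.
availability = SLISpec(
    name="availability_5m",
    measure=lambda e: 0.0 if e["status"] >= 500 else 1.0,
    aggregate=statistics.fmean,
    window_s=300,
)

# Distribution SLI: median (p50) latency of successful requests only.
median_latency = SLISpec(
    name="latency_p50_5m",
    measure=lambda e: e["latency_ms"] if e["status"] < 500 else None,
    aggregate=statistics.median,
    window_s=300,
)
```

Keeping the event, aggregation, and window together in one object makes the SLI definition reviewable on its own, separate from the code that collects telemetry.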

Implementation requires instrumenting the service to emit the raw data for the chosen metric, typically through telemetry systems such as Prometheus or OpenTelemetry. This data is aggregated into the SLI value, which is compared against the SLO target to track how much of the error budget has been consumed. For AI canary deployments, SLIs are critical for Automated Canary Analysis (ACA), where metrics from the new version are statistically compared to a baseline to generate a deployment verdict. Effective SLIs are direct, representative of user experience, and aligned with business objectives.
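
A canary verdict on a single ratio SLI can be sketched with a one-sided two-proportion z-test on error rates. This is one simple statistical choice, not the method of any particular ACA system (production tools typically compare many SLIs at once and may use other tests). The default `z_threshold` of 2.33 corresponds to roughly a 1% one-sided false-alarm rate.

```python
import math
from typing import Tuple


def canary_error_rate_verdict(base_total: int, base_failed: int,
                              canary_total: int, canary_failed: int,
                              z_threshold: float = 2.33) -> Tuple[str, float]:
    """One-sided two-proportion z-test: is the canary's error rate significantly higher?

    Returns ("pass" or "fail", z statistic).
    """
    p_base = base_failed / base_total
    p_canary = canary_failed / canary_total
    # Pooled error rate under the null hypothesis that both groups share one rate.
    pooled = (base_failed + canary_failed) / (base_total + canary_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / canary_total))
    if se == 0.0:
        # Both groups have identical 0% or 100% error rates: no difference to test.
        return "pass", 0.0
    z = (p_canary - p_base) / se
    return ("fail" if z > z_threshold else "pass"), z
```

With 50 failures in 10,000 baseline requests, a canary with 150 failures in 10,000 yields z of about 7.1 and fails, while one with 52 failures yields z of about 0.2 and passes.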

SERVICE LEVEL HIERARCHY

SLI vs. SLO vs. SLA: A Comparison

A comparison of the three core components of service reliability management, detailing their purpose, format, and audience within the context of AI/ML service deployment.

| Feature | Service Level Indicator (SLI) | Service Level Objective (SLO) | Service Level Agreement (SLA) |
|---|---|---|---|
| Core Definition | A quantitative measure of a specific aspect of service performance. | A target value or range for an SLI over a specific period. | A formal contract defining the consequences of failing to meet SLOs. |
| Primary Role | Measurement. The raw, observed metric. | Internal Goal. The target for the measured metric. | External Promise. The business commitment with penalties. |
| Format & Granularity | A precise metric (e.g., p99 latency = 225ms, error rate = 0.15%). | A target threshold (e.g., p99 latency < 250ms, error rate < 0.3%). | A legal document with financial/credit penalties (e.g., 99.9% uptime SLO, with service credits for breach). |
| Audience & Purpose | Engineering & SRE teams. Used for monitoring, debugging, and calculating SLO compliance. | Internal product & engineering teams. Defines the reliability target for development and operations. | External customers or business stakeholders. Defines the business risk and liability of service unreliability. |
| Example in AI/ML Context | Model inference latency measured at the 99th percentile. Token generation throughput. Hallucination rate detected by a validator model. | p99 model inference latency < 300ms for 95% of days in a quarter. Hallucination rate < 2%. | If the quarterly SLO for p99 latency is not met, the customer receives a 10% service credit. Defines the support response time for model downtime. |
| Relationship | The measured input. Feeds the SLO calculation. | The goal set for the SLI. Defines the error budget. | The business wrapper that incorporates SLOs and defines remedies. |
| Change Frequency | High. Metrics can be added or refined as the service evolves. | Medium. Reviewed and adjusted quarterly based on error budget consumption and business needs. | Low. Legally binding; changes require contract renegotiation. |
| Key Action Trigger | Alerting when a metric deviates from normal behavior. | Error budget burn rate alerts. Triggers a focus on reliability work. | Breach triggers contractual penalties (e.g., service credits, termination rights). |
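
Checking the table's example SLO (p99 model inference latency < 300ms for 95% of days in a quarter) and the matching SLA credit could look like the sketch below. The daily p99 values are assumed to be precomputed, and `slo_compliance` and `service_credit` are illustrative names.

```python
from typing import List, Tuple


def slo_compliance(daily_p99_ms: List[float], threshold_ms: float = 300.0,
                   required_fraction: float = 0.95) -> Tuple[bool, float]:
    """Return (SLO met?, fraction of days whose p99 latency was under the threshold)."""
    if not daily_p99_ms:
        raise ValueError("need at least one day of measurements")
    good_days = sum(1 for v in daily_p99_ms if v < threshold_ms)
    fraction = good_days / len(daily_p99_ms)
    return fraction >= required_fraction, fraction


def service_credit(bill_amount: float, slo_met: bool, credit_rate: float = 0.10) -> float:
    """SLA remedy from the example: a 10% credit when the quarterly SLO is missed."""
    return 0.0 if slo_met else bill_amount * credit_rate
```

In a 90-day quarter, 86 compliant days (about 95.6%) meet the SLO, while 85 compliant days (about 94.4%) miss it and trigger the credit.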

PRODUCTION CANARY ANALYSIS

Frequently Asked Questions

Service Level Indicators (SLIs) are the foundational metrics used to quantitatively evaluate the health and performance of AI services during controlled deployments like canary releases. These questions address their definition, implementation, and role in modern MLOps.

A Service Level Indicator (SLI) is a directly measurable metric that quantifies a specific aspect of a service's performance or reliability from the user's perspective. It is the raw measurement used to calculate compliance with a Service Level Objective (SLO). For AI services, common SLIs include:

  • Request Latency: The time from when a user sends a request to when they receive a complete response, often measured as a percentile (e.g., p95, p99).
  • Error Rate: The proportion of requests that result in a failure, such as a 5xx HTTP status code, a model inference error, or a failed validation check.
  • Throughput: The number of successful requests the service handles per second.
  • Model Quality Metrics: For AI/ML services, this can include metrics like prediction accuracy, hallucination rate for generative models, or business Key Performance Indicators (KPIs) derived from model outputs.

An SLI must be well-defined, consistently measurable, and representative of the user experience. It serves as the foundational data point for all reliability engineering.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over a career of more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on turning complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.