Glossary

OpenTelemetry (OTel)

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework for generating, collecting, and exporting telemetry data—including traces, metrics, and logs—from software applications.

OBSERVABILITY FRAMEWORK

What is OpenTelemetry (OTel)?

OpenTelemetry is the open-source standard for instrumenting software to generate, collect, and export telemetry data.

OTel provides a single, unified set of APIs, SDKs, and tools for instrumenting code, so developers can understand system performance and behavior without being locked into a specific commercial vendor's ecosystem. The resulting data underpins distributed tracing and performance monitoring in complex systems such as LLM application stacks.

In the context of LLM Performance Monitoring, OTel is used to instrument model inference endpoints, track latency percentiles (P90, P99), measure Tokens per Second (TPS), and create end-to-end traces of user requests as they flow through retrieval, generation, and post-processing services. By exporting this standardized telemetry to backends like Prometheus for metrics or dedicated tracing systems, engineering teams gain the visibility needed to enforce Service Level Objectives (SLOs), perform root cause analysis (RCA), and optimize resource utilization.
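As a rough illustration of the two metrics named above, the sketch below computes latency percentiles with the nearest-rank method and a tokens-per-second figure from raw samples. The function names and sample values are hypothetical, and in practice a metrics backend usually derives percentiles from histogram data rather than from raw lists.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def tokens_per_second(token_count, duration_s):
    """Throughput of a single generation call."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return token_count / duration_s


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 135, 150, 160, 180, 210, 250, 400, 900, 1500]
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
tps = tokens_per_second(256, 4.0)
```

The gap between P90 and P99 in the sample data shows why tail percentiles are tracked separately from averages: a handful of slow requests dominates the tail.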

Key Features of OpenTelemetry

OpenTelemetry (OTel) is a vendor-neutral, open-source framework for generating, collecting, and exporting telemetry data—traces, metrics, and logs—from software applications and their infrastructure.

Unified Signals: Traces, Metrics, and Logs

OpenTelemetry provides a single, integrated framework for the three primary telemetry signals. A trace records the path of a request through services. Metrics are numerical measurements of system performance over time. Logs are timestamped records of discrete events. OTel's APIs and SDKs allow you to instrument your LLM application once to emit all three, ensuring correlated data and eliminating the need for multiple, disparate instrumentation libraries.

Vendor-Neutral Data Collection

A core tenet of OpenTelemetry is decoupling instrumentation from analysis. You instrument your code using OTel's APIs, which generate data in a standard format. This data is sent to the OTel Collector, a vendor-agnostic proxy. The Collector can then process, filter, batch, and export this data to any backend of your choice (e.g., Prometheus for metrics, Jaeger for traces, Datadog, Splunk). This prevents vendor lock-in and allows you to change your observability backend without rewriting your application code.

The OpenTelemetry Collector

The OTel Collector is a critical, standalone service for managing telemetry data flow. It operates in a pipeline architecture with three core components:

Receivers: How data gets in (e.g., OTLP, Jaeger, Prometheus, syslog).
Processors: What happens to the data in transit (e.g., batch, filter, enrich with attributes, sample traces).
Exporters: Where data gets sent (e.g., to Jaeger, Prometheus, or commercial vendors).

This architecture centralizes configuration, reduces overhead on your application, and enables powerful data transformation before it reaches expensive storage backends.
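The receiver, processor, and exporter stages can be pictured with a toy in-process pipeline. This is only a conceptual sketch in plain Python, not the Collector itself (which is a separate service configured declaratively); every class and function name here is hypothetical, and the "env" attribute is an illustrative key rather than a semantic convention.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def filter_processor(spans, min_duration_ms):
    """Drop spans shorter than the threshold (stand-in for a filter processor)."""
    return [s for s in spans if s.duration_ms >= min_duration_ms]


def enrich_processor(spans, extra):
    """Attach extra attributes to every span (stand-in for attribute enrichment)."""
    for s in spans:
        s.attributes.update(extra)
    return spans


def batch_processor(spans, batch_size):
    """Group spans into fixed-size batches before export."""
    return [spans[i:i + batch_size] for i in range(0, len(spans), batch_size)]


class InMemoryExporter:
    """Stand-in for an exporter; a real one would send batches to a backend."""

    def __init__(self):
        self.batches = []

    def export(self, batch):
        self.batches.append(batch)


def run_pipeline(received, exporter):
    """Received spans flow through the processors in order, then to the exporter."""
    spans = filter_processor(received, min_duration_ms=5)
    spans = enrich_processor(spans, {"env": "staging"})
    for batch in batch_processor(spans, batch_size=2):
        exporter.export(batch)
```

Ordering matters in such a pipeline: filtering before batching means dropped spans never consume export bandwidth or backend storage.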

Context Propagation and Distributed Tracing

For monitoring LLM applications spanning multiple services (e.g., API gateway, LLM orchestrator, vector database), OpenTelemetry's distributed tracing is essential. It automatically injects trace context (trace and span IDs) into requests. This context is propagated across network calls (via HTTP headers, gRPC metadata, etc.), allowing the OTel system to reconstruct the complete journey of a single user request. You can see the exact latency contribution of each service, database call, or external API, which is critical for diagnosing high inter-token latency or failures in complex chains.
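Over HTTP, this context usually travels in the W3C Trace Context `traceparent` header, which has the form `version-traceid-parentid-flags`. The sketch below builds and parses that header with the standard library only; the helper names are hypothetical, it handles only the version `00` layout, and the OTel SDK propagators normally do this work for you.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Start a new trace: random trace ID and span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header):
    """Return the parsed context, or None if the header is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Header for an outgoing call: same trace ID, fresh span ID."""
    ctx = parse_traceparent(incoming)
    if ctx is None:
        return new_traceparent()  # no valid parent, so start a new trace
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"
```

Because every hop keeps the trace ID and replaces only the span ID, a backend can stitch spans from the gateway, orchestrator, and vector database into one trace.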

Auto-Instrumentation

OpenTelemetry significantly reduces the manual effort of code instrumentation through auto-instrumentation libraries. For popular frameworks (e.g., Express.js, Django, Spring Boot, OpenAI's Python client) and infrastructure clients (e.g., PostgreSQL, Redis, HTTP libraries), OTel can automatically wrap critical functions to generate spans and metrics without requiring code changes. For LLM applications, this means you can quickly gain visibility into database calls for Retrieval-Augmented Generation (RAG) or external API calls for tool execution with minimal developer overhead.

Semantic Conventions

To ensure telemetry data is consistent and interoperable across different services and teams, OpenTelemetry defines Semantic Conventions. These are standardized naming schemas for common attributes (key-value pairs) attached to spans, metrics, and logs. For example, conventions exist for HTTP (http.request.method and http.response.status_code, named http.method and http.status_code in older releases), databases (db.system.name and db.query.text, formerly db.system and db.statement), and compute resources. By adhering to these conventions, your LLM telemetry becomes self-describing and can be reliably queried and aggregated, forming a consistent foundation for dashboards and anomaly detection.
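Attribute names have changed between releases of the conventions, so older instrumentation may emit legacy keys. The sketch below normalizes a span's attributes to the newer names. The rename table reflects the stable HTTP and database conventions as an assumption to verify against the version your SDK targets, and `migrate_attributes` is a hypothetical helper, not an OTel API.

```python
# Legacy key -> current key. Assumed renames; verify against the
# semantic-conventions version your instrumentation targets.
RENAMED_KEYS = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "db.system": "db.system.name",
    "db.statement": "db.query.text",
}


def migrate_attributes(attributes):
    """Return a copy of attributes with legacy keys replaced by current ones."""
    migrated = {}
    for key, value in attributes.items():
        new_key = RENAMED_KEYS.get(key, key)
        # If the current key is already set explicitly, it wins over the legacy one.
        if new_key != key and new_key in attributes:
            continue
        migrated[new_key] = value
    return migrated
```

Normalizing keys at one point in the pipeline keeps dashboards and queries from having to match both spellings.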

How OpenTelemetry Works

The framework operates through a standardized instrumentation layer. Developers integrate language-specific SDKs into their application code, and the instrumented code then emits telemetry signals: structured traces for request flows, metrics for numerical measurements, and logs for discrete events. This instrumentation is designed to be low-overhead and follows a unified data model, ensuring consistency across different programming languages and frameworks.

Generated telemetry is processed by the OpenTelemetry Collector, a vendor-agnostic service that receives, processes, and exports data. It can perform operations like batching, filtering, and redaction before routing signals to one or more backend observability platforms (e.g., Prometheus for metrics, Jaeger for traces, or commercial vendors). Because instrumentation is decoupled from analysis, an organization can add or replace backends without re-instrumenting its applications.

ADOPTION ECOSYSTEM

Who Uses OpenTelemetry?

OpenTelemetry's vendor-neutral, open-source framework for generating, collecting, and exporting telemetry data is adopted across the technology stack to provide unified observability.

Platform & DevOps Engineers

These engineers instrument the underlying infrastructure and platform services. They use OpenTelemetry to:

Auto-instrument common frameworks and libraries to collect traces and metrics without code changes.
Export data to backends like Prometheus for infrastructure monitoring and Grafana for dashboards.
Implement distributed tracing across microservices to diagnose latency issues and failed requests.
Correlate logs, metrics, and traces using the unified W3C Trace Context standard.

Application & ML Engineers

Developers embedding LLMs and building AI-powered applications use OpenTelemetry for deep code-level visibility. Key practices include:

Manually instrumenting custom business logic and LLM API calls to create detailed spans.
Tracking key LLM performance metrics like Time to First Token (TTFT), Tokens per Second (TPS), and token usage.
Monitoring for output drift or hallucinations by capturing input prompts and generated completions within traces.
Integrating with evaluation frameworks to log quality scores and user feedback, creating a feedback loop for model improvement.

Site Reliability Engineers (SREs)

SREs leverage OpenTelemetry data to ensure system reliability and meet business objectives. Their work involves:

Defining Service Level Indicators (SLIs) like latency and error rate from OTel metrics.
Setting Service Level Objectives (SLOs) and managing error budgets based on this telemetry.
Performing root cause analysis (RCA) by querying trace data to quickly isolate failing services.
Building automated anomaly detection on metric streams to alert on concept drift or performance degradation.

Observability & Vendor Teams

Commercial observability vendors and internal platform teams build and integrate with OpenTelemetry as the standard data source.

Vendors (e.g., Datadog, New Relic, Dynatrace) accept OTel data, reducing vendor lock-in for their customers.
Internal teams create custom collectors, processors, and exporters to tailor data pipelines.
Both groups contribute to semantic conventions to ensure consistency (e.g., gen_ai.* attributes for generative AI operations).
Both also provide curated Grafana dashboards and alerting rules based on OTel metric conventions.

Cloud Providers & Managed Services

Major cloud platforms integrate OpenTelemetry into their managed services, providing native observability.

AWS (AWS Distro for OpenTelemetry), Google Cloud (Cloud Operations), and Microsoft Azure (Azure Monitor) offer OpenTelemetry distributions or native ingestion of OTel data.
Managed Kubernetes services (EKS, GKE, AKS) provide OTel-based monitoring for containers.
Database and message queue services emit OTel-compatible metrics and traces.
This enables a consistent observability strategy across hybrid and multi-cloud deployments.

CI/CD & Deployment Orchestration

Teams use OpenTelemetry to observe the deployment pipeline and the behavior of new releases.

Instrumenting deployment tools to trace the rollout of new LLM model versions.
Enabling canary deployment analysis by comparing metrics (e.g., latency, error rate) between old and new versions.
Running shadow deployments that log outputs from a new model version without serving them, for quality comparison.
Tracking Mean Time to Recovery (MTTR) by timing remediation actions within incident response traces.

PROTOCOL COMPARISON

OpenTelemetry vs. Legacy Observability

A technical comparison of OpenTelemetry's unified framework against traditional, siloed observability approaches for LLM and AI application monitoring.

Observability Feature	OpenTelemetry (OTel)	Legacy / Vendor-Specific Agents
Data Model Standardization	Yes (shared data model)	No (proprietary formats)
Telemetry Signal Correlation	Yes (shared trace context)	Vendor-dependent
Vendor Lock-In Risk	Low	High
Instrumentation Overhead	< 5%	5-15% (varies by agent)
Multi-Language Support	10+ official SDKs	Limited, vendor-dependent
Context Propagation	W3C Trace Context standard	Proprietary or limited
Data Export Control	Configurable processors & exporters	Vendor-defined pipeline
LLM-Specific Semantic Conventions	Emerging standard (e.g., gen_ai.*)	Non-existent or proprietary

OPEN TELEMETRY

Frequently Asked Questions

OpenTelemetry (OTel) is the open-source, vendor-neutral standard for generating, collecting, and exporting telemetry data—traces, metrics, and logs—from software systems. For LLM applications, it provides the foundational observability layer to monitor performance, debug issues, and understand system behavior.

OpenTelemetry (OTel) is a collection of APIs, SDKs, and tools designed to create and manage telemetry data—traces, metrics, and logs—from applications. It works by instrumenting your code to generate this data, which is then collected, processed, and exported to observability backends like Prometheus, Jaeger, or commercial vendors.

For an LLM application, the workflow is:

Instrumentation: The OTel SDK is integrated into your application code (e.g., FastAPI server, model inference logic).
Data Generation: As requests flow through, the SDK creates spans (representing operations like llm.generate or embedding.retrieve), records metrics (like tokens_per_second), and captures structured logs.
Context Propagation: A unique trace ID is passed through all services (e.g., from API gateway to model server to vector database), linking all related spans into a single distributed trace.
Export: The collected telemetry is batched and sent to one or more configured backends for storage and analysis.
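The four steps above can be sketched with a toy tracer built on the standard library. This is a simplified stand-in for what the OTel SDK does, not its API: `start_span`, `FINISHED`, and `handle_request` are hypothetical names, spans are plain dictionaries, and "export" is just appending to a list.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager

_current_span = contextvars.ContextVar("current_span", default=None)
FINISHED = []  # stand-in for the exporter's queue of completed spans


@contextmanager
def start_span(name, **attributes):
    """Create a span that inherits the trace ID of the currently active span."""
    parent = _current_span.get()
    span = {
        "name": name,
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent["span_id"] if parent else None,
        "attributes": dict(attributes),
    }
    started = time.perf_counter()
    token = _current_span.set(span)
    try:
        yield span
    finally:
        span["duration_s"] = time.perf_counter() - started
        _current_span.reset(token)  # restore the parent as the active span
        FINISHED.append(span)


def handle_request(prompt):
    """Hypothetical RAG request: retrieval followed by generation."""
    with start_span("handle_request", prompt_chars=len(prompt)):
        with start_span("embedding.retrieve", top_k=3):
            docs = ["doc-1", "doc-2", "doc-3"]
        with start_span("llm.generate") as span:
            completion = f"answer based on {len(docs)} documents"
            span["attributes"]["completion_tokens"] = len(completion.split())
    return completion
```

Child spans finish before their parent, so the root span is the last one queued; each child records the root's span ID as its parent, which is what lets a backend rebuild the request tree.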

LLM PERFORMANCE MONITORING

Related Terms

OpenTelemetry (OTel) is a foundational component of the observability stack. These related concepts define the specific metrics, systems, and practices used to monitor and ensure the reliability of LLM-powered applications.

Distributed Tracing

A method for observing requests as they propagate through a distributed system of microservices, such as an LLM application stack. It records timing and metadata for individual operations (called spans) across service boundaries, providing a complete view of a transaction's lifecycle.

Spans represent a single operation within a trace.
Traces show the end-to-end journey of a request.
Context Propagation ensures trace identifiers are passed between services.

OpenTelemetry is the primary standard for implementing distributed tracing, generating traces that include LLM inference calls, database queries, and external API calls.

Service Level Objective (SLO)

A target value or range for a Service Level Indicator (SLI) that defines the acceptable performance and reliability of an LLM service. SLOs are business-aligned targets, such as "99% of requests must have a Time to First Token under 500ms."

Derived from user experience and business requirements.
Used to calculate an Error Budget—the allowable amount of unreliability.
Governs the pace of deployments and operational risk-taking.

Monitoring SLO compliance requires precise metrics collected by systems like OpenTelemetry to track latency, throughput, and error rates.
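To make the error budget concrete, the sketch below evaluates a request-based SLO from two counters. The function name and the request counts are hypothetical; the arithmetic is simply that the budget is the share of requests allowed to miss the target, (1 - SLO) x total.

```python
def error_budget_report(total_requests, bad_requests, slo_target):
    """Summarize SLO compliance for a window of requests.

    slo_target is a fraction, e.g. 0.99 for "99% of requests are good".
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    sli = (total_requests - bad_requests) / total_requests
    allowed_bad = (1 - slo_target) * total_requests  # the error budget
    return {
        "sli": sli,
        "allowed_bad": allowed_bad,
        "budget_consumed": bad_requests / allowed_bad,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 100,000 requests, 400 of which exceeded 500 ms TTFT.
report = error_budget_report(100_000, 400, 0.99)
```

In this example the budget allows roughly 1,000 slow requests, so 400 slow requests consume about 40% of it and the SLO still holds.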

Time to First Token (TTFT)

A critical latency metric measuring the duration from when a request is sent to an LLM until the first token of the response is received by the client. TTFT primarily reflects the computational cost of the prefill stage, where the model processes the entire input prompt.

Dominated by prompt length and model size.
Key for user-perceived responsiveness in chat applications.
Measured as a percentile (e.g., P95 TTFT) using telemetry data.

OpenTelemetry spans can instrument the client-side wait to capture this user-facing metric.
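A minimal sketch of the client-side measurement, assuming the response arrives as an iterable of tokens. `measure_stream` is a hypothetical helper, and the injectable `clock` parameter exists only to make the timing testable; the resulting values would typically be recorded as span attributes or histogram metrics.

```python
import time


def measure_stream(token_stream, clock=time.perf_counter):
    """Consume a token stream and report TTFT and the decode-phase token rate."""
    start = clock()
    first_token_at = None
    last_token_at = None
    tokens = 0
    for _ in token_stream:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # the first token marks the end of the wait
        last_token_at = now
        tokens += 1
    if first_token_at is None:
        return {"ttft_s": None, "tokens": 0, "tokens_per_second": None}
    decode_s = last_token_at - first_token_at
    # Rate over the decode phase only, so the prefill wait does not skew it.
    rate = (tokens - 1) / decode_s if tokens > 1 and decode_s > 0 else None
    return {
        "ttft_s": first_token_at - start,
        "tokens": tokens,
        "tokens_per_second": rate,
    }
```

Separating TTFT from the decode rate keeps a long prompt (slow prefill) from being misread as slow generation.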

Golden Dataset

A curated, high-quality set of input-output pairs used as a reference standard for evaluating LLM performance. It serves as a ground truth for detecting regressions, output drift, and monitoring overall quality in production.

Used in automated testing pipelines and canary deployments.
Enables comparison of metrics like accuracy, latency, and embedding similarity across model versions.
Telemetry systems log model outputs, which can be compared against golden dataset expectations to trigger alerts.
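One simple way to compare logged outputs against a golden dataset is a pass rate with an alert threshold. The sketch below uses normalized exact match, which is a deliberately crude stand-in; embedding similarity or a task-specific scorer could replace it. All names, data, and the 0.9 threshold are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count as failures."""
    return " ".join(text.lower().split())


def golden_pass_rate(golden, outputs):
    """Fraction of golden prompts whose logged output matches the expected answer."""
    if not golden:
        raise ValueError("golden dataset must not be empty")
    passed = sum(
        1
        for prompt, expected in golden.items()
        if normalize(outputs.get(prompt, "")) == normalize(expected)
    )
    return passed / len(golden)


def regression_detected(golden, outputs, min_pass_rate=0.9):
    """True when the pass rate falls below the alert threshold."""
    return golden_pass_rate(golden, outputs) < min_pass_rate
```

A prompt with no logged output counts as a failure here, which makes missing telemetry visible instead of silently inflating the score.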

Statistical Process Control (SPC)

A method of quality control using statistical tools, like control charts, to monitor and control a process. In LLM operations, SPC is applied to telemetry metrics (e.g., latency, token rate) to detect anomalies and ensure stable, predictable model behavior.

Establishes a baseline mean and control limits for a metric.
Flags data points that fall outside expected statistical variation.
Essential for distinguishing normal fluctuation from genuine incidents requiring intervention.

Metrics exported by OpenTelemetry to Prometheus are prime inputs for SPC dashboards.
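A minimal sketch of the control-chart idea, assuming the common convention of limits at the baseline mean plus or minus three sample standard deviations. The function names and latency values are hypothetical.

```python
import statistics


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) control limits from baseline samples."""
    center = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(values, limits):
    """Values falling outside the control limits."""
    lower, _, upper = limits
    return [v for v in values if v < lower or v > upper]


# Hypothetical baseline of per-minute median latencies in milliseconds.
baseline_ms = [100, 102, 98, 101, 99, 100, 103, 97]
limits = control_limits(baseline_ms)
flagged = out_of_control([101, 95, 120, 93], limits)
```

With this baseline the mean is 100 ms and the sample standard deviation is 2 ms, so the limits are 94 ms and 106 ms; 95 ms counts as normal variation while 120 ms and 93 ms are flagged.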

Canary Deployment

A release strategy where a new version of an LLM model or application is deployed to a small subset of production traffic. Its performance and behavior are monitored and compared against the baseline version before a full rollout.

Traffic Splitting: Routes a percentage of requests (e.g., 5%) to the canary.
Comparative Analysis: Uses telemetry to compare key SLIs (latency, error rate) between canary and baseline.
Automated Rollback: Triggered if the canary violates predefined error budgets.

OpenTelemetry traces and metrics are crucial for attributing performance data to specific deployment versions.
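The comparative analysis and rollback decision might look like the sketch below, which assumes per-version counters and a P95 latency have already been aggregated from telemetry tagged with a version attribute. The thresholds are illustrative stand-ins for a real error-budget policy, and all names are hypothetical.

```python
def error_rate(stats):
    """Errors divided by requests, or 0.0 when there is no traffic."""
    return stats["errors"] / stats["requests"] if stats["requests"] else 0.0


def should_rollback(
    baseline,
    canary,
    max_error_rate_increase=0.01,
    max_latency_ratio=1.2,
    min_requests=100,
):
    """Decide whether the canary has regressed against the baseline."""
    if canary["requests"] < min_requests:
        return False  # too little traffic to judge yet
    error_regression = (
        error_rate(canary) - error_rate(baseline) > max_error_rate_increase
    )
    latency_regression = (
        canary["p95_latency_ms"] > baseline["p95_latency_ms"] * max_latency_ratio
    )
    return error_regression or latency_regression
```

The minimum-traffic guard matters with a small split such as 5%: a few early failures on a handful of requests would otherwise trigger a rollback on noise.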

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
