A trace is a collection of spans that represents the complete end-to-end path of a single request as it propagates through a distributed system, forming a directed acyclic graph (DAG) of operations. In agentic systems, a trace captures the entire execution journey, from the initial user prompt or trigger, through internal reasoning steps and tool calls, to the final action or response. It is uniquely identified by a Trace ID that correlates all work across service and process boundaries.

What is a Trace?
A trace is the fundamental data structure for observing request flow in distributed systems, particularly critical for monitoring autonomous agents.
Traces are composed of hierarchically nested spans, where parent-child relationships define the flow of execution and causality. This structure allows engineers to reconstruct the precise sequence of events, identify latency bottlenecks, and diagnose failures in complex, multi-service workflows. For autonomous agents, traces are essential for auditing behavior, checking whether a run followed the expected execution path, and understanding the agent's decision-making process, because they provide a complete, time-ordered record of its internal state changes and external interactions.
Key Components of a Trace
A trace is a directed acyclic graph (DAG) composed of interconnected spans. Understanding its core components is essential for analyzing system performance and diagnosing failures in distributed architectures.
The Span
A span is the fundamental building block of a trace, representing a single, named, and timed operation within a service. It captures the execution of a contiguous unit of work, such as:
- A function call or method execution
- An HTTP request to an external API
- A database query or transaction
Each span contains a start time, duration, status code (e.g., OK, ERROR), and a set of attributes for metadata. Spans are linked via parent-child relationships to form the trace's hierarchical structure.
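To make the fields listed above concrete, a span can be modelled as a small record. This is a simplified sketch in plain Python, not the type used by any tracing SDK; the class name `Span`, its field names, and the `end` and `duration_ms` helpers are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Illustrative span record; names are assumptions, not a real SDK type."""
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None
    start_ns: int = field(default_factory=time.monotonic_ns)
    end_ns: Optional[int] = None
    status: str = "UNSET"  # set to "OK" or "ERROR" when the span ends
    attributes: dict = field(default_factory=dict)

    def end(self, status: str = "OK") -> None:
        """Close the span, recording its end time and final status."""
        self.end_ns = time.monotonic_ns()
        self.status = status

    @property
    def duration_ms(self) -> Optional[float]:
        """Duration in milliseconds, or None while the span is still open."""
        if self.end_ns is None:
            return None
        return (self.end_ns - self.start_ns) / 1e6


span = Span(name="query users", trace_id="t1", span_id="s1",
            attributes={"db.statement": "SELECT * FROM users"})
span.end(status="OK")
```

A span with `parent_span_id` left as `None` plays the role of a root span in this model.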
Trace & Span Identifiers
Unique identifiers are crucial for correlating data across a distributed system.
- Trace ID: A globally unique, immutable identifier (typically a 16-byte array) assigned to the entire request. Every span in the same trace shares this ID, enabling end-to-end correlation.
- Span ID: A unique identifier for an individual span within a trace. It is used to establish the parent-child links between spans.
- Parent Span ID: The identifier of the span that directly caused the current span's work. A span without a parent ID is a root span, representing the initial operation of the trace.
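The identifier scheme above can be sketched with the standard library. The 16-byte trace ID comes from the text; the 8-byte span ID follows the W3C Trace Context layout. The function names and the dictionary shape are hypothetical and chosen only for this example.

```python
import secrets
from typing import Optional


def new_trace_id() -> str:
    """Return a random 16-byte trace ID as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """Return a random 8-byte span ID as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def is_root(parent_span_id: Optional[str]) -> bool:
    """A span with no parent ID is the root span of its trace."""
    return parent_span_id is None


trace_id = new_trace_id()
root = {"trace_id": trace_id, "span_id": new_span_id(), "parent_span_id": None}
# The child shares the trace ID and points at the root through its parent ID.
child = {"trace_id": trace_id, "span_id": new_span_id(),
         "parent_span_id": root["span_id"]}
```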
Span Context & Propagation
Span context is the immutable tracing state that must be propagated across process and service boundaries to maintain trace continuity. It contains the trace ID, span ID, trace flags (e.g., sampling decision), and trace state (vendor-specific data).
Distributed context propagation is the mechanism for passing this context, typically via:
- HTTP headers (using standards like W3C Trace Context or B3 Propagation)
- Message and RPC metadata (e.g., Kafka headers, gRPC metadata)
A propagator component in the tracing SDK handles the injection (outbound) and extraction (inbound) of this context.
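The injection and extraction steps can be illustrated with a toy propagator that writes and reads the W3C `traceparent` header (`version-traceid-parentid-flags`). The function names `inject` and `extract` and the plain-dict carrier are assumptions for this sketch; real SDK propagators handle more cases, including `tracestate`.

```python
from typing import Dict, Optional, Tuple


def inject(carrier: Dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Outbound: write the span context into the request headers."""
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(carrier: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Inbound: read (trace_id, parent_span_id, sampled), or None if absent."""
    value = carrier.get("traceparent")
    if value is None:
        return None
    parts = value.split("-")
    if len(parts) != 4:
        return None
    _version, trace_id, parent_id, flags = parts
    # The lowest bit of the flags byte carries the sampling decision.
    return trace_id, parent_id, (int(flags, 16) & 0x01) == 1


headers: Dict[str, str] = {}
inject(headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True)
context = extract(headers)
```

The receiving service would use the extracted span ID as the parent of the first span it creates.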
Span Attributes & Events
These components add rich, queryable metadata to a span.
- Attributes: Key-value pairs that describe the operation. Examples include http.method="GET", db.statement="SELECT * FROM users", or custom business data like user.id="12345".
- Events: Timed, structured logs attached to a span that represent singular occurrences during its lifetime, such as an exception being thrown, a cache miss, or a significant state change. Each event has a name, timestamp, and its own set of attributes.
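The difference between attributes and events is easier to see in code: attributes describe the whole operation, while events are timestamped points inside it. The sketch below uses hypothetical `Span` and `Event` classes and a made-up `cache.miss` event name; it does not reproduce any SDK's API.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    """A named point in time within a span, with its own attributes."""
    name: str
    timestamp_ns: int
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)

    def set_attribute(self, key: str, value: Any) -> None:
        """Attach queryable metadata that applies to the whole operation."""
        self.attributes[key] = value

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        """Record a single occurrence, stamped with the current time."""
        self.events.append(Event(name, time.time_ns(), dict(attributes or {})))


span = Span("GET /users")
span.set_attribute("http.method", "GET")
span.set_attribute("user.id", "12345")
span.add_event("cache.miss", {"cache.key": "users:12345"})
```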
Span Links & Span Kind
These elements define semantic relationships and roles.
- Span Links: A reference from one span to another span, which may belong to the same trace or to a different one. Links model causal relationships that are not parent-child, such as a batch job processing an item that originated from an asynchronous queue.
- Span Kind: A classification specifying the span's role in the trace topology. Core kinds include:
  - Server: For the receiver of a remote operation (e.g., an HTTP server handler).
  - Client: For the initiator of a remote operation (e.g., an outgoing HTTP call).
  - Internal: For operations within the application boundary with no remote context.
  - Producer/Consumer: For the sender and the receiver of a message in an asynchronous messaging system.
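As an illustration of links and kinds working together, the sketch below models a batch consumer whose span links back to two publish spans that live in other traces. The `SpanKind`, `Link`, and `Span` types here are hypothetical stand-ins, and the IDs are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass(frozen=True)
class Link:
    """Reference to another span, identified by its trace ID and span ID."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    kind: SpanKind = SpanKind.INTERNAL
    links: List[Link] = field(default_factory=list)


# Two messages were published by two unrelated requests (two traces).
publishes = [Link("trace-a", "span-a1"), Link("trace-b", "span-b1")]

# The batch job runs in its own trace and links to both publish spans,
# because neither of them is its single parent.
batch = Span(name="process batch", trace_id="trace-c", span_id="span-c1",
             kind=SpanKind.CONSUMER, links=publishes)
linked_traces = {link.trace_id for link in batch.links}
```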
Trace as a Directed Acyclic Graph (DAG)
A complete trace is a collection of spans that forms a Directed Acyclic Graph (DAG), not merely a linear chain. This structure emerges because:
- A single parent span can have multiple concurrent child spans (e.g., fan-out API calls).
- Span links add causal edges that sit outside the parent-child hierarchy, including edges to spans in other traces.
Tools visualize this structure as flame graphs (showing nested duration) and service graphs (showing inter-service dependencies). The root span is the graph's entry point, and its duration usually corresponds to the request's total latency.
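The graph structure described in this section can be rebuilt from nothing more than each span's ID and parent ID. The following sketch, with made-up span data and hypothetical helper names, indexes children by parent and renders the fan-out as an indented tree.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (span_id, parent_span_id, name); the data is invented for this example.
spans: List[Tuple[str, Optional[str], str]] = [
    ("a", None, "handle request"),
    ("b", "a", "call pricing API"),
    ("c", "a", "call inventory API"),
    ("d", "c", "SELECT stock"),
]

children: Dict[Optional[str], List[str]] = defaultdict(list)
names: Dict[str, str] = {}
for span_id, parent_id, name in spans:
    children[parent_id].append(span_id)
    names[span_id] = name


def render(span_id: str, depth: int = 0) -> List[str]:
    """Depth-first walk from a span, indenting each level by two spaces."""
    lines = ["  " * depth + names[span_id]]
    for child_id in children.get(span_id, []):
        lines.extend(render(child_id, depth + 1))
    return lines


root_id = children[None][0]  # the span without a parent is the entry point
tree = render(root_id)
```

Here span "a" fans out to two children, which is what makes the trace a graph rather than a chain.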
How Distributed Tracing Works
Distributed tracing is a diagnostic technique that reconstructs the complete lifecycle of a single request as it traverses a complex, multi-service architecture.
A trace is a directed acyclic graph (DAG) of spans, where each span represents a discrete unit of work within a service, such as a database query or an API call. Tracing begins when the first instrumented service to handle a request assigns a globally unique Trace ID and creates the root span. This context, containing the Trace ID and the current Span ID, is then propagated to every downstream service called during the request's execution, typically via HTTP headers such as those defined in W3C Trace Context.
Each instrumented service uses the propagated context to create child spans, recording the caller's Span ID as their parent, thereby building the complete graph. Spans are exported as they finish, often through an OpenTelemetry Collector, and the tracing backend assembles them into a trace using the shared Trace ID. This reconstructed timeline is visualized in tools like Jaeger or Zipkin as a flame graph, enabling engineers to pinpoint latency bottlenecks, failed services, and unexpected execution paths across the entire distributed system.
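The assembly step can be sketched as a grouping operation: spans arrive from different services in arbitrary order, and the backend groups them by trace ID and orders them in time. The dictionary fields and the `assemble` function are assumptions for this example, not the behavior of any particular backend.

```python
from collections import defaultdict
from typing import Dict, List

# Spans as they might arrive from several services, interleaved and unordered.
exported = [
    {"trace_id": "t1", "span_id": "s2", "service": "orders", "start_ms": 5},
    {"trace_id": "t2", "span_id": "s1", "service": "gateway", "start_ms": 2},
    {"trace_id": "t1", "span_id": "s1", "service": "gateway", "start_ms": 0},
    {"trace_id": "t1", "span_id": "s3", "service": "db", "start_ms": 9},
]


def assemble(spans: List[dict]) -> Dict[str, List[dict]]:
    """Group spans by trace ID, then sort each trace by start time."""
    traces: Dict[str, List[dict]] = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    for trace in traces.values():
        trace.sort(key=lambda s: s["start_ms"])
    return dict(traces)


traces = assemble(exported)
t1_order = [s["span_id"] for s in traces["t1"]]
```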
Traces in Agentic Observability
In agentic systems, a trace is the definitive record of an autonomous agent's execution path, capturing its internal reasoning, external tool calls, and state changes as a directed acyclic graph (DAG) of operations.
Core Definition & Structure
A trace is a collection of spans that represents the end-to-end path of a single request or agent execution as it propagates through a distributed system. It forms a directed acyclic graph (DAG) where:
- Each span is a named, timed operation representing a unit of work.
- Parent-child relationships define the flow and hierarchy of operations.
- The root span initiates the trace, with subsequent spans attached as children or connected through span links.
This structure is essential for visualizing the complete lifecycle of an agent's task, from initial prompt to final action.
The Span: Fundamental Building Block
A span is the atomic unit of a trace. For agentic observability, spans capture distinct phases of agentic work:
- Internal Reasoning: A span for a planning cycle, chain-of-thought, or reflection step.
- Tool/API Execution: A span for an external function call, database query, or API request, including duration and success status.
- Memory Operations: Spans for reading from or writing to a vector store or knowledge graph.
Each span contains critical metadata: a span ID, parent span ID, start/end timestamps, span kind (e.g., INTERNAL, CLIENT), and attributes (key-value pairs detailing the operation).
Context Propagation Across Boundaries
For a trace to be truly distributed, span context must propagate across process and network boundaries. This is the mechanism that ties an agent's internal reasoning to its external API calls.
- Trace ID & Span ID: A globally unique Trace ID identifies the entire execution. The current Span ID identifies the specific operation.
- Propagation Standards: Context is carried via headers using standards like W3C Trace Context or B3 Propagation.
- Agentic Specifics: When an agent calls a tool, the SDK injects the current span context into the HTTP request. The tool's service extracts it, creating a child span, thereby extending the trace into the external service.
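The tool-call handoff described above might look like the following from both sides. The function names, the `search_documents` tool, and the dictionary span shape are hypothetical; only the `traceparent` header layout comes from W3C Trace Context.

```python
import secrets
from typing import Dict


def agent_call_headers(trace_id: str, tool_call_span_id: str) -> Dict[str, str]:
    """Agent side: attach the tool-call span's context to the outgoing request."""
    return {"traceparent": f"00-{trace_id}-{tool_call_span_id}-01"}


def tool_handle(headers: Dict[str, str]) -> Dict[str, str]:
    """Tool side: continue the caller's trace with a new child span."""
    _version, trace_id, parent_id, _flags = headers["traceparent"].split("-")
    return {
        "name": "search_documents",          # hypothetical tool name
        "trace_id": trace_id,                # same trace as the agent
        "span_id": secrets.token_hex(8),     # fresh ID for the tool's own work
        "parent_span_id": parent_id,         # the agent's tool-call span
    }


agent_trace_id = secrets.token_hex(16)
agent_span_id = secrets.token_hex(8)
tool_span = tool_handle(agent_call_headers(agent_trace_id, agent_span_id))
```

Because the tool's span carries the agent's trace ID and names the agent's span as its parent, both halves appear in one trace.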
Enrichment for Agentic Understanding
Raw spans are useful for timing; enriched spans are critical for auditing and debugging agent behavior. Trace enrichment adds semantic, business, and agent-specific context.
- Agent State: Attach the current goal, plan step, or conversation turn as span attributes.
- Tool Call Details: Enrich spans with the exact function name, parameters, and parsed results.
- Cost Telemetry: Add attributes for LLM token usage, model name, and API call cost.
- Business Context: Include user ID, session ID, or transaction ID to link technical traces to business outcomes.
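A small helper shows how these enrichment categories could end up as span attributes. Every attribute key below (`agent.goal`, `llm.cost_usd`, and so on) and both per-token prices are invented for illustration; a real deployment would follow whatever naming convention and pricing its own stack uses.

```python
from typing import Any, Dict


def enrich_llm_span(attributes: Dict[str, Any], *, goal: str, plan_step: int,
                    model: str, input_tokens: int, output_tokens: int,
                    usd_per_1k_input: float, usd_per_1k_output: float,
                    user_id: str) -> Dict[str, Any]:
    """Return a copy of the attributes with agent, cost, and business context."""
    cost_usd = ((input_tokens / 1000) * usd_per_1k_input
                + (output_tokens / 1000) * usd_per_1k_output)
    enriched = dict(attributes)
    enriched.update({
        "agent.goal": goal,                 # agent state
        "agent.plan_step": plan_step,
        "llm.model": model,                 # cost telemetry
        "llm.tokens.input": input_tokens,
        "llm.tokens.output": output_tokens,
        "llm.cost_usd": round(cost_usd, 6),
        "user.id": user_id,                 # business context
    })
    return enriched


attrs = enrich_llm_span(
    {"operation": "llm.completion"},
    goal="summarize ticket", plan_step=2, model="example-model",
    input_tokens=2000, output_tokens=500,
    usd_per_1k_input=0.5, usd_per_1k_output=1.5,  # made-up prices
    user_id="12345",
)
```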
Visualization: Flame Graphs & Service Graphs
Traces are visualized to diagnose performance and understand flow.
- Flame Graph: A hierarchical visualization where the width of a horizontal bar represents a span's duration. It instantly shows the critical path and which internal reasoning step or tool call caused latency.
- Service Graph (or Dependency Map): A topological map automatically generated from trace data. For multi-agent systems, it shows agents as nodes and their communication (RPC, messages) as edges, revealing the interaction network and upstream/downstream dependencies.
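The idea behind a flame graph, bar width proportional to span duration, can be shown with a text rendering. This is a toy sketch with invented span timings and a hypothetical `flame_rows` function; it assumes the trace starts at 0 ms and is not how any tracing UI is implemented.

```python
from typing import List, Tuple

# (name, start_ms, end_ms) for one hypothetical agent run; values are made up.
SPANS: List[Tuple[str, int, int]] = [
    ("agent.run", 0, 1000),
    ("plan", 0, 200),
    ("tool: search", 200, 900),
    ("respond", 900, 1000),
]


def flame_rows(spans: List[Tuple[str, int, int]], width: int = 50) -> List[str]:
    """Draw one bar per span; offset and length are scaled to the trace length."""
    total = max(end for _name, _start, end in spans)
    rows = []
    for name, start, end in spans:
        offset = start * width // total
        length = max(1, (end - start) * width // total)
        rows.append(" " * offset + "#" * length + f" {name} ({end - start} ms)")
    return rows


rows = flame_rows(SPANS)
# The widest child bar is the step that dominates the run's latency.
slowest_child = max(SPANS[1:], key=lambda s: s[2] - s[1])[0]
```

Printing `rows` line by line gives a rough picture in which the search tool call visibly occupies most of the run.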
Sampling & Data Volume Management
Capturing every trace is often prohibitively expensive. Trace sampling strategically reduces volume.
- Head Sampling: Decision made at the start of a request (e.g., sample 10% of all agent sessions). Simple but may miss rare, important traces.
- Tail Sampling: Decision made after the trace is complete, based on its full content. Crucial for agents, as it allows rules like:
  - Sample 100% of traces with errors (e.g., tool call failure).
  - Sample 100% of traces where final answer confidence < threshold.
  - Sample traces with latency > 2s.
This ensures high-value agent executions are always retained for analysis.
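The three tail-sampling rules above translate directly into a predicate over a completed trace. The sketch assumes spans are dictionaries with `status`, `start_ms`, and `end_ms` fields and that the agent reports a final-answer confidence; the function name and the 0.7 confidence threshold are hypothetical, while the 2 s latency limit comes from the text.

```python
from typing import Dict, List, Optional


def keep_trace(spans: List[Dict], confidence: Optional[float],
               min_confidence: float = 0.7, max_latency_ms: int = 2000) -> bool:
    """Decide, after the trace has completed, whether it should be retained."""
    has_error = any(s["status"] == "ERROR" for s in spans)
    latency_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    low_confidence = confidence is not None and confidence < min_confidence
    return has_error or low_confidence or latency_ms > max_latency_ms


fast_ok = [{"status": "OK", "start_ms": 0, "end_ms": 300}]
failed_tool = [
    {"status": "OK", "start_ms": 0, "end_ms": 400},
    {"status": "ERROR", "start_ms": 50, "end_ms": 120},  # tool call failure
]
slow = [{"status": "OK", "start_ms": 0, "end_ms": 2500}]

decisions = {
    "fast_ok": keep_trace(fast_ok, confidence=0.9),
    "failed_tool": keep_trace(failed_tool, confidence=0.9),
    "slow": keep_trace(slow, confidence=0.9),
    "unsure": keep_trace(fast_ok, confidence=0.4),
}
```

A routine, fast, confident run is dropped, while any run that hits one of the rules is kept.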
Frequently Asked Questions
A trace is the foundational data structure for understanding request flow in distributed systems. These questions address its core mechanics, implementation, and value for observability.
What is a trace?
A trace is a collection of spans that represents the complete, end-to-end path of a single logical request as it propagates through a distributed system, forming a directed acyclic graph (DAG) of operations. It provides a holistic view of the request's journey, capturing the causal relationships and timing between all participating services, databases, and external APIs. In agentic systems, a trace visualizes the entire cognitive workflow, from initial user prompt, through planning and tool execution, to final response, enabling engineers to audit autonomy and pinpoint latency bottlenecks.
Related Terms
A trace is built from interconnected components and processes. These related terms define the core concepts, standards, and tools that make end-to-end distributed tracing possible.
Span
A span is the fundamental building block of a trace, representing a single, named, and timed operation within a service. It captures the work done for a specific unit of logic, such as:
- A function call or method execution
- A database query
- An HTTP request to an external API
Each span contains a start time, duration, status code, and span attributes (key-value metadata). Spans are linked in parent-child relationships to form the hierarchical structure of a trace.
Distributed Tracing
Distributed tracing is the overarching methodology for profiling and monitoring requests as they flow through a distributed system. It involves:
- Instrumenting services to emit spans.
- Propagating trace context (like a Trace ID) across service boundaries via HTTP headers or message metadata.
- Collecting and visualizing these correlated spans to reconstruct the complete request path.
The primary goal is to understand system performance, identify bottlenecks (visualized in a flame graph), and diagnose failures in microservices or agentic architectures.
OpenTelemetry (OTel)
OpenTelemetry (OTel) is the open-source, vendor-neutral standard for generating, collecting, and exporting telemetry data. It is the de facto framework for implementing distributed tracing. Key components include:
- APIs and SDKs for manual and auto-instrumentation.
- The OpenTelemetry Collector, a proxy for receiving, processing, and exporting data.
- OTLP (OpenTelemetry Protocol), the standard gRPC/HTTP protocol for data transmission.
- Standardized semantic conventions for span attributes (e.g., http.method, db.statement).
Trace Sampling
Trace sampling is the critical process of selectively capturing a subset of traces to balance observability depth with data volume and cost. Two primary strategies are:
- Head Sampling: The sampling decision is made at the start of a request (e.g., 10% of all traces). It's efficient but may miss important late-breaking events.
- Tail Sampling: The decision is made after the trace is complete, based on its full context (e.g., "sample all traces with latency > 2s or an error status"). This is more powerful but requires buffering traces in a collector.
Sampling rules are essential for production-scale observability.
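Head sampling is often made deterministic by deriving the decision from the trace ID, so that every service reaches the same verdict for the same request without coordination. The sketch below shows one way to do that; the function name and the choice of the low 8 bytes are assumptions for this example rather than a description of any specific sampler.

```python
def head_sample(trace_id: str, ratio: float) -> bool:
    """Keep a trace when the low 8 bytes of its ID fall below ratio * 2**64."""
    bound = int(ratio * (1 << 64))
    return int(trace_id[-16:], 16) < bound


low_id = "4bf92f3577b34da6" + "0000000000000001"   # low bytes near zero
high_id = "4bf92f3577b34da6" + "ffffffffffffffff"  # low bytes at the maximum

kept_low = head_sample(low_id, 0.10)
kept_high = head_sample(high_id, 0.10)
```

Because random trace IDs are spread evenly over the ID space, a ratio of 0.10 keeps roughly one trace in ten.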
W3C Trace Context
W3C Trace Context is a formal W3C recommendation that defines a standard HTTP header format (traceparent, tracestate) for distributed context propagation. It ensures interoperability between different tracing systems and libraries by providing a unified way to carry:
- The Trace ID and Span ID.
- Sampling flags.
- Vendor-specific tracing state.
It was created as a vendor-neutral successor to earlier formats such as B3 Propagation, which many systems still support, enabling seamless tracing across heterogeneous technology stacks.
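The two headers can be parsed with a few lines of standard-library code. This sketch accepts only the version `00` layout of `traceparent`, rejects all-zero IDs as the specification requires, and treats `tracestate` as a comma-separated list of `key=value` members; the function names are hypothetical and the parser skips some of the specification's finer rules.

```python
import re
from typing import Dict, Optional

# version "00", 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value: str) -> Optional[Dict[str, object]]:
    """Return the parsed fields, or None if the header is malformed."""
    match = TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest flag bit
    }


def parse_tracestate(value: str) -> Dict[str, str]:
    """Split vendor-specific state into a key-to-value mapping."""
    entries: Dict[str, str] = {}
    for member in value.split(","):
        key, sep, val = member.strip().partition("=")
        if sep and key:
            entries[key] = val
    return entries


parent = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
state = parse_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE")
```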
Service Graph
A service graph is a dynamic, topological map automatically derived from aggregated trace data. It visualizes the services (nodes) and the request dependencies or calls (edges) between them. This provides a system-wide view for:
- Identifying unexpected or brittle dependencies.
- Understanding the critical path of requests.
- Detecting changes in the communication topology after deployments.
Unlike a single trace, which shows one request's path, a service graph synthesizes data from many traces to show the architectural relationships and health of the entire ecosystem.
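Deriving a service graph from traces amounts to counting parent-to-child hops that cross a service boundary. The sketch below does this over spans from two invented traces; the field names and the `service_edges` function are assumptions for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

spans = [
    {"trace_id": "t1", "span_id": "s1", "parent_span_id": None, "service": "gateway"},
    {"trace_id": "t1", "span_id": "s2", "parent_span_id": "s1", "service": "orders"},
    {"trace_id": "t1", "span_id": "s3", "parent_span_id": "s2", "service": "db"},
    {"trace_id": "t2", "span_id": "s1", "parent_span_id": None, "service": "gateway"},
    {"trace_id": "t2", "span_id": "s2", "parent_span_id": "s1", "service": "orders"},
]


def service_edges(all_spans: List[Dict]) -> Counter:
    """Count caller -> callee service pairs across every trace."""
    # Span IDs are only unique within a trace, so index by (trace_id, span_id).
    by_id: Dict[Tuple[str, str], Dict] = {
        (s["trace_id"], s["span_id"]): s for s in all_spans
    }
    edges: Counter = Counter()
    for span in all_spans:
        parent = by_id.get((span["trace_id"], span["parent_span_id"]))
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


edges = service_edges(spans)
```

Each key in `edges` is a directed edge of the service graph, and its count indicates how often that dependency was exercised.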

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.