A flame graph is a visualization of hierarchical profiling data. In distributed tracing, it represents the nested call stack of spans within a single trace. Each horizontal rectangle (or "flame") represents a span: its width corresponds to the span's duration or a sampled metric like CPU time, and its vertical stacking shows the parent-child relationships between spans. This gives an immediate view of where time is spent across a request's lifecycle, which makes it useful for performance analysis and for identifying latency bottlenecks.
Flame Graph

What is a Flame Graph?
A flame graph is a hierarchical visualization of profiling data, adapted in distributed tracing to represent the nested call stack of spans within a single trace.
The visualization is generated by aggregating many sampled execution profiles or traces and sorting sibling spans alphabetically, so that identical stacks merge and patterns emerge. In agentic observability, flame graphs are used to audit the internal reasoning loops and tool calls of an autonomous agent, providing a visual record of the execution flow. Key related concepts include the underlying trace data structure, span attributes for metadata, and tail sampling strategies that determine which traces are retained for visualization.
Key Features of a Flame Graph
In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, where width represents duration or resource consumption, enabling rapid performance bottleneck identification.
Hierarchical Stack Visualization
A flame graph represents the call stack of a program or trace as a set of nested, horizontal rectangles. Each rectangle, or frame, represents a function or span. The vertical axis shows stack depth, with the root span at the bottom and child spans stacked above it. This nesting directly maps to the parent-child relationships defined by span IDs within a trace, making the execution flow immediately apparent.
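The mapping from parent-child span IDs to stack depth can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation; the tuple layout `(span_id, parent_id, name)` and the sample span names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_id, name).
# parent_id is None for the root span.
spans = [
    ("a", None, "GET /checkout"),
    ("b", "a", "cart-service"),
    ("c", "a", "payment-service"),
    ("d", "c", "db.query"),
]

def span_depths(spans):
    """Return {span_id: depth}, with the root span at depth 0."""
    children = defaultdict(list)
    for span_id, parent_id, _name in spans:
        children[parent_id].append(span_id)

    depths = {}
    # Walk the tree from the root(s), tracking depth as we descend.
    stack = [(root, 0) for root in children[None]]
    while stack:
        span_id, depth = stack.pop()
        depths[span_id] = depth
        stack.extend((child, depth + 1) for child in children[span_id])
    return depths

depths = span_depths(spans)
print(depths)
```

The depth of each span is the row it would occupy in the graph: depth 0 is the bottom row, and each child sits one row above its parent.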
Width Proportional to Metric
The primary quantitative insight comes from the width of each frame. In a CPU profile flame graph, width is proportional to the time spent in that function. In a distributed tracing context, width typically represents the duration of a span. This allows engineers to visually identify hot code paths or latency bottlenecks at a glance—the widest frames consume the most resources. The graph aggregates samples, so width represents the sum of all invocations of that function/span.
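As a rough sketch of the width calculation, each frame's width can be computed as its share of the root span's duration, scaled to the drawing area. The function name `frame_widths`, the 1200-pixel canvas, and the sample durations are assumptions for illustration only.

```python
def frame_widths(durations_ms, root, canvas_px=1200):
    """Scale each span's duration to a pixel width relative to the root span."""
    total = durations_ms[root]
    return {name: round(canvas_px * d / total) for name, d in durations_ms.items()}

# Hypothetical durations for three nested spans.
durations_ms = {"GET /checkout": 200, "payment-service": 150, "db.query": 50}
widths = frame_widths(durations_ms, root="GET /checkout")
print(widths)
# {'GET /checkout': 1200, 'payment-service': 900, 'db.query': 300}
```

Because every width is relative to the root, a frame that occupies three quarters of the canvas accounts for three quarters of the request's duration.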
Color as a Secondary Dimension
Color is used as a consistent, non-quantitative visual aid to improve readability and differentiate between types of operations. Common schemes include:
- Hue by library or namespace (e.g., green for application code, red for database calls, blue for HTTP clients).
- Saturation by resource type (e.g., different shades for CPU vs. I/O waits).
- Monochromatic to reduce cognitive load, where color simply helps distinguish adjacent frames.

Color does not encode magnitude; the width carries all quantitative information.
Interactive Exploration
Modern flame graph implementations are interactive visualizations. Key interactions include:
- Click-to-zoom: Clicking a frame zooms the view to show only that stack and its children, enabling detailed inspection of deep call paths.
- Search highlighting: Searching for a function or service name highlights all matching frames across the graph.
- Tooltip details: Hovering over a frame reveals precise metadata, such as span name, duration, span attributes, and percentage of total trace time.

This interactivity transforms a static profile into an investigative tool for performance debugging.
Aggregation of Samples
A flame graph is an aggregated visualization. It does not show every individual function call or span instance in a timeline. Instead, it merges all sampled stack traces or spans, summing their durations. This aggregation is powerful for identifying statistically significant bottlenecks across many requests. For example, if a specific database query appears wide, it indicates that query is a major contributor to latency across the sampled traces, not just in one anomalous request.
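The merging step can be illustrated with the semicolon-delimited "folded stack" text format used by Brendan Gregg's original flame graph scripts, where each line is a stack path followed by a value. The sketch below assumes each sample is a `(stack, self_time_ms)` pair; the sample data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical samples collected across several traces:
# (stack from root to leaf, self time in milliseconds).
samples = [
    (("api", "auth"), 5),
    (("api", "orders", "db.query"), 40),
    (("api", "orders", "db.query"), 60),
    (("api", "orders"), 10),
]

def fold(samples):
    """Merge identical stacks by summing their values."""
    folded = Counter()
    for stack, value in samples:
        folded[";".join(stack)] += value
    return folded

def to_folded_lines(folded):
    """Render as 'frame;frame;frame value' lines, sorted alphabetically."""
    return [f"{stack} {value}" for stack, value in sorted(folded.items())]

lines = to_folded_lines(fold(samples))
print("\n".join(lines))
# api;auth 5
# api;orders 10
# api;orders;db.query 100
```

The two `db.query` samples collapse into a single entry with a value of 100, which is why a frame's width reflects all sampled invocations rather than any one request.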
Integration with Distributed Traces
When applied to distributed tracing, a flame graph visualizes a single trace or an aggregate of traces. Each frame corresponds to a span. The hierarchy shows the propagation of work across services. This provides a unified view of end-to-end latency, revealing whether time is spent in a specific microservice, a particular tool call, or in network communication between spans. It bridges the gap between traditional profiling and distributed systems observability, making complex trace data intuitively scannable.
Flame Graph vs. Other Trace Visualizations
A comparison of visualization techniques for analyzing hierarchical profiling and distributed trace data, highlighting their primary use cases and interpretability.
| Feature / Metric | Flame Graph | Timeline (Gantt) View | Service Graph | Call Tree |
|---|---|---|---|---|
| Primary Visualization | Nested horizontal rectangles | Horizontal bars on a timeline | Directed graph of nodes & edges | Indented text hierarchy |
| Width Represents | Aggregate duration or sample count | Absolute start time and duration | Request volume or error rate | N/A (structure only) |
| Height Represents | Call stack depth | N/A (single service/span level) | N/A (service level) | Call stack depth |
| Best For Identifying | Hot code paths & cumulative time consumers | Concurrency, parallelism, & absolute timing | Service dependencies & topology | Exact sequence of calls & branching logic |
| Trace Span Aggregation | Aggregates identical stack sequences | Shows individual spans | Aggregates service-level interactions | Shows individual span hierarchy |
Flame Graph Use Cases
In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, with bar width representing span duration. This provides an intuitive, aggregated view for performance analysis.
Understanding Service Dependencies
A flame graph derived from a distributed trace reveals the service topology and call hierarchy for a specific request. It shows how work propagates from a root span through various downstream services.
- Visualizing fan-out: See parallel calls to multiple services and identify if one slow dependency is serializing the entire workflow.
- Mapping code-to-infrastructure: Connect business logic (function names in spans) to the underlying infrastructure components (database, cache, external APIs) they invoke.
Analyzing Parallel vs. Sequential Execution
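In a time-ordered flame graph of a single trace (sometimes called a flame chart), horizontal placement distinguishes sequential operations (sibling spans that run one after another at the same depth) from concurrent operations (sibling spans whose time ranges overlap). In an aggregated flame graph, where frames are sorted alphabetically and merged, horizontal position carries no timing information, so this analysis applies only to the per-trace view. Reading concurrency correctly is critical for optimizing asynchronous workflows.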
- Identifying blocking calls: Spot where the execution could be parallelized but is currently sequential, creating artificial latency.
- Validating async patterns: Confirm that intended concurrent operations (e.g., fan-out API calls) are executing in parallel as designed.
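Whether sibling spans actually ran in parallel can also be checked directly from their timestamps. The following is a minimal sketch, assuming each sibling is a `(name, start_ms, end_ms)` tuple under the same parent; the span names and timings are made up for the example.

```python
from itertools import combinations

def overlapping_siblings(siblings):
    """Return name pairs of sibling spans whose time ranges overlap."""
    pairs = []
    for (n1, s1, e1), (n2, s2, e2) in combinations(siblings, 2):
        # Two half-open intervals overlap if each starts before the other ends.
        if s1 < e2 and s2 < e1:
            pairs.append((n1, n2))
    return pairs

# Hypothetical children of one parent span.
siblings = [
    ("inventory", 0, 80),
    ("pricing", 10, 60),
    ("email", 90, 120),
]
concurrent = overlapping_siblings(siblings)
print(concurrent)
# [('inventory', 'pricing')]
```

Here `inventory` and `pricing` overlap and therefore ran concurrently, while `email` started only after both finished, which would be worth a look if it does not depend on their results.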
Resource Utilization & Cost Attribution
By mapping time spent to specific functions and services, flame graphs enable granular cost attribution. In agentic systems, this is essential for understanding the compute cost of specific reasoning steps or tool calls.
- Token usage correlation: In LLM-based agents, correlate wide bars (long durations) with high-token-count prompts or completions.
- External API cost analysis: Identify which third-party tool calls are the most expensive in terms of both latency and direct API costs.
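One way to approach this correlation is to total duration and token counts per operation name. The sketch below assumes each LLM span carries a `total_tokens` value; that field name, like the span names, is hypothetical and would depend on how the spans are instrumented.

```python
from collections import defaultdict

# Hypothetical LLM spans with a token count recorded on each.
llm_spans = [
    {"name": "llm.plan", "duration_ms": 1800, "total_tokens": 3200},
    {"name": "llm.summarize", "duration_ms": 600, "total_tokens": 900},
    {"name": "llm.plan", "duration_ms": 2100, "total_tokens": 4100},
]

def usage_by_operation(spans):
    """Sum duration and token usage for each operation name."""
    usage = defaultdict(lambda: {"duration_ms": 0, "total_tokens": 0})
    for span in spans:
        usage[span["name"]]["duration_ms"] += span["duration_ms"]
        usage[span["name"]]["total_tokens"] += span["total_tokens"]
    return dict(usage)

usage = usage_by_operation(llm_spans)
for name, totals in usage.items():
    print(name, totals)
```

Operations that rank high on both columns are the ones where a shorter prompt or a cheaper model is most likely to reduce both latency and cost.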
Debugging Agentic Reasoning Loops
For autonomous agents, a flame graph visualizes the planning, execution, and reflection cycle. Each major loop iteration appears as a distinct set of frames, allowing engineers to see time spent in cognitive phases versus tool execution.
- Inefficient planning: Identify agents stuck in excessive planning or reflection, indicated by deep, wide stacks of LLM calls.
- Tool execution profiling: See the exact sequence and duration of external tool calls (API, database, code execution) within an agent's action phase.
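The split between cognitive phases and tool execution can be approximated by grouping span durations by phase. This sketch assumes each agent span is tagged with a `phase` attribute; that attribute and its values are illustrative, not a standard convention.

```python
from collections import defaultdict

# Hypothetical spans from two iterations of an agent loop.
agent_spans = [
    {"name": "plan", "phase": "planning", "duration_ms": 1200},
    {"name": "search_tool", "phase": "tool", "duration_ms": 300},
    {"name": "reflect", "phase": "reflection", "duration_ms": 900},
    {"name": "plan", "phase": "planning", "duration_ms": 1100},
    {"name": "sql_tool", "phase": "tool", "duration_ms": 500},
]

def time_by_phase(spans):
    """Total duration per agent phase, in milliseconds."""
    totals = defaultdict(int)
    for span in spans:
        totals[span["phase"]] += span["duration_ms"]
    return dict(totals)

totals = time_by_phase(agent_spans)
cognitive_ms = totals["planning"] + totals["reflection"]
cognitive_share = cognitive_ms / sum(totals.values())
print(totals, cognitive_share)
```

In this example the agent spends 3200 of 4000 ms planning and reflecting, the kind of imbalance that would show up in the graph as wide stacks of LLM calls.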
Performance Regression Detection
Flame graphs serve as a visual baseline for normal performance. Automated systems can diff flame graph shapes or aggregate span durations across deployments to detect regressions.
- Post-deployment analysis: Compare aggregate flame graphs from before and after a code deploy to see if new spans were added or existing ones became slower.
- Anomaly detection: Flag traces where the flame graph shape deviates significantly from the norm, indicating potential performance anomalies or errors.
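A simple form of this comparison is a per-stack difference between two aggregated profiles. The sketch below assumes each profile is a dictionary from a semicolon-joined stack path to total milliseconds; the profiles, the function name, and the 10 ms threshold are all illustrative.

```python
def diff_profiles(before, after, threshold_ms=0):
    """Return {stack: delta_ms} for stacks whose total changed by more than the threshold."""
    changes = {}
    for stack in before.keys() | after.keys():
        delta = after.get(stack, 0) - before.get(stack, 0)
        if abs(delta) > threshold_ms:
            changes[stack] = delta
    # Largest increases first.
    return dict(sorted(changes.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical aggregated profiles from before and after a deploy.
before = {"api;orders;db.query": 100, "api;auth": 5}
after = {"api;orders;db.query": 180, "api;auth": 5, "api;orders;cache.get": 12}

regressions = diff_profiles(before, after, threshold_ms=10)
print(regressions)
# {'api;orders;db.query': 80, 'api;orders;cache.get': 12}
```

The result surfaces both kinds of change mentioned above: an existing span that became slower (`db.query`) and a new span that did not exist before the deploy (`cache.get`).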
Frequently Asked Questions
A flame graph is a critical visualization tool in distributed tracing and performance profiling. This FAQ addresses its core mechanics, construction, and role in diagnosing performance issues within agentic and distributed systems.
A flame graph is a visualization of hierarchical profiling data where, in the context of distributed tracing, it represents the nested call stack of spans within a single trace, with the width of each rectangular block indicating the relative duration or resource consumption (e.g., CPU time) of that operation.
Originally created by Brendan Gregg for CPU profiling, the flame graph's adaptation for tracing transforms a trace's directed acyclic graph (DAG) of spans into a consolidated stack view. The y-axis shows stack depth (call hierarchy). In the original aggregated form, the x-axis spans the sample population rather than the passage of time, and sibling frames are sorted alphabetically so that identical stack frames merge. This format allows engineers to quickly identify the widest (most time-consuming) code paths or service calls, which are the primary targets for optimization.
Related Terms
A flame graph is a specific visualization within the broader practice of distributed tracing. To fully understand its utility, it's essential to grasp the foundational concepts and systems that produce the data it displays.
Span
A span is the fundamental unit of work in distributed tracing, representing a named, timed operation for a contiguous segment of work within a service. In a flame graph, each horizontal rectangle (or "flame") corresponds to a span.
- Key Properties: Contains a start/end timestamp, a name, span attributes (key-value metadata), and a span kind (e.g., Client, Server).
- Hierarchy: Spans have parent-child relationships, forming the nested stack visualized in a flame graph. The width of a span's rectangle represents its duration.
Trace
A trace is a collection of spans that represents the complete end-to-end path of a single request as it propagates through a distributed system. A per-trace flame graph visualizes one entire trace.
- Structure: Spans within a trace form a directed acyclic graph (DAG), though flame graphs typically show a simplified, aggregated call stack view.
- Correlation: All spans in a trace share a unique Trace ID, enabling trace correlation with logs and metrics for unified debugging.
Distributed Tracing
Distributed tracing is the overarching methodology of instrumenting applications to observe requests as they flow across service boundaries. Flame graphs are a primary diagnostic output of this practice.
- Purpose: Used to understand system latency, diagnose performance bottlenecks, and visualize service dependencies.
- Mechanism: Relies on distributed context propagation (e.g., via W3C Trace Context headers) to pass trace IDs and span IDs between services, maintaining continuity.
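To make the propagation mechanism concrete, the sketch below parses a W3C Trace Context `traceparent` header, which carries the version, trace ID, parent span ID, and trace flags as hyphen-separated lowercase hex fields. It handles only version `00`, and the function name and returned dictionary shape are assumptions for illustration.

```python
import re

# version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex trace-flags
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

def parse_traceparent(header):
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx)
```

A receiving service would start its own span with `parent_id` as the parent and the same `trace_id`, which is what lets a visualization reassemble the spans into one hierarchy.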
OpenTelemetry (OTel)
OpenTelemetry (OTel) is a vendor-neutral, open-source standard for generating, collecting, and exporting telemetry data, including traces. It is a common source of span data for trace-based flame graphs.
- Components: Includes APIs/SDKs for instrumentation, the OTLP protocol for data export, and the OpenTelemetry Collector for processing.
- Role: Provides the standardized span and trace data model that visualization tools like flame graphs consume. Auto-instrumentation via OTel agents is a common way to generate this data without code changes.
Service Graph
A service graph is a complementary visualization to a flame graph. While a flame graph shows the internal call stack of a single request, a service graph shows the macro-level dependencies between services across all requests.
- Derivation: Automatically generated by aggregating span data from many traces to identify which services call each other.
- Use Case: Used for architectural understanding, identifying upstream/downstream impacts of failures, and validating deployment topology.
Tail Sampling
Tail sampling is a critical strategy for managing trace data volume before visualization. It decides whether to keep or discard a trace after the request is complete, based on its full context.
- Contrast with Head Sampling: Head sampling decides at the request's start, potentially missing interesting traces that only exhibit problems (e.g., high latency) later.
- Flame Graph Relevance: Enables cost-effective storage by only retaining traces that are most valuable for analysis, such as those with errors or exceeding latency thresholds, which are prime candidates for flame graph inspection.
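The tail sampling decision described above can be sketched as a function that runs once all spans of a trace have arrived. This is a simplified illustration, not the policy of any specific collector; the span fields `start_ms`, `end_ms`, and `error`, and the 500 ms threshold, are assumptions.

```python
def keep_trace(spans, latency_threshold_ms=500):
    """Decide after completion whether a trace is worth retaining."""
    if not spans:
        return False
    has_error = any(span.get("error", False) for span in spans)
    # End-to-end latency: earliest start to latest end across all spans.
    trace_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    return has_error or trace_ms > latency_threshold_ms

# Hypothetical completed traces.
fast_ok = [{"start_ms": 0, "end_ms": 120}]
slow_ok = [{"start_ms": 0, "end_ms": 900}, {"start_ms": 10, "end_ms": 880}]
fast_err = [{"start_ms": 0, "end_ms": 80, "error": True}]

print(keep_trace(fast_ok), keep_trace(slow_ok), keep_trace(fast_err))
# False True True
```

A head sampler could not make this decision, because neither the total latency nor the error is known when the request starts.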

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.