Inferensys

Flame Graph

A flame graph is a visualization of hierarchical profiling data. In distributed tracing, it represents the nested call stack of spans within a trace, with the width of each frame indicating duration.
VISUALIZATION

What is a Flame Graph?

A flame graph is a hierarchical visualization of profiling data, adapted in distributed tracing to represent the nested call stack of spans within a single trace.

In a flame graph, each horizontal rectangle, or frame, represents a span. Its width corresponds to the span's duration or to a sampled metric such as CPU time, and its vertical stacking shows the parent-child relationships between spans. The result is an immediate view of where time is spent across a request's lifecycle, which makes the flame graph well suited to performance analysis and to locating latency bottlenecks.

An aggregated flame graph is generated by merging many sampled execution profiles or traces. Sibling frames are sorted alphabetically so that identical stacks sit next to each other and merge into a single, wider frame, which lets patterns emerge. In agentic observability, flame graphs help audit the internal reasoning loops and tool calls of an autonomous agent by giving a visual record of the execution flow that was actually traced. Key related concepts include the underlying trace data structure, span attributes for metadata, and tail sampling strategies that determine which traces are visualized.

DISTRIBUTED TRACE COLLECTION

Key Features of a Flame Graph

In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, where width represents duration or resource consumption, enabling rapid performance bottleneck identification.

01

Hierarchical Stack Visualization

A flame graph represents the call stack of a program or trace as a set of nested, horizontal rectangles. Each rectangle, or frame, represents a function or span. The vertical axis shows stack depth, with the root span at the bottom and child spans stacked above it (some tools invert this and place the root at the top, a layout often called an icicle graph). This nesting maps directly to the parent-child relationships recorded in each span's parent span ID, making the execution flow immediately apparent.
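
As a minimal sketch of how that nesting can be derived, the Python below computes a stack depth for each span from its parent reference. The span records and the field names `span_id`, `parent_id`, and `name` are hypothetical and chosen for illustration; real tracing backends use their own schemas.

```python
from collections import defaultdict

# Hypothetical span records; field names are assumptions for illustration.
spans = [
    {"span_id": "a", "parent_id": None, "name": "handle_request"},
    {"span_id": "b", "parent_id": "a", "name": "plan"},
    {"span_id": "c", "parent_id": "a", "name": "call_tool"},
    {"span_id": "d", "parent_id": "c", "name": "db.query"},
]

def span_depths(spans):
    """Return {span_id: depth}, where a root span has depth 0."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent_id"]].append(s["span_id"])
    depths = {}
    # Walk down from the root spans (those with no parent).
    stack = [(root_id, 0) for root_id in children[None]]
    while stack:
        span_id, depth = stack.pop()
        depths[span_id] = depth
        stack.extend((child_id, depth + 1) for child_id in children[span_id])
    return depths

depths = span_depths(spans)
for s in spans:
    # Indentation stands in for the vertical axis of the flame graph.
    print("  " * depths[s["span_id"]] + s["name"])
```

The depth is the row on which a renderer would draw the frame.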

02

Width Proportional to Metric

The primary quantitative insight comes from the width of each frame. In a CPU profile flame graph, width is proportional to the time spent in that function. In a distributed tracing context, width typically represents the duration of a span. This allows engineers to identify hot code paths or latency bottlenecks at a glance: the widest frames consume the most of the measured resource. When the graph aggregates samples, a frame's width is the sum over all invocations of that function or span that share the same stack path.
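
The proportionality can be illustrated with a short sketch that scales each span's duration against the root span. The span data, the `duration_ms` field, and the 1000-pixel canvas are assumptions made for the example.

```python
# Hypothetical spans from a single trace; durations are in milliseconds.
spans = [
    {"name": "handle_request", "parent_id": None, "duration_ms": 200},
    {"name": "plan", "parent_id": "a", "duration_ms": 50},
    {"name": "call_tool", "parent_id": "a", "duration_ms": 120},
    {"name": "db.query", "parent_id": "c", "duration_ms": 100},
]

def frame_widths(spans, total_px=1000):
    """Scale each span's duration to a pixel width relative to the root span."""
    root = next(s for s in spans if s["parent_id"] is None)
    return {
        s["name"]: round(total_px * s["duration_ms"] / root["duration_ms"])
        for s in spans
    }

widths = frame_widths(spans)
for name, px in widths.items():
    print(f"{name}: {px}px")
```

Here `db.query` takes half of the root's duration, so it is drawn at half the canvas width.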

03

Color as a Secondary Dimension

Color is used as a consistent, non-quantitative visual aid to improve readability and differentiate between types of operations. Common schemes include:

  • Hue by library or namespace (e.g., green for application code, red for database calls, blue for HTTP clients).
  • Saturation by resource type (e.g., different shades for CPU vs. I/O waits).
  • Monochromatic to reduce cognitive load, where color simply helps distinguish adjacent frames.

Color does not encode magnitude; the width carries all quantitative information.

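
One way to keep colors consistent is to derive the hue from a stable hash of the frame's namespace. The sketch below assumes, purely for illustration, that the namespace is the part of the span name before the first dot; the lightness and saturation values are arbitrary choices.

```python
import colorsys
import hashlib

def frame_color(span_name):
    """Map a span name to a hex color that is stable across runs.

    Assumption: the namespace is the text before the first "." in the name.
    """
    namespace = span_name.split(".")[0]
    # hashlib gives the same digest on every run, unlike the built-in hash().
    digest = hashlib.md5(namespace.encode("utf-8")).digest()
    hue = digest[0] / 255.0
    r, g, b = colorsys.hls_to_rgb(hue, 0.6, 0.7)
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))

for name in ["db.query", "db.insert", "http.get"]:
    print(name, frame_color(name))
```

Frames from the same namespace share a color, while the color itself says nothing about how long the frame took.
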
04

Interactive Exploration

Modern flame graph implementations are interactive visualizations. Key interactions include:

  • Click-to-zoom: Clicking a frame zooms the view to show only that stack and its children, enabling detailed inspection of deep call paths.
  • Search highlighting: Searching for a function or service name highlights all matching frames across the graph.
  • Tooltip details: Hovering over a frame reveals precise metadata, such as span name, duration, span attributes, and percentage of total trace time.

This interactivity transforms a static profile into an investigative tool for performance debugging.

05

Aggregation of Samples

A flame graph is an aggregated visualization. It does not show every individual function call or span instance in a timeline. Instead, it merges all sampled stack traces or spans, summing their durations. This aggregation is powerful for identifying statistically significant bottlenecks across many requests. For example, if a specific database query appears wide, it indicates that query is a major contributor to latency across the sampled traces, not just in one anomalous request.
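
The merging step can be sketched as folding stack paths into totals. The example below uses the semicolon-separated "folded" notation (stack path followed by a value) that Brendan Gregg's FlameGraph tools take as input; the sample data and function names are made up for illustration.

```python
from collections import Counter

# Hypothetical samples: (stack path from root to leaf, duration in ms).
samples = [
    (["handle_request", "call_tool", "db.query"], 100),
    (["handle_request", "plan"], 50),
    (["handle_request", "call_tool", "db.query"], 140),
]

def fold(samples):
    """Merge samples that share the same stack path, summing their values."""
    folded = Counter()
    for stack, value in samples:
        folded[";".join(stack)] += value
    return folded

folded = fold(samples)
for path, total in sorted(folded.items()):
    print(path, total)
```

The two `db.query` samples collapse into one entry with a combined value of 240, which is what makes that frame wide in the aggregate view.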

06

Integration with Distributed Traces

When applied to distributed tracing, a flame graph visualizes a single trace or an aggregate of traces. Each frame corresponds to a span. The hierarchy shows the propagation of work across services. This provides a unified view of end-to-end latency, revealing whether time is spent in a specific microservice, a particular tool call, or in network communication between spans. It bridges the gap between traditional profiling and distributed systems observability, making complex trace data intuitively scannable.
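
Whether time is spent inside a span itself or in its children can be estimated by computing self time: a span's duration minus the total duration of its direct children. The sketch below assumes hypothetical field names and children that do not overlap in time; with concurrent children the subtraction would overcount, so the result is clamped at zero.

```python
from collections import defaultdict

# Hypothetical spans; durations are in milliseconds.
spans = [
    {"span_id": "a", "parent_id": None, "name": "handle_request", "duration_ms": 200.0},
    {"span_id": "b", "parent_id": "a", "name": "plan", "duration_ms": 50.0},
    {"span_id": "c", "parent_id": "a", "name": "call_tool", "duration_ms": 120.0},
    {"span_id": "d", "parent_id": "c", "name": "db.query", "duration_ms": 100.0},
]

def self_times(spans):
    """Return {name: self time}, assuming children run sequentially."""
    child_total = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["duration_ms"]
    return {
        s["name"]: max(0.0, s["duration_ms"] - child_total[s["span_id"]])
        for s in spans
    }

selfs = self_times(spans)
for name, ms in selfs.items():
    print(f"{name}: {ms} ms self time")
```

In this example `call_tool` spends 20 ms outside its `db.query` child, time that may reflect network overhead or untraced work.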

VISUALIZATION COMPARISON

Flame Graph vs. Other Trace Visualizations

A comparison of visualization techniques for analyzing hierarchical profiling and distributed trace data, highlighting their primary use cases and interpretability.

| Feature / Metric | Flame Graph | Timeline (Gantt) View | Service Graph | Call Tree |
| --- | --- | --- | --- | --- |
| Primary Visualization | Nested horizontal rectangles | Horizontal bars on a timeline | Directed graph of nodes & edges | Indented text hierarchy |
| Width Represents | Aggregate duration or sample count | Absolute start time and duration | Request volume or error rate | N/A (structure only) |
| Height Represents | Call stack depth | N/A (single service/span level) | N/A (service level) | Call stack depth |
| Best For Identifying | Hot code paths & cumulative time consumers | Concurrency, parallelism, & absolute timing | Service dependencies & topology | Exact sequence of calls & branching logic |
| Trace Span Aggregation | Aggregates identical stack sequences | Shows individual spans | Aggregates service-level interactions | Shows individual span hierarchy |
| Intuitive for Performance Bottlenecks | | | | |
| Shows System Topology | | | | |
| Handles High Concurrency / Fan-Out | | | | |

DISTRIBUTED TRACE COLLECTION

Flame Graph Use Cases

In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, with bar width representing span duration. This provides an intuitive, aggregated view for performance analysis.

02

Understanding Service Dependencies

A flame graph derived from a distributed trace reveals the service topology and call hierarchy for a specific request. It shows how work propagates from a root span through various downstream services.

  • Visualizing fan-out: See parallel calls to multiple services and identify if one slow dependency is serializing the entire workflow.
  • Mapping code-to-infrastructure: Connect business logic (function names in spans) to the underlying infrastructure components (database, cache, external APIs) they invoke.
03

Analyzing Parallel vs. Sequential Execution

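In a time-ordered flame graph of a single trace (sometimes called a flame chart), the arrangement of sibling frames shows whether work ran sequentially or concurrently. Children that run one after another sit side by side under their parent, and their widths add up to roughly the parent's duration. Children that run concurrently overlap in time, so they are typically drawn on separate rows and their combined width can exceed the parent's duration. An aggregated flame graph sorts siblings alphabetically, so horizontal position there carries no timing information. This distinction is critical for optimizing asynchronous workflows.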

  • Identifying blocking calls: Spot where the execution could be parallelized but is currently sequential, creating artificial latency.
  • Validating async patterns: Confirm that intended concurrent operations (e.g., fan-out API calls) are executing in parallel as designed.
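
Whether sibling spans actually ran concurrently can also be checked directly from their timestamps. The sketch below flags pairs of siblings whose time intervals overlap; the `start_ms` and `end_ms` fields and the span names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical child spans of one parent; times are ms offsets from trace start.
spans = [
    {"name": "fetch_a", "parent_id": "root", "start_ms": 0, "end_ms": 40},
    {"name": "fetch_b", "parent_id": "root", "start_ms": 5, "end_ms": 45},
    {"name": "summarize", "parent_id": "root", "start_ms": 50, "end_ms": 90},
]

def overlapping_siblings(spans):
    """Return name pairs of sibling spans whose time intervals overlap."""
    by_parent = defaultdict(list)
    for s in spans:
        by_parent[s["parent_id"]].append(s)
    pairs = []
    for siblings in by_parent.values():
        for a, b in combinations(siblings, 2):
            # Two intervals overlap when each starts before the other ends.
            if a["start_ms"] < b["end_ms"] and b["start_ms"] < a["end_ms"]:
                pairs.append((a["name"], b["name"]))
    return pairs

concurrent = overlapping_siblings(spans)
print(concurrent)
```

Siblings that were meant to fan out in parallel but do not appear in the result are candidates for a blocking call.
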
04

Resource Utilization & Cost Attribution

By mapping time spent to specific functions and services, flame graphs enable granular cost attribution. In agentic systems, this is essential for understanding the compute cost of specific reasoning steps or tool calls.

  • Token usage correlation: In LLM-based agents, correlate wide bars (long durations) with high-token-count prompts or completions.
  • External API cost analysis: Identify which third-party tool calls are the most expensive in terms of both latency and direct API costs.
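
A rough sketch of this attribution groups spans by operation name and sums duration and token counts. The `tokens` attribute and the price per 1,000 tokens are assumptions made for the example, not values taken from any provider.

```python
from collections import defaultdict

USD_PER_1K_TOKENS = 0.002  # hypothetical price, for illustration only

# Hypothetical spans carrying a token-count attribute.
spans = [
    {"name": "llm.plan", "duration_ms": 900, "tokens": 1500},
    {"name": "tool.search", "duration_ms": 400, "tokens": 0},
    {"name": "llm.plan", "duration_ms": 1100, "tokens": 2500},
]

def cost_by_operation(spans, usd_per_1k_tokens=USD_PER_1K_TOKENS):
    """Sum duration and tokens per span name and estimate token cost."""
    totals = defaultdict(lambda: {"duration_ms": 0, "tokens": 0})
    for s in spans:
        totals[s["name"]]["duration_ms"] += s["duration_ms"]
        totals[s["name"]]["tokens"] += s["tokens"]
    return {
        name: {**t, "usd": t["tokens"] / 1000 * usd_per_1k_tokens}
        for name, t in totals.items()
    }

costs = cost_by_operation(spans)
for name, t in costs.items():
    print(name, t)
```

Placing the duration and token totals side by side shows whether the widest frames are also the most expensive ones.
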
05

Debugging Agentic Reasoning Loops

For autonomous agents, a flame graph visualizes the planning, execution, and reflection cycle. Each major loop iteration appears as a distinct set of frames, allowing engineers to see time spent in cognitive phases versus tool execution.

  • Inefficient planning: Identify agents stuck in excessive planning or reflection, indicated by deep, wide stacks of LLM calls.
  • Tool execution profiling: See the exact sequence and duration of external tool calls (API, database, code execution) within an agent's action phase.
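
The split between cognitive phases and tool execution can be sketched by summing span durations per loop iteration and phase. The `iteration` and `phase` attributes are hypothetical; an agent framework would need to attach something equivalent to its spans.

```python
from collections import defaultdict

# Hypothetical agent spans tagged with a loop iteration and a phase.
spans = [
    {"iteration": 1, "phase": "plan", "duration_ms": 800},
    {"iteration": 1, "phase": "act", "duration_ms": 300},
    {"iteration": 1, "phase": "act", "duration_ms": 100},
    {"iteration": 1, "phase": "reflect", "duration_ms": 200},
    {"iteration": 2, "phase": "plan", "duration_ms": 1500},
    {"iteration": 2, "phase": "act", "duration_ms": 250},
]

def time_by_phase(spans):
    """Return {iteration: {phase: total duration in ms}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for s in spans:
        totals[s["iteration"]][s["phase"]] += s["duration_ms"]
    return {iteration: dict(phases) for iteration, phases in totals.items()}

phase_totals = time_by_phase(spans)
for iteration, phases in sorted(phase_totals.items()):
    print(iteration, phases)
```

Planning time that grows from one iteration to the next, as in this made-up data, is the kind of pattern that shows up as increasingly wide LLM frames.
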
06

Performance Regression Detection

Flame graphs serve as a visual baseline for normal performance. Automated systems can diff flame graph shapes or aggregate span durations across deployments to detect regressions.

  • Post-deployment analysis: Compare aggregate flame graphs from before and after a code deploy to see if new spans were added or existing ones became slower.
  • Anomaly detection: Flag traces where the flame graph shape deviates significantly from the norm, indicating potential performance anomalies or errors.
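
A before-and-after comparison can be sketched as a diff of two aggregated profiles keyed by stack path. The profiles, the semicolon-separated path notation, and the 20% threshold below are illustrative assumptions.

```python
# Hypothetical aggregated profiles: {stack path: total duration in ms}.
before = {"root;plan": 50, "root;call_tool;db.query": 100}
after = {
    "root;plan": 52,
    "root;call_tool;db.query": 180,
    "root;call_tool;cache.get": 10,
}

def diff_profiles(before, after, threshold=0.2):
    """Return paths that are new or grew by more than `threshold` (a fraction)."""
    regressions = {}
    for path in sorted(set(before) | set(after)):
        old = before.get(path, 0)
        new = after.get(path, 0)
        if old == 0:
            if new > 0:
                regressions[path] = (old, new)  # path did not exist before
        elif (new - old) / old > threshold:
            regressions[path] = (old, new)
    return regressions

regressions = diff_profiles(before, after)
for path, (old, new) in regressions.items():
    print(f"{path}: {old} -> {new}")
```

The small change in `root;plan` stays under the threshold, while the slower query and the newly added cache call are both reported.
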
FLAME GRAPH

Frequently Asked Questions

A flame graph is a critical visualization tool in distributed tracing and performance profiling. This FAQ addresses its core mechanics, construction, and role in diagnosing performance issues within agentic and distributed systems.

A flame graph is a visualization of hierarchical profiling data. In distributed tracing, it represents the nested call stack of spans within a single trace, and the width of each rectangular block indicates the relative duration or resource consumption (e.g., CPU time) of that operation.

Flame graphs were originally created by Brendan Gregg for CPU profiling. Adapted for tracing, they turn a trace's directed acyclic graph (DAG) of spans into a consolidated stack of frames. The y-axis shows stack depth (call hierarchy). In the original profiling form, the x-axis spans the sample population rather than the passage of time, and frames are sorted alphabetically so that identical stack frames merge. This format lets engineers quickly identify the widest (most time-consuming) code paths or service calls, which are the primary targets for optimization.
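
The alphabetical ordering and merging can be sketched as a small layout routine that turns aggregated stack paths into positioned frames. The input data and the tuple layout `(name, depth, x, width)` are assumptions for illustration, not the format of any particular tool.

```python
# Hypothetical aggregated stacks: {semicolon-separated path: value}.
folded = {
    "main;render;layout": 3,
    "main;parse": 2,
    "main;render;draw": 5,
    "main;parse;lex": 1,
}

def layout(folded):
    """Return frames as (name, depth, x, width), siblings sorted by name."""
    root = {"value": 0, "children": {}}
    for path, value in folded.items():
        node = root
        for name in path.split(";"):
            # Identical prefixes share a node, which merges their frames.
            node = node["children"].setdefault(name, {"value": 0, "children": {}})
            node["value"] += value
    frames = []

    def place(node, x, depth):
        for name in sorted(node["children"]):
            child = node["children"][name]
            frames.append((name, depth, x, child["value"]))
            place(child, x, depth + 1)
            x += child["value"]

    place(root, 0, 0)
    return frames

frames = layout(folded)
for frame in frames:
    print(frame)
```

Because siblings are placed in name order, `parse` is drawn to the left of `render` regardless of which ran first, which is why the x-axis of an aggregated flame graph should not be read as time.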

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.