Glossary

Flame Graph

A flame graph is a visualization of hierarchical profiling data, where in distributed tracing, it represents the nested call stack of spans within a trace, with width indicating duration.

VISUALIZATION

What is a Flame Graph?

A flame graph is a hierarchical visualization of profiling data, adapted in distributed tracing to represent the nested call stack of spans within a single trace.

Each horizontal rectangle (or "flame") represents a span. Its width corresponds to the span's duration or to a sampled metric such as CPU time, and its vertical stacking shows the parent-child relationships between spans. The result is an immediate view of where time is spent across a request's lifecycle, which makes flame graphs well suited to performance analysis and to locating latency bottlenecks.

An aggregated flame graph is generated by merging many sampled execution profiles or traces; sibling frames are sorted alphabetically so that identical stacks merge into a single frame and patterns emerge. A flame graph of a single trace is typically ordered by span start time instead. In agentic observability, flame graphs help audit the internal reasoning loops and tool calls of an autonomous agent by giving a visual record of the execution flow as captured by instrumentation. Key related concepts include the underlying trace data structure, span attributes for metadata, and tail sampling strategies that determine which traces are retained for visualization.

DISTRIBUTED TRACE COLLECTION

Key Features of a Flame Graph

In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, where width represents duration or resource consumption, enabling rapid performance bottleneck identification.

Hierarchical Stack Visualization

A flame graph represents the call stack of a program or trace as a set of nested, horizontal rectangles. Each rectangle, or frame, represents a function or span. The vertical axis shows stack depth, with the root span at the bottom and child spans stacked above it; some tools invert this layout and draw the root at the top, a variant known as an icicle graph. The nesting maps directly to the parent-child relationships recorded through each span's parent span ID within a trace, making the execution flow immediately apparent.
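The mapping from parent links to stack depth can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `Span` record with `span_id` and `parent_id` fields and a well-formed trace (no cycles); it is not any particular tracing backend's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span (hypothetical field names)
    name: str


def stack_depths(spans: list[Span]) -> dict[str, int]:
    """Map each span_id to its depth (root = 0) by following parent links."""
    by_id = {s.span_id: s for s in spans}
    depths: dict[str, int] = {}

    def depth_of(span: Span) -> int:
        if span.span_id not in depths:
            parent = by_id.get(span.parent_id) if span.parent_id else None
            # Spans whose parent is missing are treated as roots in this sketch.
            depths[span.span_id] = 0 if parent is None else depth_of(parent) + 1
        return depths[span.span_id]

    for s in spans:
        depth_of(s)
    return depths


spans = [
    Span("a", None, "GET /checkout"),
    Span("b", "a", "cart-service.load"),
    Span("c", "b", "SELECT cart_items"),
    Span("d", "a", "payment-service.charge"),
]
depths = stack_depths(spans)
```

Each depth value corresponds to one row of the flame graph, so `cart-service.load` and `payment-service.charge` share a row directly above (or below, in an icicle layout) the root.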

Width Proportional to Metric

The primary quantitative insight comes from the width of each frame. In a CPU profile flame graph, width is proportional to the time spent in that function. In a distributed tracing context, width typically represents the duration of a span. This allows engineers to identify hot code paths or latency bottlenecks at a glance: the widest frames consume the most resources. In an aggregated graph, a frame's width represents the sum over all samples or invocations that share the same stack path.
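As a rough sketch of the proportionality described above, the width of a frame can be computed as its share of the root span's duration, scaled to the drawing area. The function name, the `canvas_px` parameter, and the sample durations are illustrative assumptions.

```python
def frame_widths(durations_ms: dict[str, float], root: str, canvas_px: int = 1200) -> dict[str, float]:
    """Scale each duration to pixels so the root span fills the canvas."""
    total = durations_ms[root]
    if total <= 0:
        raise ValueError("root duration must be positive")
    return {name: canvas_px * duration / total for name, duration in durations_ms.items()}


durations_ms = {
    "GET /checkout": 200.0,
    "cart-service.load": 50.0,
    "payment-service.charge": 120.0,
}
widths = frame_widths(durations_ms, root="GET /checkout")
```

With these sample values the payment span takes 720 of 1200 pixels, which is why it stands out visually before any numbers are read.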

Color as a Secondary Dimension

Color is used as a consistent, non-quantitative visual aid to improve readability and differentiate between types of operations. Common schemes include:

Hue by library or namespace (e.g., green for application code, red for database calls, blue for HTTP clients).
Saturation by resource type (e.g., different shades for CPU vs. I/O waits).
Monochromatic to reduce cognitive load, where color simply helps distinguish adjacent frames.

Color does not encode magnitude; the width carries all quantitative information.
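One way to get a consistent hue per library or namespace is to derive it from a hash of the name, so the same namespace always receives the same color across renders. The sketch below assumes span names use a dotted `namespace.operation` convention; the hashing scheme and the lightness and saturation constants are illustrative choices, not a standard.

```python
import colorsys
import hashlib


def frame_color(span_name: str) -> str:
    """Return a stable hex color derived from the span's namespace prefix."""
    namespace = span_name.split(".", 1)[0]
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    hue = digest[0] / 256.0  # first hash byte mapped into [0, 1)
    red, green, blue = colorsys.hls_to_rgb(hue, 0.6, 0.7)
    return "#{:02x}{:02x}{:02x}".format(round(red * 255), round(green * 255), round(blue * 255))


db_query_color = frame_color("db.query")
db_connect_color = frame_color("db.connect")
```

Because the color depends only on the namespace, it stays purely categorical and never competes with width for quantitative meaning.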

Interactive Exploration

Modern flame graph implementations are interactive visualizations. Key interactions include:

Click-to-zoom: Clicking a frame zooms the view to show only that stack and its children, enabling detailed inspection of deep call paths.
Search highlighting: Searching for a function or service name highlights all matching frames across the graph.
Tooltip details: Hovering over a frame reveals precise metadata, such as span name, duration, span attributes, and percentage of total trace time.

This interactivity transforms a static profile into an investigative tool for performance debugging.

Aggregation of Samples

A flame graph is an aggregated visualization. It does not show every individual function call or span instance in a timeline. Instead, it merges all sampled stack traces or spans, summing their durations. This aggregation is powerful for identifying statistically significant bottlenecks across many requests. For example, if a specific database query appears wide, it indicates that query is a major contributor to latency across the sampled traces, not just in one anomalous request.
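The merging step can be illustrated with the "folded stacks" representation used by common flame graph tooling, where each unique stack becomes a semicolon-joined path with a count. The sample stacks below are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> Counter:
    """Collapse sampled stacks (root first) into 'a;b;c' -> sample count."""
    folded: Counter = Counter()
    for stack in samples:
        folded[";".join(stack)] += 1
    return folded


samples = [
    ["main", "handle_request", "query_db"],
    ["main", "handle_request", "query_db"],
    ["main", "handle_request", "render"],
    ["main", "gc"],
]
folded = fold_stacks(samples)
for path, count in sorted(folded.items()):
    print(path, count)
```

Two of the four samples end in `query_db`, so that frame would be drawn twice as wide as `render` or `gc`, regardless of when in the sampling window those samples were taken.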

Integration with Distributed Traces

When applied to distributed tracing, a flame graph visualizes a single trace or an aggregate of traces. Each frame corresponds to a span. The hierarchy shows the propagation of work across services. This provides a unified view of end-to-end latency, revealing whether time is spent in a specific microservice, a particular tool call, or in network communication between spans. It bridges the gap between traditional profiling and distributed systems observability, making complex trace data intuitively scannable.
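To connect spans to the frame hierarchy, a trace can be converted into stack paths weighted by self time (a span's duration minus the time covered by its direct children). This is a sketch under stated assumptions: the `Span` fields are hypothetical, and children are assumed not to overlap in time, so concurrent children would need a proper interval union instead of a simple sum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: int


def trace_to_folded(spans: list[Span]) -> dict[str, int]:
    """Return 'root;child;leaf' -> self time in ms for one trace."""
    by_id = {s.span_id: s for s in spans}
    child_total = {s.span_id: 0 for s in spans}
    for s in spans:
        if s.parent_id in child_total:
            child_total[s.parent_id] += s.duration_ms

    def path(span: Span) -> str:
        names = [span.name]
        while span.parent_id in by_id:
            span = by_id[span.parent_id]
            names.append(span.name)
        return ";".join(reversed(names))

    folded: dict[str, int] = {}
    for s in spans:
        # Clamp at zero in case overlapping children exceed the parent's duration.
        self_ms = max(s.duration_ms - child_total[s.span_id], 0)
        key = path(s)
        folded[key] = folded.get(key, 0) + self_ms
    return folded


trace = [
    Span("1", None, "gateway", 200),
    Span("2", "1", "cart", 50),
    Span("3", "2", "db.query", 30),
    Span("4", "1", "payment", 120),
]
folded_trace = trace_to_folded(trace)
```

The self times sum back to the root duration, which is what lets a renderer stack child frames inside their parent without gaps being misread as work.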

VISUALIZATION COMPARISON

Flame Graph vs. Other Trace Visualizations

A comparison of visualization techniques for analyzing hierarchical profiling and distributed trace data, highlighting their primary use cases and interpretability.

Feature / Metric	Flame Graph	Timeline (Gantt) View	Service Graph	Call Tree
Primary Visualization	Nested horizontal rectangles	Horizontal bars on a timeline	Directed graph of nodes & edges	Indented text hierarchy
Width Represents	Aggregate duration or sample count	Span duration (horizontal position shows start time)	Request volume or error rate	N/A (structure only)
Height Represents	Call stack depth	N/A (one row per span)	N/A (service level)	Call stack depth
Best For Identifying	Hot code paths & cumulative time consumers	Concurrency, parallelism, & absolute timing	Service dependencies & topology	Exact sequence of calls & branching logic
Trace Span Aggregation	Aggregates identical stack sequences	Shows individual spans	Aggregates service-level interactions	Shows individual span hierarchy
Intuitive for Performance Bottlenecks
Shows System Topology
Handles High Concurrency / Fan-Out

DISTRIBUTED TRACE COLLECTION

Flame Graph Use Cases

In distributed tracing, a flame graph visualizes the hierarchical call stack of spans within a trace, with bar width representing span duration. This provides an intuitive, aggregated view for performance analysis.

Latency Bottleneck Identification

Flame graphs are a primary tool for identifying hot spots and the likely critical path in a distributed request. The widest bars pinpoint the most time-consuming operations, whether they are deep in a single service's call stack or spread across multiple services.

Root cause analysis: Quickly see if latency is dominated by a specific database query, a slow external API call, or internal computation.
Comparative analysis: Compare flame graphs from fast vs. slow traces to isolate regressions or environmental differences.
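The "widest bar" check used in root cause analysis can also be done numerically. The following sketch assumes self times per operation have already been computed; the operation names and values are hypothetical.

```python
def dominant_operation(self_time_ms: dict[str, float]) -> tuple[str, float]:
    """Return the operation with the largest self time and its share of the total."""
    total = sum(self_time_ms.values())
    if total <= 0:
        raise ValueError("no time recorded")
    name = max(self_time_ms, key=lambda key: self_time_ms[key])
    return name, self_time_ms[name] / total


self_time_ms = {
    "gateway": 30.0,
    "cart": 20.0,
    "SELECT cart_items": 30.0,
    "payment.charge": 120.0,
}
hot_name, hot_share = dominant_operation(self_time_ms)
```

Running the same function on a fast and a slow trace gives a quick first comparison: if the dominant operation or its share changes, that frame is where to zoom in.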

Understanding Service Dependencies

A flame graph derived from a distributed trace reveals the service topology and call hierarchy for a specific request. It shows how work propagates from a root span through various downstream services.

Visualizing fan-out: See parallel calls to multiple services and identify if one slow dependency is serializing the entire workflow.
Mapping code-to-infrastructure: Connect business logic (function names in spans) to the underlying infrastructure components (database, cache, external APIs) they invoke.

Analyzing Parallel vs. Sequential Execution

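In a flame graph of a single trace whose x-axis follows time (often called a flame chart), vertical stacking shows nesting, while sibling spans at the same depth show ordering: siblings that occupy disjoint time ranges ran sequentially, and siblings whose time ranges overlap ran concurrently. An aggregated flame graph with alphabetically sorted frames does not preserve this timing, so concurrency questions call for the time-ordered view or a timeline. This distinction is critical for optimizing asynchronous workflows.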

Identifying blocking calls: Spot where the execution could be parallelized but is currently sequential, creating artificial latency.
Validating async patterns: Confirm that intended concurrent operations (e.g., fan-out API calls) are executing in parallel as designed.
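Whether sibling spans actually ran in parallel can be checked directly from their timestamps with an interval overlap test. The `Span` fields and the sample values below are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    name: str
    start_ms: int
    end_ms: int


def overlapping_pairs(siblings: list[Span]) -> list[tuple[str, str]]:
    """Return pairs of sibling spans whose [start, end) intervals overlap."""
    return [
        (a.name, b.name)
        for a, b in combinations(siblings, 2)
        if a.start_ms < b.end_ms and b.start_ms < a.end_ms
    ]


siblings = [
    Span("inventory.check", 0, 40),
    Span("pricing.quote", 5, 45),
    Span("shipping.estimate", 50, 90),
]
concurrent = overlapping_pairs(siblings)
```

Here the inventory and pricing calls overlap, while the shipping call starts only after both finish, which marks it as a candidate for parallelization if it has no data dependency on them.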

Resource Utilization & Cost Attribution

By mapping time spent to specific functions and services, flame graphs enable granular cost attribution. In agentic systems, this is essential for understanding the compute cost of specific reasoning steps or tool calls.

Token usage correlation: In LLM-based agents, correlate wide bars (long durations) with high-token-count prompts or completions.
External API cost analysis: Identify which third-party tool calls are the most expensive in terms of both latency and direct API costs.
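A simple way to correlate duration with token usage is to aggregate both per operation name. In this sketch the spans are plain dictionaries and `tokens.total` is a hypothetical attribute key; real instrumentation may record token counts under different names.

```python
from collections import defaultdict


def cost_by_operation(spans: list[dict]) -> dict[str, dict[str, int]]:
    """Sum duration and token counts per span name."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"duration_ms": 0, "tokens": 0})
    for span in spans:
        row = totals[span["name"]]
        row["duration_ms"] += span["duration_ms"]
        # "tokens.total" is an assumed attribute key, defaulting to 0 when absent.
        row["tokens"] += span.get("attributes", {}).get("tokens.total", 0)
    return dict(totals)


agent_spans = [
    {"name": "llm.plan", "duration_ms": 1800, "attributes": {"tokens.total": 2400}},
    {"name": "llm.plan", "duration_ms": 900, "attributes": {"tokens.total": 1100}},
    {"name": "tool.search", "duration_ms": 350},
]
costs = cost_by_operation(agent_spans)
```

Multiplying the token totals by a per-token price, or the call counts by a per-call API fee, turns the same table into a cost estimate per operation.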

>70%: Traces where a single span accounts for the majority of latency

Debugging Agentic Reasoning Loops

For autonomous agents, a flame graph visualizes the planning, execution, and reflection cycle. Each major loop iteration appears as a distinct set of frames, allowing engineers to see time spent in cognitive phases versus tool execution.

Inefficient planning: Identify agents stuck in excessive planning or reflection, indicated by deep, wide stacks of LLM calls.
Tool execution profiling: See the exact sequence and duration of external tool calls (API, database, code execution) within an agent's action phase.
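The split between cognitive phases and tool execution can be tabulated per loop iteration. This sketch assumes each span carries hypothetical `iteration` and `phase` fields set by the agent's instrumentation; the phase names are illustrative.

```python
from collections import defaultdict


def time_by_phase(spans: list[dict]) -> dict[int, dict[str, int]]:
    """Sum span durations per (loop iteration, phase)."""
    totals: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for span in spans:
        totals[span["iteration"]][span["phase"]] += span["duration_ms"]
    return {iteration: dict(phases) for iteration, phases in totals.items()}


loop_spans = [
    {"iteration": 1, "phase": "planning", "duration_ms": 1200},
    {"iteration": 1, "phase": "tool", "duration_ms": 300},
    {"iteration": 1, "phase": "reflection", "duration_ms": 700},
    {"iteration": 2, "phase": "planning", "duration_ms": 2500},
    {"iteration": 2, "phase": "planning", "duration_ms": 1500},
    {"iteration": 2, "phase": "tool", "duration_ms": 200},
]
phase_totals = time_by_phase(loop_spans)
```

In this made-up data the second iteration spends 4000 ms planning against 200 ms of tool execution, the kind of imbalance that appears in the flame graph as a wide stack of LLM calls.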

Performance Regression Detection

Flame graphs serve as a visual baseline for normal performance. Automated systems can diff flame graph shapes or aggregate span durations across deployments to detect regressions.

Post-deployment analysis: Compare aggregate flame graphs from before and after a code deploy to see if new spans were added or existing ones became slower.
Anomaly detection: Flag traces where the flame graph shape deviates significantly from the norm, indicating potential performance anomalies or errors.
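A before-and-after comparison can be sketched as a diff over aggregated durations keyed by stack path. The `min_ratio` threshold and the sample paths are illustrative assumptions; a production check would also need to account for sample size and noise.

```python
def find_regressions(before: dict[str, float], after: dict[str, float], min_ratio: float = 1.5) -> dict[str, str]:
    """Flag stack paths that are new or at least min_ratio times slower."""
    report: dict[str, str] = {}
    for path, after_ms in after.items():
        before_ms = before.get(path)
        if before_ms is None:
            report[path] = "new"
        elif before_ms > 0 and after_ms / before_ms >= min_ratio:
            report[path] = f"{after_ms / before_ms:.2f}x slower"
    return report


before_deploy = {"api;db": 40.0, "api;render": 10.0}
after_deploy = {"api;db": 90.0, "api;render": 11.0, "api;cache": 5.0}
regressions = find_regressions(before_deploy, after_deploy)
```

The small change in `api;render` falls under the threshold and is ignored, while the new `api;cache` span and the slower `api;db` path are both reported.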

FLAME GRAPH

Frequently Asked Questions

A flame graph is a critical visualization tool in distributed tracing and performance profiling. This FAQ addresses its core mechanics, construction, and role in diagnosing performance issues within agentic and distributed systems.

What is a flame graph?

A flame graph is a visualization of hierarchical profiling data. In the context of distributed tracing, it represents the nested call stack of spans within a single trace, with the width of each rectangular block indicating the relative duration or resource consumption (e.g., CPU time) of that operation.

Originally created by Brendan Gregg for CPU profiling, the flame graph was later adapted for tracing, where a trace's tree of spans (more generally a directed acyclic graph, or DAG, when span links are involved) is drawn as a consolidated stack. The y-axis shows stack depth (call hierarchy). In the classic aggregated form, the x-axis spans the sample population rather than the passage of time: frames are sorted alphabetically so that identical stack frames merge, and left-to-right position carries no timing meaning. Flame graphs of a single trace typically order spans by start time instead. Either way, engineers can quickly identify the widest (most time-consuming) code paths or service calls, which are the primary targets for optimization.
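The alphabetical ordering and merging described above can be sketched as a small layout routine. It takes folded stacks (path to sample count), builds a tree, and emits one rectangle per merged frame; the function name and the rectangle tuple shape are assumptions for illustration, not a specific tool's output format.

```python
def layout(folded: dict[str, int]) -> list[tuple[str, int, int, int]]:
    """Return (name, depth, x, width) rectangles with siblings sorted by name."""
    root: dict = {"value": 0, "children": {}}
    for path, count in folded.items():
        node = root
        node["value"] += count
        for name in path.split(";"):
            # Identical frames under the same parent share one node, which merges them.
            node = node["children"].setdefault(name, {"value": 0, "children": {}})
            node["value"] += count

    rects: list[tuple[str, int, int, int]] = []

    def place(node: dict, depth: int, x: int) -> None:
        for name in sorted(node["children"]):
            child = node["children"][name]
            rects.append((name, depth, x, child["value"]))
            place(child, depth + 1, x)
            x += child["value"]

    place(root, 0, 0)
    return rects


rects = layout({"main;b_render": 1, "main;a_query": 2, "main;a_query;io": 1})
```

In the result, `a_query` is placed to the left of `b_render` purely because of its name, and its width of 3 combines the samples that ended in `a_query` with the one that continued into `io`.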

DISTRIBUTED TRACE COLLECTION

Related Terms

A flame graph is a specific visualization within the broader practice of distributed tracing. To fully understand its utility, it's essential to grasp the foundational concepts and systems that produce the data it displays.

Span

A span is the fundamental unit of work in distributed tracing, representing a named, timed operation for a contiguous segment of work within a service. In a flame graph, each horizontal rectangle (or "flame") corresponds to a span.

Key Properties: Contains a start/end timestamp, a name, span attributes (key-value metadata), and a span kind (e.g., Client, Server).
Hierarchy: Spans have parent-child relationships, forming the nested stack visualized in a flame graph. The width of a span's rectangle represents its duration.

Trace

A trace is a collection of spans that represents the complete end-to-end path of a single request as it propagates through a distributed system. A trace-level flame graph visualizes one entire trace, while aggregated flame graphs merge many traces.

Structure: Spans within a trace form a directed acyclic graph (DAG), though flame graphs typically show a simplified, aggregated call stack view.
Correlation: All spans in a trace share a unique Trace ID, enabling trace correlation with logs and metrics for unified debugging.

Distributed Tracing

Distributed tracing is the overarching methodology of instrumenting applications to observe requests as they flow across service boundaries. Flame graphs are a primary diagnostic output of this practice.

Purpose: Used to understand system latency, diagnose performance bottlenecks, and visualize service dependencies.
Mechanism: Relies on distributed context propagation (e.g., via W3C Trace Context headers) to pass trace IDs and span IDs between services, maintaining continuity.
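To make the propagation mechanism concrete, the sketch below parses a W3C Trace Context `traceparent` header of the form `version-traceid-parentid-flags`. It handles only the four-field layout and is not a full implementation of the specification; the `TraceContext` type and function name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a traceparent header value; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


context = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

The receiving service creates its own span with the parsed `parent_id` as parent and the same `trace_id`, which is what lets a flame graph stitch frames from different services into one stack.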

OpenTelemetry (OTel)

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework and set of specifications for generating, collecting, and exporting telemetry data, including traces. It is a common source of the span data behind modern trace flame graphs.

Components: Includes APIs/SDKs for instrumentation, the OTLP protocol for data export, and the OpenTelemetry Collector for processing.
Role: Provides the standardized span and trace data model that visualization tools like flame graphs consume. Auto-instrumentation via OTel agents is a common way to generate this data without code changes.

Service Graph

A service graph is a complementary visualization to a flame graph. While a flame graph shows the internal call stack of a single request, a service graph shows the macro-level dependencies between services across all requests.

Derivation: Automatically generated by aggregating span data from many traces to identify which services call each other.
Use Case: Used for architectural understanding, identifying upstream/downstream impacts of failures, and validating deployment topology.

Tail Sampling

Tail sampling is a critical strategy for managing trace data volume before visualization. It decides whether to keep or discard a trace after the request is complete, based on its full context.

Contrast with Head Sampling: Head sampling decides at the request's start, potentially missing interesting traces that only exhibit problems (e.g., high latency) later.
Flame Graph Relevance: Enables cost-effective storage by only retaining traces that are most valuable for analysis, such as those with errors or exceeding latency thresholds, which are prime candidates for flame graph inspection.
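A tail sampling decision of the kind described above can be sketched as a function over the completed trace. The dictionary fields (`status`, `start_ms`, `end_ms`) and the default threshold are assumptions for illustration and do not reflect any particular collector's configuration.

```python
def keep_trace(spans: list[dict], latency_threshold_ms: int = 500) -> bool:
    """Decide after the request completes whether the trace is worth retaining."""
    if not spans:
        return False
    has_error = any(span.get("status") == "error" for span in spans)
    start = min(span["start_ms"] for span in spans)
    end = max(span["end_ms"] for span in spans)
    return has_error or (end - start) > latency_threshold_ms


fast_ok = [{"start_ms": 0, "end_ms": 120, "status": "ok"}]
slow_ok = [
    {"start_ms": 0, "end_ms": 900, "status": "ok"},
    {"start_ms": 10, "end_ms": 880, "status": "ok"},
]
fast_error = [{"start_ms": 0, "end_ms": 80, "status": "error"}]
decisions = [keep_trace(fast_ok), keep_trace(slow_ok), keep_trace(fast_error)]
```

Because the decision needs the total latency and the final status, it can only be made once every span of the trace has arrived, which is the defining difference from head sampling.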

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
