Tail sampling is a trace sampling strategy where the decision to retain or discard a complete trace is made after the request has finished, based on its full set of attributes. Unlike head sampling, which decides at the request's start, this approach allows sampling rules to target specific, valuable patterns like traces with high latency, errors, or particular business logic outcomes. This ensures critical diagnostic data is captured while reducing the volume and cost of stored telemetry.

What is Tail Sampling?
Tail sampling is a strategic method for managing distributed trace data volume by making sampling decisions after a request is complete.
The strategy is typically implemented within an OpenTelemetry Collector using a dedicated processor. This processor buffers spans, grouped by trace ID, for a configurable wait period so that late spans can arrive, then evaluates each trace against configurable policies, such as keeping all traces with an error status or those exceeding a latency threshold. This post-hoc filtering is essential for agentic observability, where capturing the complete reasoning path of a failed or slow autonomous agent operation is crucial for debugging and performance analysis.
Key Features of Tail Sampling
Post-Request Decision Making
The defining characteristic of tail sampling is that the sampling decision is deferred until after the entire request has completed. This allows the sampling logic to evaluate the complete trace context, including:
- Total request latency
- Final HTTP status code or error state
- All aggregated span attributes from across services
- The presence of specific business logic markers or tags
This enables highly informed sampling based on the actual outcome, rather than a probabilistic guess at the start.
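The key point is that these values only exist once every span has arrived. A minimal Python sketch, assuming a simplified span model (the Span class and summarize_trace helper are hypothetical, not an OpenTelemetry API), shows how trace-level facts are derived from a completed trace:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span model; timestamps are in milliseconds.
    trace_id: str
    name: str
    start_ms: int
    end_ms: int
    is_error: bool = False
    attributes: dict = field(default_factory=dict)


def summarize_trace(spans):
    """Derive facts that are only known once every span of the trace has arrived."""
    start = min(s.start_ms for s in spans)
    end = max(s.end_ms for s in spans)
    merged = {}
    for s in spans:
        # Later spans in the list overwrite earlier ones on key collisions (a simplification).
        merged.update(s.attributes)
    return {
        "duration_ms": end - start,                   # total request latency
        "has_error": any(s.is_error for s in spans),  # error anywhere in the trace
        "attributes": merged,                         # attributes from all services
    }


spans = [
    Span("t1", "GET /checkout", 0, 2400, attributes={"http.route": "/checkout"}),
    Span("t1", "charge-card", 100, 2300, is_error=True,
         attributes={"customer_tier": "premium"}),
]
summary = summarize_trace(spans)
print(summary["duration_ms"], summary["has_error"])  # 2400 True
```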
Rule-Based Filtering
Tail sampling uses declarative rules to determine which traces are retained. These rules are evaluated against the complete trace. Common rule types include:
- Latency-based: Keep traces where the total duration exceeds a threshold (e.g., > 2 seconds).
- Error-based: Keep all traces that resulted in an error (HTTP 5xx, application exceptions).
- Attribute-based: Keep traces containing specific span attributes (e.g., customer_tier="premium", http.route="/api/payment").
- Probabilistic: Keep a random percentage of traces as a baseline sample of normal traffic.
These rules typically run in a central collector such as the OpenTelemetry Collector, whose tail sampling processor keeps a trace if any configured policy decides to sample it, apart from special cases such as inverted policies.
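The rule types above can be modeled as simple predicates over a trace summary. This sketch is a hypothetical illustration, not the Collector's configuration format: the policy factory names are invented, and combining them with "keep if any policy matches" is an assumption that loosely mirrors how the OpenTelemetry Collector combines ordinary (non-inverted) policies. The probabilistic rule hashes the trace ID so a given trace always gets the same answer.

```python
import hashlib


def latency_policy(threshold_ms):
    return lambda t: t["duration_ms"] > threshold_ms


def error_policy():
    return lambda t: t["has_error"]


def attribute_policy(key, values):
    return lambda t: t["attributes"].get(key) in values


def probabilistic_policy(percent):
    def decide(t):
        # Hash the trace ID into one of 10,000 buckets so the decision is repeatable.
        digest = hashlib.sha256(t["trace_id"].encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 10_000
        return bucket < percent * 100
    return decide


def should_keep(trace, policies):
    # Keep the trace if any single policy votes to sample it.
    return any(policy(trace) for policy in policies)


policies = [
    latency_policy(2000),                           # > 2 seconds
    error_policy(),
    attribute_policy("customer_tier", {"premium"}),
    probabilistic_policy(1.0),                      # 1% baseline
]

slow_trace = {"trace_id": "b", "duration_ms": 3500, "has_error": False, "attributes": {}}
print(should_keep(slow_trace, policies))  # True
```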
Centralized Collector Implementation
Tail sampling is almost always implemented in a centralized telemetry processor rather than within individual services. The most common implementation is the tail_sampling processor of the OpenTelemetry Collector (shipped in its contrib distribution). The workflow is:
- All services emit 100% of traces to the collector.
- The collector buffers traces for a configurable period.
- Once the configured wait period (decision_wait) has elapsed since a trace's first span arrived, the collector treats the trace as complete and evaluates it against the defined sampling policies.
- Traces matching a policy are exported to the backend (e.g., Jaeger, Datadog); others are discarded.
Centralizing the decision ensures that the same rules are applied consistently across the entire system and that spans from every service in a trace are judged together.
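The buffer, wait, and decide loop can be sketched as a toy in-memory class. Everything here (TailSamplingBuffer, add_span, flush_ready, and passing the clock in as now) is a hypothetical simplification; the decision_wait_s parameter plays the role of the Collector's decision_wait setting.

```python
from collections import defaultdict


class TailSamplingBuffer:
    """Toy in-memory buffer; names are illustrative, not the Collector's internals."""

    def __init__(self, decision_wait_s, policy):
        self.decision_wait_s = decision_wait_s
        self.policy = policy              # callable(list_of_spans) -> bool
        self.spans = defaultdict(list)    # trace ID -> buffered spans
        self.first_seen = {}              # trace ID -> arrival time of first span

    def add_span(self, span, now):
        trace_id = span["trace_id"]
        self.first_seen.setdefault(trace_id, now)
        self.spans[trace_id].append(span)

    def flush_ready(self, now):
        """Decide on every trace whose wait period has elapsed; return the kept ones."""
        kept = []
        ready = [t for t, seen in self.first_seen.items()
                 if now - seen >= self.decision_wait_s]
        for trace_id in ready:
            spans = self.spans.pop(trace_id)
            del self.first_seen[trace_id]
            if self.policy(spans):
                kept.append((trace_id, spans))  # would be exported to the backend
            # Otherwise the trace is simply dropped.
        return kept


buf = TailSamplingBuffer(decision_wait_s=10,
                         policy=lambda spans: any(s.get("error") for s in spans))
buf.add_span({"trace_id": "t1", "error": False}, now=0)
buf.add_span({"trace_id": "t1", "error": True}, now=4)   # a later child span still counts
buf.add_span({"trace_id": "t2", "error": False}, now=1)
print(buf.flush_ready(now=5))                    # [] -- nothing has waited 10 s yet
print([t for t, _ in buf.flush_ready(now=11)])   # ['t1']
```

Passing now explicitly keeps the sketch deterministic and testable. It does not handle spans that arrive after their trace has been decided; such a span would simply open a new, incomplete entry.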
Optimal for Debugging & SLOs
This strategy is engineered for debugging and Service Level Objective (SLO) monitoring, not for reducing upstream data volume. Its core value is in guaranteeing the retention of diagnostically valuable traces that would be randomly missed by head sampling.
- Debugging: Ensures all error traces and high-latency outliers are captured for root cause analysis.
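- SLO Monitoring: Ensures the traces behind SLO violations (errors and slow requests) are retained for investigation. Because the sample is deliberately skewed toward those outcomes, error rates and latency percentiles (p95, p99) should be computed from metrics recorded before sampling, or from sampled data re-weighted by each trace's sampling probability, rather than from raw sampled counts.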
- Cost Efficiency: Although the collector processes 100% of traces, it substantially reduces long-term storage costs by discarding uninteresting fast, successful traces.
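Because the retained sample is skewed toward errors, aggregate rates computed from it are misleading unless corrected. The arithmetic below uses made-up traffic numbers and a hypothetical policy (keep every error plus 1% of successes) to show both the naive error rate and an inverse-probability-weighted estimate:

```python
# Hypothetical traffic: 10,000 requests, 1% of which fail.
total_requests = 10_000
errors = 100
successes = total_requests - errors                     # 9,900

# Hypothetical tail policy: keep every error, plus a 1% random baseline of successes.
success_keep_rate = 0.01
kept_errors = errors                                    # 100
kept_successes = round(successes * success_keep_rate)   # 99

# Naive: treat the sampled traces as if they were the full population.
naive_rate = kept_errors / (kept_errors + kept_successes)
# Weighted: each kept success stands for 1 / success_keep_rate real requests.
weighted_rate = kept_errors / (kept_errors + kept_successes / success_keep_rate)

print(f"true {errors / total_requests:.1%}, naive {naive_rate:.1%}, weighted {weighted_rate:.1%}")
# true 1.0%, naive 50.3%, weighted 1.0%
```

Weighting each kept trace by the inverse of its keep probability recovers the true rate here. Many OpenTelemetry deployments sidestep the problem by generating request and error metrics from all spans before the sampling step, for example with the Collector's spanmetrics connector.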
Trade-offs: Latency & Resource Overhead
Tail sampling introduces specific engineering trade-offs:
- Decision Latency: Traces do not reach the backend in real time. They are delayed by the collector's decision_wait time (e.g., 10-30 seconds) while it waits for slow spans to arrive.
- Collector Resource Load: The collector must buffer and evaluate every single trace, requiring significant memory and CPU resources, especially under high request volumes.
- Trace Completeness Risk: If the collector crashes or restarts, traces still buffered in memory and not yet evaluated are lost. In addition, every span of a trace must reach the same collector instance, so scaled-out deployments typically route spans by trace ID (for example, with a load-balancing exporter tier in front of the sampling collectors) and run redundant instances for availability.
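A back-of-envelope estimate helps size the collector: roughly, the number of traces held at once is the rate of new traces multiplied by the decision wait. The sketch below uses assumed figures for throughput, spans per trace, and bytes per span; substitute your own measurements.

```python
def buffer_estimate(traces_per_sec, decision_wait_s, spans_per_trace, bytes_per_span):
    """Back-of-envelope sizing; every input here is an assumed figure."""
    traces_in_flight = traces_per_sec * decision_wait_s    # traces awaiting a decision
    spans_in_flight = traces_in_flight * spans_per_trace
    return traces_in_flight, spans_in_flight * bytes_per_span


traces, mem_bytes = buffer_estimate(
    traces_per_sec=2_000, decision_wait_s=30, spans_per_trace=20, bytes_per_span=1_000)
print(traces, mem_bytes / 1e9)   # 60000 1.2  (about 1.2 GB of raw span data)
```

This is a lower bound; real memory use is likely higher because of per-object overhead. The Collector's tail sampling processor exposes related settings such as num_traces, which caps how many traces it keeps in memory.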
Common Policy Patterns
Effective tail sampling combines multiple rules into a policy. A standard production policy might be:
- Always keep error traces (status_code == ERROR).
- Keep slow traces (latency > 1s).
- Keep a tiny percentage of all successful traces (e.g., 0.1% probabilistic) for general traffic shape monitoring.
- Always keep traces for critical user journeys (e.g., where span.attributes.user_id is in a premium list).
This layered approach ensures comprehensive coverage for incidents while maintaining a manageable data volume. The policy is typically defined in the collector's YAML configuration.
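The layered policy above can be expressed as an ordered check that also records which layer kept the trace, which can help when auditing what the sampler retained. The allow-list, the field names (status_code, latency_ms, user_id), and the SHA-256 hashing for the 0.1% baseline are assumptions for illustration:

```python
import hashlib

PREMIUM_USERS = {"u-1001", "u-1002"}   # hypothetical premium allow-list


def in_baseline(trace_id, rate=0.001):
    # Deterministic 0.1% baseline: map the hashed trace ID into [0, 1).
    h = int.from_bytes(hashlib.sha256(trace_id.encode()).digest()[:8], "big")
    return h / 2**64 < rate


def sampling_reason(trace):
    """Return the first layer that keeps the trace, or None to drop it."""
    if trace["status_code"] == "ERROR":
        return "error"
    if trace["latency_ms"] > 1000:       # latency > 1s
        return "slow"
    if trace.get("user_id") in PREMIUM_USERS:
        return "premium"
    if in_baseline(trace["trace_id"]):
        return "baseline"
    return None


print(sampling_reason({"trace_id": "x", "status_code": "OK", "latency_ms": 1500}))  # slow
```

Because the baseline check runs last, it only ever selects traces that are not errors, not slow, and not premium, matching the intent of sampling a small share of ordinary successful traffic.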
Tail Sampling vs. Head Sampling
A comparison of the two primary strategies for controlling the volume and cost of distributed trace data in observability pipelines.
| Feature / Metric | Tail Sampling | Head Sampling |
|---|---|---|
| Decision Point | After the trace is complete (post-request). | At the start of the trace (pre-request). |
| Decision Basis | Complete trace attributes (e.g., latency, error status, specific span data). | Pre-configured probability or rule (e.g., 10% of all requests). |
| Data Required for Decision | Full trace context and all span data. | Only the initial trace context (e.g., trace ID, initial attributes). |
| Implementation Complexity | High. Requires a stateful buffer (e.g., in an OpenTelemetry Collector) to hold traces until the decision can be made. | Low. Decision is made immediately by the instrumented service or load balancer. |
| Latency Impact on Trace | Adds processing delay as traces are buffered before being sampled and exported. | No added latency; the sampling decision is instantaneous. |
| Ideal Use Case | Capturing rare but important events (high-latency outliers, errors, specific business transactions) without storing all data. | Controlling overall data volume with a simple, predictable cost model. Suitable for high-throughput, uniform systems. |
| Storage & Cost Efficiency for Rare Events | High. Precisely targets and stores only traces matching important criteria, avoiding noise. | Low. Rare events are sampled at the same low probability as all other requests, making them likely to be missed. |
| Deterministic for a Given Trace | Largely. Rule-based policies return the same decision for the same set of spans, and probabilistic policies can hash the trace ID; spans that arrive after the decision is made are not considered. | Yes, if derived from the trace ID (e.g., a trace ID ratio sampler). No, if each decision is an independent random draw. |
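The "Data Required for Decision" row is the crux: at the start of a request, a head sampler has little more than the trace ID to work with. Below is a minimal sketch of a trace-ID-based head decision, in the spirit of OpenTelemetry's TraceIdRatioBased sampler but not its exact algorithm, assuming the low 64 bits of the ID are uniformly random. The example ID is the one used in the W3C Trace Context specification.

```python
def head_sample(trace_id_hex, ratio):
    """Decide at the root using only the trace ID; no outcome data exists yet."""
    # Treat the last 16 hex digits (64 bits) of the ID as a uniform random number.
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < ratio * 2**64


tid = "4bf92f3577b34da6a3ce929d0e0e4736"
print(head_sample(tid, 0.5), head_sample(tid, 0.7))  # False True
```

Every service that sees the same trace ID reaches the same decision, which is what makes this form of head sampling deterministic. A tail sampler, by contrast, can wait and look at the outcome before deciding.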
Where is Tail Sampling Used?
Tail sampling makes its keep-or-discard decision after a request completes, enabling selective retention based on the request's full characteristics. It is deployed in specific, high-value observability scenarios where capturing rare or anomalous events is critical.
Business Transaction Monitoring
Organizations use tail sampling to guarantee visibility into key user journeys or high-value transactions. Rules are based on business-level attributes added to spans via enrichment.
- Example: Sample 100% of traces where transaction.type equals checkout and cart.value exceeds $1000.
- Benefit: Provides deterministic observability for critical revenue-generating or compliance-sensitive workflows, ensuring they are never missed due to probabilistic sampling.
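A rule like this reduces to a predicate over the trace's span attributes. In this hypothetical sketch, each span is a plain dict of attributes, both conditions must hold on the same span, and the default threshold of 1000 comes from the example above:

```python
def is_high_value_checkout(spans, min_value=1000):
    """True if any single span marks a checkout whose cart value exceeds min_value."""
    for attrs in spans:
        if (attrs.get("transaction.type") == "checkout"
                and attrs.get("cart.value", 0) > min_value):
            return True
    return False


print(is_high_value_checkout([{"transaction.type": "checkout", "cart.value": 1250.0}]))  # True
```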
Security & Compliance Auditing
In regulated industries, tail sampling acts as an audit trail for security-sensitive operations. It ensures a complete record of requests accessing protected resources or exhibiting suspicious patterns.
- Example: Retain all traces where user.role equals admin and access.target contains /financial/records.
- Benefit: Creates an end-to-end forensic record of privileged access or potential security events, which is valuable for post-incident analysis and regulatory compliance reporting.
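The user's role and the accessed resource are often recorded by different services (for example, an auth service and a storage service), so requiring both on a single span could miss the trace. One option, sketched below with hypothetical helper names and dict-based spans, is to test each condition across all spans of the trace and then combine the results:

```python
def trace_has(spans, key, predicate):
    # A trace satisfies a condition if any of its spans does.
    return any(key in attrs and predicate(attrs[key]) for attrs in spans)


def is_privileged_finance_access(spans):
    return (trace_has(spans, "user.role", lambda v: v == "admin")
            and trace_has(spans, "access.target", lambda v: "/financial/records" in v))


trace = [
    {"user.role": "admin"},                           # e.g., set by an auth service
    {"access.target": "/financial/records/2024/q3"},  # e.g., set by a storage service
]
print(is_privileged_finance_access(trace))  # True
```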
Canary & Blue-Green Deployment Analysis
During progressive rollouts, tail sampling is configured to capture a higher percentage of traces, or all of them, from the new deployment variant. This is often combined with trace-by-trace comparisons.
- Example: Sample 50% of all traces from the stable service, but 100% of traces from the canary service tagged with deployment=canary-v2.
- Benefit: Provides enough high-fidelity data to compare latency, error rates, and behavior between versions, enabling confident release decisions.
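Per-variant rates can be implemented by choosing the keep probability from a deployment attribute and hashing the trace ID against it. The rate table and function name are hypothetical; the 100% and 50% figures come from the example above:

```python
import hashlib

RATES = {"canary-v2": 1.0}   # hypothetical: keep every canary trace
DEFAULT_RATE = 0.5           # stable traffic: keep half


def keep_for_rollout(trace_id, deployment):
    rate = RATES.get(deployment, DEFAULT_RATE)
    # Hash the trace ID into one of 10,000 buckets for a repeatable decision.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rate * 10_000


print(keep_for_rollout("abc", "canary-v2"))  # True
```

Because both variants are sampled at random (only at different rates) rather than by outcome, error rates and latency distributions computed within each variant remain representative.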
Frequently Asked Questions
Tail sampling is a critical strategy for managing the volume and cost of distributed trace data in production systems. These questions address its core mechanisms, trade-offs, and implementation.
Tail sampling is a trace sampling strategy where the decision to retain or discard a complete trace is made after the request has finished, based on its full set of attributes and final outcome. Unlike head sampling, which decides at the start of a request, tail sampling allows the sampling logic to evaluate the entire trace against a set of rules. This is typically implemented by buffering all spans for a trace in a collector (like the OpenTelemetry Collector) for a configurable wait period after the trace's first span arrives, on the assumption that the trace is complete by then. The collector then applies configured sampling policies, such as keeping all traces with errors, high latency, or specific business attributes, before forwarding the selected traces to the backend storage and discarding the rest.
Related Terms
Tail sampling operates within a broader ecosystem of distributed tracing concepts. These related terms define the data structures, collection mechanisms, and processing pipelines that make this sampling strategy possible.
Trace
A trace is the complete end-to-end record of a single request as it flows through a distributed system. It is composed of a collection of spans that form a directed acyclic graph (DAG), showing the causal relationships between operations. Tail sampling makes its keep/discard decision based on the full context of a completed trace, such as its overall latency or error status.
Span
A span is the fundamental building block of a trace, representing a single, named, and timed operation within a service (e.g., a database query or an HTTP call). Each span contains:
- Span Attributes: Key-value metadata (e.g., http.status_code=500).
- Span Kind: Semantic role (Client, Server, Internal, Producer, Consumer).
- Timing data: Start and end timestamps.
Tail sampling evaluates the aggregate of all spans in a trace to make its sampling decision.
Trace Sampling
Trace sampling is the overarching practice of selectively capturing a subset of traces to manage data volume, storage costs, and processing overhead. Tail sampling is one specific strategy within this practice. Other primary strategies include:
- Head Sampling: The sampling decision is made at the start of a request (e.g., simple random sampling).
- Tail Sampling: The decision is made at the end of a request, based on its full attributes.
The choice between head and tail sampling is a core trade-off between upfront efficiency and retrospective, criteria-based capture.
OpenTelemetry Collector
The OpenTelemetry Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data. It is the most common architectural component where tail sampling logic is implemented. The collector runs processors that can:
- Ingest traces via OTLP.
- Buffer traces until they are complete.
- Apply sampling rules based on span attributes (e.g., error=true or duration > 2s).
- Forward only the sampled traces to backends like Jaeger or commercial APM tools.
Trace Enrichment
Trace enrichment is the process of adding contextual metadata to spans after they are generated. This often occurs in the same pipeline stage as tail sampling. Enrichment can add business context (e.g., user tier, transaction value) or deployment metadata (e.g., pod name, version) that can then be used as criteria in tail sampling rules. For example, a rule could be configured to sample 100% of traces where user.tier=premium and http.status_code=500.
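Ordering matters: an enrichment step must run before the sampling decision for its attributes to be usable in a rule. A minimal sketch, assuming a hypothetical in-memory USER_TIERS lookup and dict-based spans:

```python
# Hypothetical lookup table; in practice this might come from a user service or cache.
USER_TIERS = {"u-42": "premium", "u-7": "free"}


def enrich(span):
    """Return a copy of the span with a user.tier attribute added when known."""
    enriched = dict(span)
    tier = USER_TIERS.get(span.get("user.id"))
    if tier is not None:
        enriched["user.tier"] = tier
    return enriched


def keep_premium_server_errors(spans):
    return any(s.get("user.tier") == "premium" and s.get("http.status_code") == 500
               for s in spans)


raw = [{"user.id": "u-42", "http.status_code": 500}]
print(keep_premium_server_errors(raw))                       # False: tier not yet known
print(keep_premium_server_errors([enrich(s) for s in raw]))  # True
```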
Distributed Context Propagation
Distributed context propagation is the mechanism that carries the trace context (containing the Trace ID and Span ID) across service boundaries via HTTP headers, gRPC metadata, or message queues. This mechanism is essential for tail sampling because it ensures all spans from a single request are correlated into one trace. Standards like W3C Trace Context ensure interoperability, allowing the tail sampling processor to see the unified, end-to-end request.
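To show what actually travels between services, here is a minimal parser for the W3C traceparent header. It follows the version-00 layout (version, 32-hex-digit trace ID, 16-hex-digit parent span ID, and flags, separated by hyphens) and deliberately rejects other versions, which simplifies the specification's forward-compatibility rules. The example header is the one used in the W3C specification.

```python
import re

TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Parse a version-00 W3C traceparent header; return None if it is invalid."""
    m = TRACEPARENT.match(header.strip())
    if not m:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # All-zero IDs are invalid; other versions are rejected in this simplified sketch.
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,                    # shared by every span of the request
        "parent_id": parent_id,                  # span ID of the caller's span
        "sampled": bool(int(flags, 16) & 0x01),  # upstream (head) sampling flag
    }


print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

The sampled flag carries any upstream head-sampling decision; it is separate from the collector's later tail-sampling decision, which is keyed on the shared trace ID.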

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.