Glossary

Zipkin

Zipkin is an open-source distributed tracing system that collects and visualizes timing data to diagnose latency problems in microservice architectures.


DISTRIBUTED TRACING SYSTEM

What is Zipkin?

Zipkin is an open-source distributed tracing system for collecting and visualizing timing data to troubleshoot latency problems in service-oriented architectures.

Zipkin collects timing data for requests as they propagate across services in a microservices architecture. Developers and SREs instrument applications to report spans, timed operations that represent work such as an HTTP call or a database query, and use them to visualize the path and latency of each request and to locate bottlenecks and failures. Maintained by the OpenZipkin community, Zipkin provides a storage backend, a query API, and a web UI for analyzing trace data.

Instrumented applications send trace data to a Zipkin backend over transports such as HTTP or Apache Kafka. The backend stores the spans so that queries can reconstruct the complete request flow. Zipkin popularized the B3 propagation header format for transmitting trace context, and it is often fed by the OpenTelemetry Collector. Zipkin covers tracing only; it is one component of a broader observability stack.


Key Features of Zipkin

Zipkin is an open-source distributed tracing system that collects timing data for requests as they propagate across service boundaries, enabling latency analysis and dependency mapping in microservice architectures.

Span-Based Data Model

Zipkin's fundamental data unit is the span, which represents a single, timed operation within a service (e.g., an HTTP call or database query). Spans contain:

Name: The operation name.
Timestamp & Duration: Precise timing data.
Tags: Key-value pairs for contextual metadata (e.g., http.method=GET).
Annotations: Timestamped event logs within a span's lifetime.

Spans are linked via parent-child relationships using Trace ID and Span ID to reconstruct the complete request path.
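The fields above can be sketched as a plain Python dictionary shaped like a Zipkin v2 JSON span. This is a minimal illustration, not a client library: the helper names (`new_id`, `make_span`, `annotate`) and the service and operation names are hypothetical, while the field names (`traceId`, `id`, `parentId`, `timestamp`, `duration`, `tags`, `annotations`) follow the v2 span format, which records time in microseconds.

```python
import json
import secrets
import time


def new_id(num_bytes: int = 8) -> str:
    """Return a random lower-hex identifier (8 bytes gives 16 hex characters)."""
    return secrets.token_hex(num_bytes)


def now_micros() -> int:
    """Current wall-clock time in epoch microseconds."""
    return int(time.time() * 1_000_000)


def make_span(name, service, trace_id=None, parent_id=None, kind="SERVER", tags=None):
    """Build a dict shaped like a Zipkin v2 JSON span (hypothetical helper)."""
    span = {
        "traceId": trace_id or new_id(16),  # shared by every span in the trace
        "id": new_id(8),                    # unique to this operation
        "name": name,
        "kind": kind,
        "timestamp": now_micros(),
        "duration": 0,                      # set when the operation finishes
        "localEndpoint": {"serviceName": service},
        "tags": dict(tags or {}),
        "annotations": [],
    }
    if parent_id is not None:
        span["parentId"] = parent_id        # links the span to its caller
    return span


def annotate(span, value):
    """Record a timestamped event inside the span's lifetime."""
    span["annotations"].append({"timestamp": now_micros(), "value": value})


root = make_span("get /checkout", "frontend", tags={"http.method": "GET"})
child = make_span("select orders", "orders", trace_id=root["traceId"],
                  parent_id=root["id"], kind="CLIENT")
annotate(child, "cache.miss")
print(json.dumps([root, child], indent=2))
```

The child reuses the root's trace ID and points at the root's span ID, which is all a backend needs to rebuild the parent-child hierarchy.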

Trace Context Propagation

To correlate work across services, Zipkin propagates trace context using standardized headers. It primarily supports:

B3 Propagation: The original header format (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled).
W3C Trace Context: The W3C standard (the traceparent header) for interoperability with other systems like OpenTelemetry.

Propagation is handled by instrumentation libraries or a dedicated propagator, which carries the Trace ID through HTTP, gRPC, and messaging systems so the trace stays continuous.
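A propagator for the B3 multi-header format can be sketched in a few lines. The function names (`inject_b3`, `extract_b3`) and the shape of the context dictionary are assumptions made for this illustration; only the header names come from the B3 format. Real instrumentation libraries also handle the single `b3` header and debug flags, which are omitted here.

```python
def inject_b3(context, headers):
    """Copy trace context into outbound request headers (hypothetical helper)."""
    headers["X-B3-TraceId"] = context["trace_id"]
    headers["X-B3-SpanId"] = context["span_id"]
    if context.get("parent_id"):
        headers["X-B3-ParentSpanId"] = context["parent_id"]
    if context.get("sampled") is not None:
        headers["X-B3-Sampled"] = "1" if context["sampled"] else "0"
    return headers


def extract_b3(headers):
    """Read trace context from inbound headers; return None if it is absent."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None
    sampled = lowered.get("x-b3-sampled")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": lowered.get("x-b3-parentspanid"),
        "sampled": None if sampled is None else sampled == "1",
    }


outbound = inject_b3(
    {"trace_id": "463ac35c9f6413ad48485a3953bb6124",
     "span_id": "a2fb4a1d1a96d312",
     "parent_id": None,
     "sampled": True},
    {"Accept": "application/json"},
)
print(outbound)
print(extract_b3(outbound))
```

The receiving service would call `extract_b3` on the incoming request and start its own span with the extracted span ID as the parent.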

Multi-Component Architecture

Zipkin is designed as a collection of loosely coupled components:

Instrumented Application: Services generate trace data (spans).
Reporters/Exporters: Send spans from the application to a Zipkin collector (often via HTTP or Kafka).
Collector: Validates, indexes, and persists spans to storage.
Storage Backend: Supports pluggable options including Elasticsearch, Cassandra, and MySQL.
Query Service & API: Retrieves traces and dependencies from storage.
Web UI: Provides a graphical interface for finding traces and viewing each one as a timeline of nested spans.
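The reporter-to-collector hop can be sketched with the standard library alone. The example assumes a Zipkin server listening on `localhost:9411` and uses the v2 HTTP endpoint `/api/v2/spans`; the function names and the sample span are hypothetical. Building the request is kept separate from sending it so the payload can be inspected without a running server.

```python
import json
import urllib.request

# Assumed address of a locally running Zipkin server.
ZIPKIN_ENDPOINT = "http://localhost:9411/api/v2/spans"


def build_report_request(spans, endpoint=ZIPKIN_ENDPOINT):
    """Encode a batch of v2 spans as a JSON POST request for the collector."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def report(spans, endpoint=ZIPKIN_ENDPOINT, timeout=2.0):
    """Send the batch and return the HTTP status code from the collector."""
    request = build_report_request(spans, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


batch = [{
    "traceId": "463ac35c9f6413ad48485a3953bb6124",
    "id": "a2fb4a1d1a96d312",
    "name": "get /checkout",
    "kind": "SERVER",
    "timestamp": 1700000000000000,
    "duration": 120000,
    "localEndpoint": {"serviceName": "frontend"},
}]
request = build_report_request(batch)
print(request.get_method(), request.full_url, len(request.data), "bytes")
# report(batch) would perform the network call against a running server.
```

Production reporters usually buffer spans and send them asynchronously in batches so that tracing does not add latency to the request path.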

Dependency Analysis & Service Graphs

By analyzing trace data, Zipkin can automatically generate service dependency graphs. This visualization maps:

Nodes: Represent each service in the architecture.
Edges: Show the direction and volume of calls between services.

This feature is critical for understanding system topology, identifying unexpected dependencies, and visualizing the impact of a service failure. The graph is derived from the kind recorded on each span (e.g., CLIENT, SERVER).
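How edges fall out of trace data can be sketched by pairing each span with its parent and counting the pairs that cross a service boundary. This is a simplification under stated assumptions: Zipkin's own linker also considers span kind and remote endpoints, and the function name `dependency_links` and the sample services are hypothetical. The output keys (`parent`, `child`, `callCount`) mirror the shape of Zipkin's dependency links.

```python
from collections import Counter


def dependency_links(spans):
    """Count cross-service parent-to-child calls in a list of v2-style spans."""
    by_key = {(span["traceId"], span["id"]): span for span in spans}
    calls = Counter()
    for span in spans:
        parent = by_key.get((span["traceId"], span.get("parentId")))
        if parent is None:
            continue  # root span, or the parent was never reported
        caller = parent["localEndpoint"]["serviceName"]
        callee = span["localEndpoint"]["serviceName"]
        if caller != callee:  # ignore work that stays inside one service
            calls[(caller, callee)] += 1
    return [
        {"parent": caller, "child": callee, "callCount": count}
        for (caller, callee), count in sorted(calls.items())
    ]


def span(trace_id, span_id, service, parent_id=None):
    """Tiny constructor for the sample data below."""
    record = {"traceId": trace_id, "id": span_id,
              "localEndpoint": {"serviceName": service}}
    if parent_id is not None:
        record["parentId"] = parent_id
    return record


sample = [
    span("t1", "a", "frontend"),
    span("t1", "b", "orders", parent_id="a"),
    span("t1", "c", "orders-db", parent_id="b"),
    span("t2", "a", "frontend"),
    span("t2", "b", "orders", parent_id="a"),
]
links = dependency_links(sample)
for link in links:
    print(link)
```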

Integration & Instrumentation Ecosystem

Zipkin offers broad support for different frameworks and languages through community-maintained libraries. Instrumentation can be achieved via:

Manual Instrumentation: Using the Zipkin client library API directly.
Framework Instrumentation: Pre-built tracing for Spring (Spring Cloud Sleuth), JAX-RS, gRPC, etc.
OpenTelemetry Integration: Spans generated by OpenTelemetry (OTel) SDKs can be exported to Zipkin with a Zipkin-format exporter, or sent over the OpenTelemetry Protocol (OTLP) to an OpenTelemetry Collector that forwards them to Zipkin. This makes Zipkin a viable backend for OTel-based observability pipelines.

Sampling for Scalability

To manage data volume and storage costs in high-throughput systems, Zipkin supports trace sampling. This is typically configured at the instrumentation level.

Head-based Sampling: A decision is made at the start of a trace (e.g., sample 10% of requests).
Delegated Sampling: Can be integrated with external sampling proxies.

Zipkin itself does not perform tail sampling (a decision made after the trace completes). Tail sampling can be implemented upstream, for example in the OpenTelemetry Collector, before data is sent to Zipkin storage.
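A head-based sampler can be sketched as a pure function of the trace ID, so the decision is cheap and repeatable. This is one common approach rather than Zipkin's specific implementation; the function name `should_sample` and the use of the low 64 bits of the trace ID are assumptions of this sketch. In practice the decision is made once at the first service and then propagated downstream (for example in the X-B3-Sampled header).

```python
import random


def should_sample(trace_id: str, rate: float) -> bool:
    """Decide at the start of a trace whether to record it.

    The low 64 bits of the hex trace ID are compared with a threshold,
    so roughly `rate` of uniformly random trace IDs are kept.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    low_bits = int(trace_id[-16:], 16)
    return low_bits < rate * 2**64


# Estimate the kept fraction over random 64-bit trace IDs (fixed seed).
generator = random.Random(0)
trace_ids = [f"{generator.getrandbits(64):016x}" for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.10) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the function depends only on the trace ID, every service that evaluates it reaches the same answer for the same trace.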


How Zipkin Works

Zipkin is an open-source distributed tracing system that collects and visualizes timing data for requests as they propagate across a microservices architecture.

Zipkin operates by instrumenting services to generate spans—timed records of operations like API calls or database queries. These spans, linked by a shared trace ID, are reported to a Zipkin backend. The system uses context propagation via headers (like the B3 format) to pass this ID between services, maintaining a continuous end-to-end trace of the entire request lifecycle for latency analysis.

The collected trace data is stored and indexed, and the UI displays each trace as a timeline of spans so that performance bottlenecks stand out. Zipkin also generates dependency graphs showing service interactions. It accepts spans from instrumentation libraries and from the OpenTelemetry Collector over protocols such as JSON over HTTP. Zipkin is focused on troubleshooting latency in distributed systems and has no built-in metrics or logging.
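One way to turn stored spans into a bottleneck hint is to compute each span's self time: its duration minus the time spent in its direct children. The sketch below assumes children run one after another; overlapping children would be over-counted, so the result is clamped at zero. The function names and the sample trace are hypothetical, and this analysis is an illustration rather than a feature Zipkin exposes under that name.

```python
def self_times(spans):
    """Map span ID to the duration (microseconds) not covered by direct children."""
    child_total = {}
    for span in spans:
        parent_id = span.get("parentId")
        if parent_id is not None:
            child_total[parent_id] = child_total.get(parent_id, 0) + span["duration"]
    return {
        span["id"]: max(span["duration"] - child_total.get(span["id"], 0), 0)
        for span in spans
    }


def slowest_operation(spans):
    """Return (name, self_time) for the span with the largest self time."""
    times = self_times(spans)
    names = {span["id"]: span["name"] for span in spans}
    worst_id = max(times, key=times.get)
    return names[worst_id], times[worst_id]


trace = [
    {"id": "a", "name": "get /checkout", "duration": 120000},
    {"id": "b", "parentId": "a", "name": "auth", "duration": 30000},
    {"id": "c", "parentId": "a", "name": "charge card", "duration": 70000},
    {"id": "d", "parentId": "c", "name": "insert payment", "duration": 10000},
]
print(self_times(trace))
print(slowest_operation(trace))
```

Here the root request is the longest span overall, but most of its time is spent waiting on children; the self-time view points at the card charge instead.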

DISTRIBUTED TRACING SYSTEMS

Zipkin vs. Jaeger vs. OpenTelemetry

A technical comparison of three major open-source projects for distributed tracing, focusing on architecture, data collection, and ecosystem role.

Feature / Component	Zipkin	Jaeger	OpenTelemetry
Primary Role	Distributed tracing backend and API	End-to-end distributed tracing system	Vendor-neutral telemetry framework (API/SDK/Collector)
Instrumentation Model	Manual or via community libraries; B3 propagation	Manual or via client libraries; supports multiple propagators	Standardized API/SDK for manual and auto-instrumentation; defines OTLP
Data Collection Protocol	HTTP/JSON, Scribe (Thrift), Kafka	UDP/Thrift, HTTP/JSON, gRPC, Kafka	OTLP (gRPC/HTTP), also supports Zipkin & Jaeger protocols
Native Trace Context Propagation	B3 Propagation (X-B3-* headers)	Jaeger format (uber-trace-id header); also supports B3 and W3C Trace Context	W3C Trace Context (default); supports B3 & Jaeger
Core Architecture	Collector, Storage, Query Service, UI	Agent, Collector, Query Service, UI, Ingester	API/SDK (per language), Collector (receivers/processors/exporters)
Supported Storage Backends	In-memory, Cassandra, Elasticsearch, MySQL	In-memory, Cassandra, Elasticsearch, Kafka+ES	None (not a storage backend); Collector exports to many backends
Sampling Strategy	Delegated to client/tracer; collector can sample	Client-side (probabilistic, rate-limiting, remote), Tail sampling via collector	Head sampling in SDK; Tail sampling in Collector
Vendor Lock-in Risk	Low (focused on tracing, simple API)	Low (open-source, can export data)	Very Low (industry standard, decouples instrumentation from backend)
Integration with APM/Backends	Many backends support Zipkin format ingestion	Direct Jaeger backend or via compatible formats	Primary integration path for modern APM tools (via OTLP)
Deployment Model	Single binary or separate components	All-in-one binary or scalable microservice deployment	Library/SDK in app, Collector as sidecar/daemon/central service

ZIPKIN

Frequently Asked Questions

Zipkin is an open-source distributed tracing system essential for monitoring latency and dependencies in microservice and agentic architectures. These FAQs address its core mechanisms, integration, and role in modern observability.

Zipkin is an open-source distributed tracing system that collects and visualizes timing data for requests as they propagate through a distributed system. It works by instrumenting services to generate spans (timed records of individual operations), which are correlated by a unique trace ID to reconstruct the complete end-to-end path of a request. Spans are sent to a Zipkin backend, where they are stored, analyzed, and presented as trace timelines or dependency graphs to help engineers identify latency bottlenecks and service dependencies.

Key components include:

Instrumented Applications: Services generate trace data.
Zipkin Collector: Receives span data via HTTP or messaging queues.
Storage Backend: Supports databases like Elasticsearch or Cassandra.
Zipkin UI: A web interface for querying and visualizing traces.


DISTRIBUTED TRACE COLLECTION

Related Terms

Zipkin operates within a broader ecosystem of distributed tracing concepts and tools. Understanding these related terms is essential for implementing effective observability.

OpenTelemetry (OTel)

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework that provides a unified set of APIs, SDKs, and tools for generating, collecting, and exporting telemetry data (traces, metrics, logs). It is the de facto standard for instrumentation, often serving as the data source for tracing backends like Zipkin. Key aspects include:

Instrumentation Libraries: Language-specific SDKs for manual and automatic code instrumentation.
OTLP Protocol: The gRPC/HTTP-based OpenTelemetry Protocol for transmitting data.
Collector: A vendor-agnostic proxy for receiving, processing, and exporting telemetry.

While Zipkin is a specific tracing backend, OpenTelemetry is the instrumentation layer that can send data to it, among other destinations.


Jaeger

Jaeger is an open-source, end-to-end distributed tracing system, originally developed by Uber, that serves as a direct alternative to Zipkin. Both systems solve similar problems but have distinct architectural and feature emphases.

Architecture: Jaeger offers an all-in-one binary for simple deployments and separately scalable components (collector, query service, ingester) for production. Zipkin is commonly run as a single server process that bundles the collector, query API, and UI, although the components can also be deployed separately.
Storage Backends: Jaeger has first-class support for Cassandra, Elasticsearch, and gRPC-plugin-based storage. Zipkin supports in-memory storage, Cassandra, Elasticsearch, and MySQL.
Query Capabilities: Jaeger provides a dependency graph and comparison views for traces.

The choice between Zipkin and Jaeger often comes down to ecosystem fit, storage preferences, and operational complexity.


Span & Trace

The span and the trace are the two fundamental data models in distributed tracing that Zipkin collects and visualizes.

Span: Represents a single, named, timed operation within a service (e.g., a database query, an HTTP handler). A span contains:
- Span ID: A unique identifier for this operation.
- Parent ID: A reference to the span that caused this work (creating a hierarchy).
- Timestamps: Start and duration.
- Tags: Key-value metadata (e.g., http.method=GET).
- Annotations: Timestamped events recorded during the span.
Trace: A directed acyclic graph (DAG) of spans that represents the end-to-end journey of a request. All spans in a trace share a single, globally unique Trace ID. Zipkin's UI reconstructs these hierarchies to show a request's path as a timeline of nested spans.
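Rebuilding the hierarchy from a flat list of spans only needs the Span ID and Parent ID fields. The sketch below groups spans by parent and walks the result depth-first to produce an indented outline; the function name `render_trace` and the sample data are hypothetical, and a span whose parent was never reported is treated as a root.

```python
def render_trace(spans):
    """Return indented text lines showing the span hierarchy of one trace."""
    children = {}
    for span in spans:
        children.setdefault(span.get("parentId"), []).append(span)
    known_ids = {span["id"] for span in spans}
    # A root has no parentId, or refers to a parent that is missing from the data.
    roots = [span for span in spans if span.get("parentId") not in known_ids]
    lines = []

    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span['name']} ({span['duration']} us)")
        ordered = sorted(children.get(span["id"], []), key=lambda s: s["timestamp"])
        for child in ordered:
            walk(child, depth + 1)

    for root in sorted(roots, key=lambda s: s["timestamp"]):
        walk(root, 0)
    return lines


# Spans arrive in arbitrary order; timestamps and durations are in microseconds.
reported = [
    {"id": "d", "parentId": "c", "name": "insert payment", "timestamp": 1060, "duration": 10000},
    {"id": "c", "parentId": "a", "name": "charge card", "timestamp": 1040, "duration": 70000},
    {"id": "a", "name": "get /checkout", "timestamp": 1000, "duration": 120000},
    {"id": "b", "parentId": "a", "name": "auth", "timestamp": 1010, "duration": 30000},
]
outline = render_trace(reported)
print("\n".join(outline))
```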

Context Propagation

Context propagation is the critical mechanism that enables distributed tracing by passing trace identifiers (Trace ID, Span ID) across service boundaries. Without it, spans cannot be linked into a coherent trace.

Propagators: Code components that inject context into outbound requests (e.g., HTTP headers) and extract it from inbound requests.
Formats: Zipkin originally popularized the B3 Propagation format, which uses headers like X-B3-TraceId. The W3C Trace Context standard is now widely adopted for interoperability.
Process: When Service A calls Service B, Service A's tracing SDK injects the current trace and span IDs into the HTTP headers. Service B's SDK extracts them, creating a new child span linked to the parent.

This propagation is what allows Zipkin to track requests across network hops, queues, and asynchronous processes.
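Because both formats carry the same identifiers, a B3 context can be translated into a W3C `traceparent` value and back. The sketch below assumes version `00` of the `traceparent` header (`version-traceid-parentid-flags`) and left-pads 64-bit B3 trace IDs with zeros to the 128 bits W3C requires; the function names are hypothetical and the `tracestate` header is ignored.

```python
import re

_LOWER_HEX = re.compile(r"^[0-9a-f]+$")


def b3_to_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    """Format B3 identifiers as a version-00 W3C traceparent value."""
    trace_id, span_id = trace_id.lower(), span_id.lower()
    if len(trace_id) not in (16, 32) or len(span_id) != 16:
        raise ValueError("expected a 16 or 32 character trace ID and a 16 character span ID")
    if not (_LOWER_HEX.match(trace_id) and _LOWER_HEX.match(span_id)):
        raise ValueError("identifiers must be hexadecimal")
    flags = "01" if sampled else "00"
    # 64-bit B3 trace IDs are left-padded to 32 hex characters.
    return f"00-{trace_id.rjust(32, '0')}-{span_id}-{flags}"


def traceparent_to_b3(traceparent: str) -> dict:
    """Convert a version-00 traceparent value into B3 multi-header form."""
    version, trace_id, parent_id, flags = traceparent.split("-")
    if version != "00" or len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("unsupported or malformed traceparent")
    return {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": parent_id,
        "X-B3-Sampled": "1" if int(flags, 16) & 1 else "0",  # low bit means sampled
    }


header = b3_to_traceparent("a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True)
print(header)
print(traceparent_to_b3("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

A translation layer like this is what lets services instrumented with different propagation formats participate in the same trace.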

APM (Application Performance Monitoring)

Application Performance Monitoring (APM) is the overarching practice of monitoring software performance and availability. Distributed tracing, as implemented by Zipkin, is a core pillar of a modern APM strategy.

APM Pillars: Typically includes Distributed Tracing, Metrics (time-series data), and Logs (event records).
Zipkin's Role: Zipkin is a specialized tracing backend. Commercial and open-source APM suites (e.g., Datadog APM, New Relic, the Grafana stack with Tempo) often bundle tracing with metrics, logging, and alerting into a unified platform.
Use Case: While Zipkin excels at deep-dive latency analysis for microservices, a full APM solution provides a broader view, correlating traces with system metrics (CPU, error rates) and business KPIs.

Service Dependency Graph

A Service Dependency Graph (or Service Graph) is a topological map automatically generated from trace data by systems like Zipkin. It visualizes the runtime dependencies between services in a microservice architecture.

Generation: Zipkin analyzes trace data to infer which services call which other services and the volume/latency of those calls.
Purpose: Provides an immediate, visual understanding of system architecture and critical data flows. It is essential for:
- Impact Analysis: Understanding which services will be affected by an outage.
- Architecture Drift: Identifying unintended or new dependencies.
- Bottleneck Identification: Spotting services with high fan-out or latency.

This graph moves observability from analyzing single requests (traces) to understanding the behavior of the system as a whole.
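Impact analysis on a dependency graph amounts to walking the edges backwards from the failed service. The sketch below assumes links shaped like Zipkin's dependency links (`parent` calls `child`); the function name `impacted_by` and the sample topology are hypothetical.

```python
from collections import deque


def impacted_by(links, failed_service):
    """Return the services that directly or transitively call `failed_service`."""
    callers = {}
    for link in links:
        callers.setdefault(link["child"], set()).add(link["parent"])
    impacted = set()
    queue = deque([failed_service])
    while queue:
        service = queue.popleft()
        for caller in callers.get(service, ()):
            if caller != failed_service and caller not in impacted:
                impacted.add(caller)
                queue.append(caller)  # keep walking upstream
    return sorted(impacted)


topology = [
    {"parent": "frontend", "child": "orders", "callCount": 120},
    {"parent": "frontend", "child": "search", "callCount": 300},
    {"parent": "orders", "child": "orders-db", "callCount": 118},
    {"parent": "orders", "child": "payments", "callCount": 40},
]
print(impacted_by(topology, "orders-db"))
print(impacted_by(topology, "search"))
```

The breadth-first walk visits each service at most once, so it also terminates on graphs that contain call cycles.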

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
