The Datadog Agent is a lightweight, open-source software daemon installed on monitored hosts that collects metrics, traces, and logs, executes integration checks, and forwards this telemetry data to the Datadog platform for analysis. It operates as the foundational data collection layer, enabling comprehensive observability by gathering system-level data (CPU, memory) and application-specific signals through its modular architecture and extensive library of out-of-the-box integrations.
Datadog Agent

What is the Datadog Agent?
A core component of the Datadog observability platform responsible for collecting and forwarding telemetry data from hosts.
The agent provides real-time data collection with minimal overhead, supports custom checks written in Python, and can be deployed across diverse environments including bare metal, virtual machines, containers, and Kubernetes (typically via a DaemonSet). It enriches data with tags, handles secure communication to Datadog's backend, and works in tandem with the Trace Agent and Process Agent for APM and infrastructure monitoring, forming a unified observability pipeline.
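As a rough illustration of a custom Python check, the sketch below subclasses `AgentCheck` from `datadog_checks.base` and submits one gauge. The check name `DiskUsedCheck`, the metric name `custom.disk.used_percent`, and the `path` instance option are hypothetical. The fallback class is only a stand-in so the example can run outside the Agent's embedded Python environment.

```python
import shutil

try:
    # Available inside the Agent's embedded Python environment.
    from datadog_checks.base import AgentCheck
except ImportError:
    # Stand-in (assumption) so the sketch runs without the Agent installed.
    # It only records what would have been submitted.
    class AgentCheck:
        def __init__(self, *args, **kwargs):
            self.submitted = []

        def gauge(self, name, value, tags=None):
            self.submitted.append((name, value, tags or []))


def disk_used_percent(path="/"):
    """Return the percentage of disk space in use at `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


class DiskUsedCheck(AgentCheck):
    """Hypothetical custom check that reports disk usage as a gauge."""

    def check(self, instance):
        # `instance` holds the per-instance options from the check's config.
        path = instance.get("path", "/")
        tags = instance.get("tags", []) + [f"path:{path}"]
        # The metric name is illustrative, not a built-in Datadog metric.
        self.gauge("custom.disk.used_percent", disk_used_percent(path), tags=tags)
```

In a real deployment the Agent instantiates the class and calls `check` on its collection schedule, so the class itself contains no loop or timer.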
Key Features and Capabilities
The Datadog Agent is a lightweight, open-source software package installed on hosts to collect observability data. It functions as the universal data collector for the Datadog platform, gathering metrics, traces, logs, and events.
Unified Data Collection
The Agent provides a single, integrated daemon for collecting all major telemetry signals. It gathers system metrics (CPU, memory, disk I/O), application performance monitoring (APM) traces submitted by tracing libraries in instrumented applications, logs from files, network ports, or journald, and integration metrics from over 650 supported technologies. This unified collection eliminates the need for multiple, disparate agents, simplifying deployment and configuration management.
Out-of-the-Box Integrations
A core capability is its extensive library of built-in checks. These are pre-configured plugins that automatically collect metrics and service checks from popular technologies. Examples include:
- Databases: PostgreSQL, Redis, MongoDB
- Web Servers: Nginx, Apache
- Cloud Platforms: AWS, Google Cloud, Azure
- Container Orchestrators: Kubernetes, Docker
Each check parses standard endpoints or APIs, transforming vendor-specific metrics into a normalized format for the Datadog platform.
Autodiscovery for Dynamic Environments
In containerized environments like Kubernetes, the Agent uses Autodiscovery to automatically identify services running in pods and apply the correct monitoring configuration. It monitors the container runtime or orchestrator API for new, updated, or terminated containers. When a new container starts, Autodiscovery matches its identifiers (e.g., image name, container labels, Kubernetes annotations) against check templates and dynamically enables the appropriate monitoring, ensuring no ephemeral workload goes unobserved.
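The matching step described above can be sketched as a lookup from a container's image name to check templates. This is a simplified model, not the Agent's implementation: the `CheckTemplate` and `Container` types and the `resolve` function are hypothetical, and only the `%%host%%` template variable is handled.

```python
from dataclasses import dataclass, field


@dataclass
class CheckTemplate:
    check_name: str
    ad_identifiers: list  # image short names this template applies to
    instance: dict = field(default_factory=dict)


@dataclass
class Container:
    container_id: str
    image: str  # e.g. "registry.local/team/redis:7.2"
    ip: str


def short_image_name(image: str) -> str:
    """Strip the registry path, tag, and digest from an image reference."""
    name = image.rsplit("/", 1)[-1]
    return name.split(":", 1)[0].split("@", 1)[0]


def resolve(templates, container):
    """Return (check_name, instance) pairs for every template that matches."""
    name = short_image_name(container.image)
    resolved = []
    for tpl in templates:
        if name in tpl.ad_identifiers:
            # Substitute the %%host%% placeholder with the container's address.
            instance = {
                key: value.replace("%%host%%", container.ip)
                if isinstance(value, str) else value
                for key, value in tpl.instance.items()
            }
            resolved.append((tpl.check_name, instance))
    return resolved
```

Calling `resolve` each time the runtime reports a started container, and discarding the result when it stops, gives the dynamic enable and disable behavior the paragraph describes.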
Local Aggregation and Forwarding
The Agent performs critical preprocessing on the host before data transmission. It aggregates metrics at configurable intervals (default: 15 seconds), reducing the total number of data points. For traces and logs, it batches payloads and compresses them. It then forwards this processed data to Datadog's intake endpoints over HTTPS, with automatic retry logic and local buffering so that telemetry is not lost during short network interruptions.
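To make the aggregation step concrete, here is a minimal sketch that collapses raw samples into one point per metric per interval. It assumes two simplified rules (a gauge keeps the last value in the bucket, a count is summed) and the hypothetical function name `aggregate`. The Agent's real aggregator handles more metric types and tag contexts.

```python
from collections import defaultdict


def aggregate(samples, interval=15):
    """Collapse (timestamp, name, kind, value) samples into one point per bucket.

    Assumed rules: "gauge" keeps the last value seen, "count" sums values.
    """
    buckets = defaultdict(dict)
    for ts, name, kind, value in sorted(samples):
        # Align the timestamp to the start of its interval.
        bucket_start = int(ts) - int(ts) % interval
        key = (name, kind)
        if kind == "count":
            buckets[bucket_start][key] = buckets[bucket_start].get(key, 0) + value
        else:
            buckets[bucket_start][key] = value
    return [
        (start, name, kind, value)
        for start in sorted(buckets)
        for (name, kind), value in sorted(buckets[start].items())
    ]
```

Five raw samples spread over 17 seconds reduce to three points here, which is the data-point reduction the paragraph refers to.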
Live Processes & Continuous Profiling
Beyond standard metrics, the Agent supports deep runtime diagnostics. The Live Processes feature discovers and monitors all processes running on a host, collecting their command, resource consumption, and lineage. The Continuous Profiler, which runs inside the application through Datadog's language libraries and reports through the Agent, samples CPU usage, memory allocation, and wall time at the method level for supported languages (Java, Python, Go, etc.), generating flame graphs that pinpoint code-level performance bottlenecks with minimal overhead (<2% typical).
Security & Network Monitoring
The Agent includes modules for security and network observability. The Security Agent performs runtime security detection, scanning for file integrity changes, suspicious process activity, and potential compliance violations based on rules. The Network Performance Monitoring (NPM) module uses eBPF (on Linux) to trace TCP/UDP communications between services at the kernel level, providing topology maps, connection latency, and throughput metrics without requiring application code changes or port mirroring.
How the Datadog Agent Works
The Datadog Agent is the core data collection engine for the Datadog observability platform, responsible for gathering, processing, and forwarding telemetry from hosts and containers.
The Datadog Agent is a lightweight, open-source software daemon installed on hosts that collects metrics, traces, logs, and events, forwarding them to the Datadog platform. It operates via a collection of checks—modular integrations—that gather data from specific systems like databases, web servers, or custom applications. The agent runs continuously, providing real-time visibility into infrastructure and application health without requiring constant manual intervention.
Architecturally, the agent consists of a core process and several forwarder and collector subsystems. It aggregates data locally, applies tagging for context, and batches payloads for efficient transmission over HTTPS, optionally through a proxy. For containerized environments, it is typically deployed as a DaemonSet in Kubernetes, ensuring one instance per node. The agent also supports Live Processes monitoring and Continuous Profiling, enabling deep diagnostic capabilities directly from the monitored system.
Frequently Asked Questions
Essential questions about the Datadog Agent, the open-source software that collects observability data from your infrastructure and applications.
The Datadog Agent is a lightweight, open-source software package installed on hosts that collects observability data (metrics, traces, logs, and events) and forwards it to the Datadog platform. It operates as a persistent background service (daemon) that executes checks for integrations, runs custom scripts, and aggregates data. The agent uses a pull model for some integrations (querying APIs) and a push model for others (receiving stats from applications), ultimately batching and transmitting data securely to Datadog's intake endpoints via HTTPS, optionally through a proxy. Its modular architecture includes a core agent and optional components like the Trace Agent for APM (Application Performance Monitoring) and the Process Agent.
Related Terms
The Datadog Agent operates within a broader ecosystem of telemetry collection and processing. These related concepts define the tools, patterns, and protocols that enable comprehensive observability.
Distributed Tracing
A method of profiling requests as they flow through a distributed system, tracking the full path, latency, and relationships between operations. It is composed of spans, which are individual timed operations. The Datadog Agent can collect trace data from instrumented applications. Critical concepts include:
- Trace Context: Metadata (e.g., trace ID, span ID) propagated across service boundaries.
- Sampling Strategies: Rules to reduce data volume, such as head-based (decision at start) or tail-based (decision after request completion) sampling.
- Use Case: Performance debugging across microservices and serverless functions.
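The trace context and head-based sampling ideas above can be sketched in a few lines. This is an illustrative model, not Datadog's sampler: the function names are hypothetical, and hashing the trace ID is one assumed way to make every service reach the same keep or drop decision.

```python
import hashlib
import secrets


def new_trace_context():
    """Create a root context with random 64-bit trace and span IDs."""
    return {
        "trace_id": secrets.randbits(64),
        "span_id": secrets.randbits(64),
        "parent_id": None,
    }


def child_context(parent):
    """Propagate across a service boundary: same trace ID, new span ID."""
    return {
        "trace_id": parent["trace_id"],
        "span_id": secrets.randbits(64),
        "parent_id": parent["span_id"],
    }


def head_sample(trace_id: int, rate: float) -> bool:
    """Decide at the start of a request whether to keep the trace.

    The decision depends only on the trace ID, so it is deterministic and
    consistent across services that see the same trace.
    """
    digest = hashlib.sha256(trace_id.to_bytes(8, "big")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate
```

Tail-based sampling differs in that the decision waits until all spans of a trace are available, which requires buffering complete traces before choosing.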
Sidecar Pattern & DaemonSet
Deployment models for auxiliary software like observability agents. The Datadog Agent can be deployed using both patterns:
- Sidecar Pattern: The agent runs in a separate container alongside the main application container in a Kubernetes pod. This provides isolation and allows the agent to be updated independently.
- DaemonSet: A Kubernetes controller that ensures a copy of the Datadog Agent pod runs on every node (or a subset) in the cluster. This is efficient for collecting host-level metrics (CPU, memory, disk) from all nodes.
Telemetry Data Pipeline
The end-to-end system for moving observability data from sources to backends. The Datadog Agent is a source collector within this pipeline. Related components include:
- Data Enrichment: Adding context (tags, environment) to raw metrics and traces.
- Backpressure Handling: Managing flow when the backend is slower than the data source.
- Dead Letter Queue (DLQ): A holding area for events that fail repeated processing attempts.
- Alternative pipeline tools include Vector (high-performance router), Fluentd (unified logging), and the OpenTelemetry Collector.
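Backpressure handling and a dead letter queue can be illustrated with a bounded retry buffer. The `Forwarder` class below is a hypothetical sketch under simple assumptions (evict the oldest payload when the buffer is full, give up after a fixed number of attempts) and does not describe how any particular tool implements these behaviors.

```python
from collections import deque


class Forwarder:
    """Bounded retry buffer with a dead letter queue (illustrative only)."""

    def __init__(self, send, max_buffer=3, max_attempts=2):
        self.send = send                # callable(payload) -> True on success
        self.max_buffer = max_buffer
        self.max_attempts = max_attempts
        self.buffer = deque()           # entries are [payload, attempts]
        self.dead_letters = []

    def submit(self, payload):
        """Queue a payload; when the buffer is full, evict the oldest entry."""
        if len(self.buffer) >= self.max_buffer:
            oldest, _ = self.buffer.popleft()
            self.dead_letters.append(oldest)
        self.buffer.append([payload, 0])

    def flush(self):
        """Try each buffered payload once; requeue failures until attempts run out."""
        for _ in range(len(self.buffer)):
            payload, attempts = self.buffer.popleft()
            if self.send(payload):
                continue
            attempts += 1
            if attempts >= self.max_attempts:
                self.dead_letters.append(payload)
            else:
                self.buffer.append([payload, attempts])
```

Bounding the buffer keeps memory use predictable when the backend is slow, at the cost of moving the oldest data aside. Whether to drop oldest or newest is a design choice that depends on which data matters more.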
Metric Collection & Export
The process of gathering quantitative measurements about system behavior. The Datadog Agent collects host metrics and application metrics via integrations and DogStatsD. Related protocols and systems:
- StatsD: A simple UDP-based protocol for sending application metrics (counters, timers, gauges). Datadog's DogStatsD is an extension.
- Prometheus: An open-source monitoring system using a pull model over HTTP. The Datadog Agent can scrape Prometheus and OpenMetrics endpoints through its OpenMetrics integration.
- Metric Exporter: The component within an SDK that sends aggregated metrics to a backend.
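The push model used by StatsD and DogStatsD comes down to formatting a short text datagram and sending it over UDP. The sketch below builds a datagram in the DogStatsD layout `<name>:<value>|<type>|@<sample_rate>|#<tags>` and sends it to port 8125, the default DogStatsD port. The helper names `format_datagram` and `send_metric` are hypothetical, and production code would normally use an official client library instead.

```python
import socket


def format_datagram(name, value, metric_type, tags=None, sample_rate=None):
    """Build a datagram: <name>:<value>|<type>[|@<rate>][|#<tag>,<tag>]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        # The tag section is a DogStatsD extension to plain StatsD.
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Push one datagram over UDP; there is no acknowledgement from the receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `send_metric(format_datagram("page.views", 1, "c", tags=["env:prod"]))` would increment a counter on a locally running Agent. Because UDP is fire-and-forget, the application is not blocked if no Agent is listening.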
Continuous Profiling & eBPF
Advanced observability techniques that provide deep system insights with low overhead.
- Continuous Profiling: Automatically collecting application performance profiles (CPU, memory, I/O) over time. Tools like Pyroscope offer this; Datadog provides Continuous Profiler as a feature.
- eBPF Tracing: A Linux kernel technology that allows safe program execution in the kernel for deep observability. It can trace system calls, network traffic, and kernel functions. The Datadog Agent uses eBPF for network performance monitoring, security, and kernel-level profiling without requiring application changes.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.