Inferensys

Glossary

Datadog Agent

The Datadog Agent is a lightweight software package installed on hosts that collects events and metrics, forwards them to the Datadog platform, and executes checks for integrations and custom monitoring.
AGENT TELEMETRY PIPELINES

What is the Datadog Agent?

A core component of the Datadog observability platform responsible for collecting and forwarding telemetry data from hosts.

The Datadog Agent is a lightweight, open-source software daemon installed on monitored hosts that collects metrics, traces, and logs, executes integration checks, and forwards this telemetry data to the Datadog platform for analysis. It operates as the foundational data collection layer, enabling comprehensive observability by gathering system-level data (CPU, memory) and application-specific signals through its modular architecture and extensive library of out-of-the-box integrations.

The agent provides real-time data collection with minimal overhead, supports custom checks written in Python, and can be deployed across diverse environments including bare metal, virtual machines, containers, and Kubernetes via a DaemonSet. It manages data enrichment with tags, handles secure communication to Datadog's backend, and works in tandem with the Datadog Trace Agent and Process Agent for advanced APM and infrastructure monitoring, forming a unified pipeline for agentic observability.
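To illustrate the custom-check mechanism mentioned above, here is a minimal sketch. It subclasses `AgentCheck` from the `datadog_checks.base` package that ships with the Agent. The metric name `myapp.queue.pending`, the `spool_dir` instance option, and the `count_pending` helper are hypothetical, and the fallback stub exists only so the file can be imported outside the Agent's Python environment.

```python
import os

try:
    # Available inside the Agent's embedded Python environment.
    from datadog_checks.base import AgentCheck
except ImportError:
    class AgentCheck:  # minimal stand-in so the sketch imports outside the Agent
        def __init__(self, *args, **kwargs):
            pass

        def gauge(self, name, value, tags=None):
            print(f"gauge {name}={value} tags={tags}")


def count_pending(directory):
    """Return the number of regular files in `directory` (hypothetical helper)."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


class QueueDepthCheck(AgentCheck):
    """Hypothetical custom check: reports how many files wait in a spool directory."""

    def check(self, instance):
        # `spool_dir` and `tags` come from this instance's YAML configuration.
        spool_dir = instance["spool_dir"]
        tags = instance.get("tags", [])
        self.gauge("myapp.queue.pending", count_pending(spool_dir), tags=tags)
```

In a typical setup, a file like this sits in the Agent's `checks.d` directory next to a same-named YAML file under `conf.d` that lists the instances; the Agent then calls `check` once per instance on every collection run.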

DATADOG AGENT

Key Features and Capabilities

The Datadog Agent is a lightweight, open-source software package installed on hosts to collect observability data. It functions as the universal data collector for the Datadog platform, gathering metrics, traces, logs, and events.

01

Unified Data Collection

The Agent provides a single, integrated daemon for collecting all major telemetry signals. It gathers system metrics (CPU, memory, disk I/O), receives application performance monitoring (APM) traces from Datadog tracing libraries running inside instrumented applications, collects logs from files, network ports, or journald, and pulls integration metrics from over 650 supported technologies. This unified collection removes the need for multiple, disparate agents, which simplifies deployment and configuration management.

02

Out-of-the-Box Integrations

A core capability is its extensive library of built-in checks. These are pre-configured plugins that automatically collect metrics and service checks from popular technologies. Examples include:

  • Databases: PostgreSQL, Redis, MongoDB
  • Web Servers: Nginx, Apache
  • Cloud Platforms: AWS, Google Cloud, Azure
  • Container Orchestrators: Kubernetes, Docker

Each check parses standard endpoints or APIs, transforming vendor-specific metrics into a normalized format for the Datadog platform.

03

Autodiscovery for Dynamic Environments

In containerized environments like Kubernetes, the Agent uses Autodiscovery to automatically identify services running in pods and apply the correct monitoring configuration. It monitors the container runtime or orchestrator API for new, updated, or terminated containers. When a new container starts, Autodiscovery matches its identifiers (e.g., image name, container labels, Kubernetes annotations) against check templates and dynamically enables the appropriate monitoring, ensuring no ephemeral workload goes unobserved.
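The matching step described above can be sketched as a simplified model. This is not the Agent's implementation: the `Container` type, the `TEMPLATES` list, and the `resolve` function are hypothetical, and only the short image name is matched. The `%%host%%` and `%%port%%` placeholders follow the template-variable style that Autodiscovery uses.

```python
from dataclasses import dataclass


@dataclass
class Container:
    image: str  # short image name, e.g. "redis"
    host: str   # container IP address
    port: int   # first exposed port


# Hypothetical check templates, keyed by the identifiers they apply to.
TEMPLATES = [
    {"ad_identifiers": ["redis"], "check": "redisdb",
     "instance": {"host": "%%host%%", "port": "%%port%%"}},
    {"ad_identifiers": ["nginx"], "check": "nginx",
     "instance": {"nginx_status_url": "http://%%host%%:%%port%%/nginx_status"}},
]


def resolve(container, templates=TEMPLATES):
    """Return (check_name, instance_config) pairs for templates matching the container."""
    variables = {"%%host%%": container.host, "%%port%%": str(container.port)}
    resolved = []
    for template in templates:
        if container.image not in template["ad_identifiers"]:
            continue
        instance = {}
        for key, value in template["instance"].items():
            # Substitute each placeholder with the container's runtime value.
            for placeholder, actual in variables.items():
                value = value.replace(placeholder, actual)
            instance[key] = value
        resolved.append((template["check"], instance))
    return resolved
```

Calling `resolve(Container("redis", "10.0.0.5", 6379))` yields one `redisdb` instance pointing at `10.0.0.5:6379`, while a container whose image matches no template yields an empty list and is left unmonitored by these checks.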

04

Local Aggregation and Forwarding

The Agent preprocesses data on the host before transmission. Checks run on a configurable collection interval (default: 15 seconds), and metrics pushed to the Agent are aggregated into one value per flush interval, which reduces the total number of data points sent. Traces and logs are batched and compressed. The Agent then forwards this data to Datadog's intake endpoints over HTTPS, with automatic retries and local buffering that help it ride out network interruptions. Because the buffer is bounded, a sufficiently long outage can still result in dropped payloads.
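As a rough illustration of interval-based aggregation, the sketch below collapses raw samples into one point per time bucket. It is a simplified model under stated assumptions, not the Agent's code: the `aggregate` function and the sample tuple layout are hypothetical, counts are summed within a bucket, and gauges keep the last value seen.

```python
def aggregate(samples, interval):
    """Collapse raw samples into one point per (bucket_start, name, tags).

    Each sample is (timestamp, name, metric_type, value, tags).
    """
    buckets = {}
    # Process in timestamp order so "last value wins" is well defined for gauges.
    for timestamp, name, metric_type, value, tags in sorted(samples, key=lambda s: s[0]):
        bucket_start = int(timestamp) - int(timestamp) % interval
        key = (bucket_start, name, tuple(sorted(tags)))
        if metric_type == "count":
            buckets[key] = buckets.get(key, 0) + value
        elif metric_type == "gauge":
            buckets[key] = value
        else:
            raise ValueError(f"unsupported metric type: {metric_type}")
    return buckets


samples = [
    (100, "requests", "count", 1, ["env:prod"]),
    (103, "requests", "count", 2, ["env:prod"]),
    (101, "queue.depth", "gauge", 5.0, []),
    (109, "queue.depth", "gauge", 7.0, []),
    (110, "requests", "count", 4, ["env:prod"]),
]
points = aggregate(samples, interval=10)
```

Here five samples become three points: the two `requests` counts in the first ten-second bucket are summed, the two `queue.depth` gauges collapse to the later value, and the sample at timestamp 110 starts a new bucket.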

05

Live Processes & Continuous Profiling

Beyond standard metrics, the Agent supports deep runtime diagnostics. The Live Processes feature discovers and monitors all processes running on a host, collecting their command, resource consumption, and lineage. The Continuous Profiler runs inside the application through Datadog's language libraries and sends its profiles through the Agent. It samples CPU usage, memory allocation, and wall time at the method level for supported languages (Java, Python, Go, etc.), generating flame graphs that pinpoint code-level performance bottlenecks with minimal overhead (<2% typical).

06

Security & Network Monitoring

The Agent includes modules for security and network observability. The Security Agent performs runtime security detection, scanning for file integrity changes, suspicious process activity, and potential compliance violations based on rules. The Network Performance Monitoring (NPM) module uses eBPF (on Linux) to trace TCP/UDP communications between services at the kernel level, providing topology maps, connection latency, and throughput metrics without requiring application code changes or port mirroring.

AGENT TELEMETRY PIPELINES

How the Datadog Agent Works

The Datadog Agent is the core data collection engine for the Datadog observability platform, responsible for gathering, processing, and forwarding telemetry from hosts and containers.

The Datadog Agent is a lightweight, open-source software daemon installed on hosts that collects metrics, traces, logs, and events, forwarding them to the Datadog platform. It operates via a collection of checks—modular integrations—that gather data from specific systems like databases, web servers, or custom applications. The agent runs continuously, providing real-time visibility into infrastructure and application health without requiring constant manual intervention.

Architecturally, the Agent consists of a core process with collector and forwarder subsystems. It aggregates data locally, applies tags for context, and batches payloads for efficient transmission over HTTPS, optionally through a proxy. In Kubernetes, it is typically deployed as a DaemonSet, which places one instance on each node. The Agent also supports Live Processes collection and forwards Continuous Profiler data, enabling deep diagnostics directly from the monitored system.
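The tagging and batching stages can be sketched in a few lines. This is an illustration only: `build_payloads` is a hypothetical function, the JSON layout is not Datadog's intake format, and zlib stands in for whatever compression the forwarder is configured to use.

```python
import json
import zlib


def build_payloads(points, host_tags, batch_size=2):
    """Attach host-level tags to every point, then emit compressed JSON batches."""
    payloads = []
    for start in range(0, len(points), batch_size):
        batch = [
            # Merge the point's own tags with host tags, de-duplicated and sorted.
            {**point, "tags": sorted(set(point.get("tags", [])) | set(host_tags))}
            for point in points[start:start + batch_size]
        ]
        payloads.append(zlib.compress(json.dumps(batch).encode("utf-8")))
    return payloads


points = [
    {"metric": "system.cpu.user", "value": 12.5},
    {"metric": "system.mem.used", "value": 2048, "tags": ["unit:mb"]},
    {"metric": "myapp.requests", "value": 3, "tags": ["env:prod"]},
]
payloads = build_payloads(points, host_tags=["env:prod", "host:web-1"])
```

With a batch size of two, the three points produce two compressed payloads. Every point carries the host tags, so data from the same machine can be grouped later without each check having to add them itself.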

DATADOG AGENT

Frequently Asked Questions

Essential questions about the Datadog Agent, the open-source software that collects observability data from your infrastructure and applications.

The Datadog Agent is a lightweight, open-source software package installed on hosts that collects observability data (metrics, traces, logs, and events) and forwards it to the Datadog platform. It operates as a persistent background service (daemon) that executes integration checks, runs custom checks, and aggregates data. The Agent uses a pull model for some integrations (querying APIs) and a push model for others (receiving stats from applications), then batches the data and transmits it securely to Datadog's intake endpoints over HTTPS, optionally through a proxy. Its modular architecture includes a core Agent and optional components such as the Trace Agent for APM (Application Performance Monitoring) and the Process Agent.
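The push model mentioned above usually means DogStatsD: the application sends small text datagrams to the Agent, by default over UDP on port 8125. The sketch below builds such a datagram by hand to show the wire format; the function names and metric names are hypothetical, and in practice an official client library would normally do this work.

```python
import socket


def format_datagram(name, value, metric_type, tags=None, sample_rate=None):
    """Build a DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local DogStatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


counter = format_datagram("page.views", 1, "c", tags=["env:prod", "page:home"])
```

Passing `counter` to `send` would increment a count metric on an Agent listening locally. Because UDP gives no acknowledgement, the application never blocks on the Agent, at the cost of silently losing datagrams when no listener is running.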

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.