Glossary

ROS 2 Quality of Service (QoS)

ROS 2 Quality of Service (QoS) is a set of configurable policies that govern the behavior and guarantees of communication between nodes, allowing tuning for systems ranging from best-effort to strict real-time.

ROBOT OPERATING SYSTEM (ROS)

What is ROS 2 Quality of Service (QoS)?

ROS 2 Quality of Service (QoS) is a configurable set of policies that govern the behavior of asynchronous communication between nodes via topics, services, and actions. These policies, implemented by the underlying Data Distribution Service (DDS) middleware, allow developers to explicitly define the reliability, durability, and timing guarantees of data exchange. This enables fine-tuning for diverse robotic applications, from loss-tolerant sensor streams to mission-critical control commands requiring strict real-time delivery.

Key QoS policies include Reliability (reliable vs. best-effort delivery), Durability (transient-local vs. volatile, which determines whether late-joining subscribers receive earlier messages), and Deadline (the maximum expected period between successive messages). Additional policies such as Liveliness, Lifespan, and History with its Depth provide further control over publisher health monitoring, data freshness, and resource management. By selecting appropriate QoS profiles, system integrators can trade off network bandwidth, latency, and robustness, which is critical for the predictable performance required in embodied intelligence systems.

ROS 2 COMMUNICATION TUNING

Core QoS Policies

ROS 2 Quality of Service (QoS) policies are configurable parameters that govern the behavior of communication between nodes. These policies allow developers to precisely tune the trade-offs between reliability, resource usage, and timeliness for different system components.

Reliability

The Reliability policy determines whether the middleware guarantees message delivery. It offers two settings:

RELIABLE: Retransmits lost messages until they are acknowledged, similar to TCP. This is critical for command and control or safety-critical data but adds latency and overhead.
BEST_EFFORT: Attempts to deliver messages but may drop them under network congestion or overload, similar to UDP. This is suitable for high-frequency sensor data (e.g., camera images) where occasional loss is acceptable to maintain low latency and high throughput.

The two settings are not symmetric when matching: a RELIABLE publisher can serve a BEST_EFFORT subscriber, but a BEST_EFFORT publisher and a RELIABLE subscriber are incompatible and will not connect.

Durability

The Durability policy controls whether messages are stored for late-joining subscribers. Its two modes are:

VOLATILE: Messages are not retained. A subscriber joining after a message is published will never receive it. This is the default and minimizes memory usage.
TRANSIENT_LOCAL: The publisher keeps its most recent messages (how many is set by the History policy and its Depth). A subscriber that also requests TRANSIENT_LOCAL receives the last N retained messages upon connection. This is essential for state data, like a robot's static transforms or operational mode, where a new node needs the latest known value immediately.

For example, a static transform broadcaster publishes /tf_static with TRANSIENT_LOCAL durability so a newly launched navigation node immediately receives the fixed frame relationships, such as sensor mounting offsets. The high-rate /tf topic, by contrast, is volatile.
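
The late-joiner behavior can be sketched with a toy model in plain Python. This is not the rclpy API: `ToyPublisher`, `connect`, and the `wants_transient_local` flag are hypothetical names used only for illustration, and the model ignores the fact that a real TRANSIENT_LOCAL subscriber would not match a VOLATILE publisher at all.

```python
from collections import deque


class ToyPublisher:
    """Toy model of a publisher's durability behavior (not the rclpy API)."""

    def __init__(self, durability="VOLATILE", depth=1):
        self.durability = durability
        # Bounded cache; only used when durability is TRANSIENT_LOCAL.
        self._cache = deque(maxlen=depth)
        self._inboxes = []

    def publish(self, msg):
        if self.durability == "TRANSIENT_LOCAL":
            self._cache.append(msg)
        for inbox in self._inboxes:
            inbox.append(msg)

    def connect(self, wants_transient_local=False):
        """Attach a new subscriber and return its inbox (a plain list)."""
        inbox = []
        if wants_transient_local and self.durability == "TRANSIENT_LOCAL":
            inbox.extend(self._cache)  # replay retained messages to the late joiner
        self._inboxes.append(inbox)
        return inbox


volatile_pub = ToyPublisher("VOLATILE")
volatile_pub.publish("mode=IDLE")
late_volatile = volatile_pub.connect()

latched_pub = ToyPublisher("TRANSIENT_LOCAL", depth=1)
latched_pub.publish("mode=IDLE")
latched_pub.publish("mode=ACTIVE")
late_latched = latched_pub.connect(wants_transient_local=True)

print(late_volatile)  # []
print(late_latched)   # ['mode=ACTIVE']
```

With a depth of 1 the late joiner sees only the most recent state, which is usually what state-like topics need.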

Deadline

The Deadline policy defines the maximum expected period between successive messages on a topic. It is specified as a duration (e.g., 100ms).

The publisher promises to send messages at least this frequently.
The subscriber expects to receive messages at least this frequently.

If the deadline is missed, the middleware raises a deadline-missed event on the affected endpoint (an offered-deadline event on the publisher, a requested-deadline event on the subscriber). A node that registers a callback for the event can take corrective action, such as logging a warning or entering a safe state. The policy detects timing violations rather than preventing them, which makes it a building block for real-time systems: it exposes stalled sensor streams and slow control loops as soon as they occur.
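
As a rough illustration of what a deadline check measures, the sketch below scans a list of arrival times and reports gaps longer than the deadline. It is a simplified offline model in plain Python, with the hypothetical helper name `missed_deadlines`; real middleware raises the event when the deadline timer expires, not when the next message finally arrives.

```python
def missed_deadlines(arrival_times_ms, deadline_ms):
    """Return (previous, current) arrival pairs whose gap exceeds the deadline."""
    misses = []
    for prev, cur in zip(arrival_times_ms, arrival_times_ms[1:]):
        if cur - prev > deadline_ms:
            misses.append((prev, cur))
    return misses


# Hypothetical arrival timestamps in milliseconds for a topic with a 100 ms deadline.
arrivals = [0, 90, 180, 330, 420]
misses = missed_deadlines(arrivals, deadline_ms=100)

for prev, cur in misses:
    print(f"deadline missed: {cur - prev} ms between messages")
# deadline missed: 150 ms between messages
```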

Liveliness

The Liveliness policy provides a mechanism to detect if a publishing node has failed or become unresponsive. It has two components:

Liveliness Lease Duration: The maximum time a publisher can go without asserting its liveliness.
Liveliness Kind: AUTOMATIC (asserted by the library on publication) or MANUAL_BY_TOPIC (must be manually asserted by the node).

If a publisher fails to assert its liveliness within the lease duration, it is considered "not alive." The connection itself is not torn down; instead, subscribers are notified of the change in liveliness status and can react to it. This is crucial for system health monitoring, ensuring that a failed perception node does not cause a planner to operate on dangerously stale data.
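
The lease mechanism can be approximated with a small tracker that records the last assertion time per publisher. This is a minimal sketch in plain Python under the assumption of a shared clock; `LeaseTracker` and its methods are hypothetical names, not part of any ROS 2 library.

```python
class LeaseTracker:
    """Toy liveliness tracker: a publisher is alive if it asserted within the lease."""

    def __init__(self, lease_duration_s):
        self.lease_duration_s = lease_duration_s
        self._last_assert = {}

    def assert_liveliness(self, publisher_id, now_s):
        # Called on publication (AUTOMATIC) or explicitly (MANUAL_BY_TOPIC).
        self._last_assert[publisher_id] = now_s

    def alive(self, now_s):
        return {
            pid
            for pid, stamp in self._last_assert.items()
            if now_s - stamp <= self.lease_duration_s
        }

    def not_alive(self, now_s):
        return set(self._last_assert) - self.alive(now_s)


tracker = LeaseTracker(lease_duration_s=1.0)
tracker.assert_liveliness("lidar", now_s=0.0)
tracker.assert_liveliness("camera", now_s=0.0)
tracker.assert_liveliness("lidar", now_s=0.8)  # lidar keeps publishing; camera stalls

print(tracker.alive(now_s=1.5))      # {'lidar'}
print(tracker.not_alive(now_s=1.5))  # {'camera'}
```

A planner consuming both streams could use the "not alive" set to stop trusting the stalled source while the subscription stays in place.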

History & Depth

The History policy dictates how messages are stored in the queue when the processing node cannot keep up. It works in tandem with a Depth value.

KEEP_LAST: Maintains a circular buffer of the last N messages (where N is the depth). When the queue is full, the oldest message is discarded to make room for the new one.
KEEP_ALL: Attempts to store all messages until they are processed, subject to the resource limits of the underlying middleware. This can lead to large memory growth if the subscriber is slower than the publisher.

Depth applies only to KEEP_LAST and is critical for managing resource usage. For a high-frequency LiDAR topic, a small depth (e.g., 1-5) with KEEP_LAST ensures the system always processes the most recent scan, dropping older ones to prevent backlog and latency spikes.
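
The difference between the two history settings maps closely onto a bounded versus an unbounded queue. The following is a minimal sketch using Python's `collections.deque`; the helper `make_history_queue` is a hypothetical name, and real middleware applies additional resource limits that this model ignores.

```python
from collections import deque


def make_history_queue(history, depth=None):
    """Return a queue that mimics KEEP_LAST (bounded) or KEEP_ALL (unbounded)."""
    if history == "KEEP_LAST":
        # A full bounded deque discards its oldest item on append.
        return deque(maxlen=depth)
    if history == "KEEP_ALL":
        return deque()
    raise ValueError(f"unknown history policy: {history}")


keep_last = make_history_queue("KEEP_LAST", depth=3)
keep_all = make_history_queue("KEEP_ALL")

# Seven scans arrive before the subscriber processes any of them.
for scan_id in range(1, 8):
    keep_last.append(scan_id)
    keep_all.append(scan_id)

print(list(keep_last))  # [5, 6, 7]
print(len(keep_all))    # 7
```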

Common QoS Profiles

ROS 2 provides predefined QoS Profiles that bundle policies for common use cases, simplifying configuration.

SENSOR_DATA: Uses BEST_EFFORT reliability, VOLATILE durability, and a KEEP_LAST history with a depth of 5. Optimized for high-bandwidth, loss-tolerant streams like camera images or point clouds.
SERVICES_DEFAULT: Uses RELIABLE reliability and VOLATILE durability. The standard profile for request-response services where every message must be delivered.
PARAMETER_EVENTS: Uses RELIABLE reliability and VOLATILE durability. Ensures parameter update notifications are reliably delivered.
SYSTEM_DEFAULT: Defers every policy to the defaults of the underlying RMW implementation, so the effective settings can differ between middleware vendors.

Developers can create custom profiles to match specific subsystem requirements, such as a hypothetical CONTROL profile with strict deadlines and transient-local durability for actuator commands.
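
To show how a profile bundles policies, here is a small stand-in written with a Python dataclass. It is a sketch for illustration, not the rclpy API (where the corresponding class is `rclpy.qos.QoSProfile`): `ToyQoSProfile`, the `deadline_ms` field, and the `CONTROL` profile with its 10 ms deadline are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyQoSProfile:
    """Illustrative bundle of QoS policies (a stand-in, not rclpy.qos.QoSProfile)."""

    reliability: str = "RELIABLE"
    durability: str = "VOLATILE"
    history: str = "KEEP_LAST"
    depth: int = 10
    deadline_ms: Optional[int] = None  # None means no deadline is set


DEFAULT = ToyQoSProfile()
SENSOR_DATA = replace(DEFAULT, reliability="BEST_EFFORT", depth=5)
SERVICES_DEFAULT = DEFAULT
PARAMETER_EVENTS = replace(DEFAULT, depth=1000)

# Hypothetical custom profile for actuator commands (values are illustrative).
CONTROL = replace(DEFAULT, durability="TRANSIENT_LOCAL", depth=1, deadline_ms=10)

for name, profile in [("SENSOR_DATA", SENSOR_DATA), ("CONTROL", CONTROL)]:
    print(name, profile)
```

Deriving each preset from a shared baseline with `replace` keeps the differences between profiles explicit, which is the main benefit of named profiles over ad hoc settings.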

COMMUNICATION POLICY

How QoS Matching Works in ROS 2

QoS Matching is the compatibility check in ROS 2 that determines whether a publisher and a subscriber can communicate, based on their configured Quality of Service policies.

The check is decentralized and peer-to-peer: it runs when a ROS 2 publisher and subscriber on the same topic discover each other. Nothing is negotiated. The publisher declares the QoS it offers, the subscriber declares the QoS it requests, and the underlying Data Distribution Service (DDS) middleware compares the two policy by policy (reliability, durability, deadline, and so on). A connection is only formed if, for every policy, the quality offered by the publisher meets or exceeds the quality requested by the subscriber. This keeps communication semantics predictable and catches mismatched expectations in the middleware, before any data is exchanged.

The matching logic is strict and policy-specific. For Reliability, a RELIABLE subscriber cannot connect to a BEST_EFFORT publisher, but the reverse is allowed. For Durability, a TRANSIENT_LOCAL subscriber will not match a VOLATILE publisher, while a VOLATILE subscriber matches either kind. Deadline and Liveliness are also checked at match time: the publisher's offered deadline and lease duration must be no longer than what the subscriber requests. Once connected, runtime violations of these policies trigger events that user-defined callbacks can handle. This fine-grained control allows developers to architect systems with mixed criticality, from best-effort logging to time-critical control loops, within a single communication framework.
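
The offered-versus-requested rule can be written down as a short compatibility check. This is a sketch in plain Python covering only reliability, durability, and deadline; the function `incompatibilities` and the dictionary keys are hypothetical, and a missing `deadline_ms` is taken to mean "no deadline" (an infinite period).

```python
RELIABILITY_RANK = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY_RANK = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1}


def incompatibilities(offered, requested):
    """Return the policies for which the publisher's offer falls short of the request."""
    problems = []
    if RELIABILITY_RANK[offered["reliability"]] < RELIABILITY_RANK[requested["reliability"]]:
        problems.append("reliability")
    if DURABILITY_RANK[offered["durability"]] < DURABILITY_RANK[requested["durability"]]:
        problems.append("durability")
    # A requested deadline is satisfied only by an equal or shorter offered deadline.
    offered_deadline = offered.get("deadline_ms")
    requested_deadline = requested.get("deadline_ms")
    if requested_deadline is not None and (
        offered_deadline is None or offered_deadline > requested_deadline
    ):
        problems.append("deadline")
    return problems


sensor_pub = {"reliability": "BEST_EFFORT", "durability": "VOLATILE"}
strict_sub = {"reliability": "RELIABLE", "durability": "TRANSIENT_LOCAL", "deadline_ms": 100}
relaxed_sub = {"reliability": "BEST_EFFORT", "durability": "VOLATILE"}

print(incompatibilities(sensor_pub, strict_sub))   # ['reliability', 'durability', 'deadline']
print(incompatibilities(sensor_pub, relaxed_sub))  # []
```

An empty list means the pair would match; any entry means no connection is formed and no data flows between the two endpoints.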

PRESET CONFIGURATIONS

Common QoS Profiles and Use Cases

This table compares the standard ROS 2 QoS profiles, detailing their policy configurations and the typical robotic system scenarios for which they are optimized.

QoS Profile	Reliability	Durability	Liveliness	History Depth	Primary Use Case
SENSOR_DATA	BEST_EFFORT	VOLATILE	SYSTEM_DEFAULT	5	High-frequency sensor streams (e.g., camera images, LiDAR scans) where occasional data loss is acceptable.
PARAMETERS	RELIABLE	VOLATILE	SYSTEM_DEFAULT	1000	Parameter services (get, set, and list requests), with a deep queue so bursts of requests are not dropped.
SERVICES_DEFAULT	RELIABLE	VOLATILE	SYSTEM_DEFAULT	10	Service calls requiring reliable request/reply delivery.
SYSTEM_DEFAULT	SYSTEM_DEFAULT	SYSTEM_DEFAULT	SYSTEM_DEFAULT	SYSTEM_DEFAULT	Defers every policy to the RMW implementation's defaults; effective values can differ between middleware vendors.
PARAMETER_EVENTS	RELIABLE	VOLATILE	SYSTEM_DEFAULT	1000	Parameter change notifications, delivered reliably with a deep queue; late-joining nodes do not receive past events.
ACTIONS_DEFAULT	RELIABLE	VOLATILE	SYSTEM_DEFAULT	10	Action servers and clients for long-running, interruptible tasks with feedback. Actions combine several services and topics; these values reflect the goal, result, and cancel services and the feedback topic.
DEFAULT	RELIABLE	VOLATILE	SYSTEM_DEFAULT	10	The fallback profile for publishers and subscriptions when no other is specified; balances reliability and performance.

Liveliness is left at SYSTEM_DEFAULT in these presets, which DDS implementations typically resolve to AUTOMATIC.

ROS 2 QUALITY OF SERVICE (QOS)

Frequently Asked Questions

ROS 2 Quality of Service (QoS) is a configurable set of policies that govern the behavior of communication between nodes, allowing developers to tune data exchange for systems ranging from best-effort to strict real-time. This FAQ addresses common questions about its purpose, configuration, and practical use.

QoS matters because robotic systems have heterogeneous communication needs; a sensor streaming high-frequency LiDAR data has different reliability and latency requirements than a sporadic logging service. QoS policies replace the largely fixed TCP-based and UDP-based transports of ROS 1 with a declarative model, which helps bound resource usage, avoid data overload, and keep critical control loops supplied with timely updates. Because publishers and subscribers only connect when their QoS profiles are compatible, both endpoints agree on delivery semantics before any data flows. The flip side is that incompatible endpoints exchange no data at all, so an unexpectedly silent topic is often a QoS mismatch.

ROS 2 COMMUNICATION CORE

Related Terms

ROS 2 Quality of Service (QoS) policies are part of a broader communication architecture. These related concepts define the fundamental patterns, data structures, and runtime mechanics that QoS policies govern.

ROS 2 DDS (Data Distribution Service)

ROS 2 DDS refers to the use of the Data Distribution Service (DDS) standard as the underlying middleware layer in ROS 2. It provides the decentralized discovery, configurable Quality of Service (QoS), and real-time data exchange capabilities that ROS 2 QoS policies directly configure. DDS is an Object Management Group (OMG) standard for scalable, real-time, data-centric publish-subscribe communication.

Decentralized Discovery: Nodes find each other automatically without a central broker.
Data-Centric Model: Communication is organized around "topics" as globally identifiable data.
QoS Implementation: A subset of the rich DDS QoS policies (e.g., reliability, durability, deadline) is exposed and managed through the ROS 2 client library APIs.

ROS Topic

A ROS Topic is a named bus over which nodes exchange asynchronous, many-to-many messages using a publish-subscribe communication model. It is the primary channel where QoS policies are applied.

Publish-Subscribe Pattern: Publishers send messages to a topic; subscribers receive messages from that topic.
Asynchronous Communication: Publishers and subscribers operate independently.
QoS Binding: When a publisher or subscriber is created, it must specify a QoS profile. Communication only occurs between entities whose QoS profiles are compatible (e.g., a BEST_EFFORT publisher does not match a RELIABLE subscriber, and a VOLATILE publisher does not match a TRANSIENT_LOCAL subscriber).

ROS 2 Executor

A ROS 2 Executor is the processing engine that runs the callbacks of subscriptions, service servers, timers, and action servers for one or more nodes. It works in tandem with QoS to manage how incoming data is processed.

Callback Scheduling: The executor 'spins', checking for new data on subscriptions and invoking the corresponding user-defined callback functions.
QoS Interaction: The executor's threading model (single-threaded, multi-threaded) affects how it handles messages, especially under deadline or liveliness QoS policies.
Backpressure Management: If callbacks run slower than messages arrive, the subscription's history queue fills. With BEST_EFFORT reliability or a shallow KEEP_LAST depth, older messages are dropped; with RELIABLE delivery and a deeper queue, more messages are retained at the cost of higher latency and memory use.

ROS Message (.msg)

A ROS Message is a strictly typed data structure, defined in a .msg file, that is serialized and deserialized when publishing to topics or calling services. QoS policies govern the transmission of these serialized messages.

Strict Typing: Messages have defined fields (e.g., std_msgs/Header header, float64 data).
Serialization: Converted to a binary format for transmission over the network.
QoS Impact: Policies like deadline define the required frequency of message arrival. The size and complexity of the message type directly impact performance under policies like history (how many messages are kept) and reliability (how they are delivered).

ROS 2 Lifecycle Node

A ROS 2 Lifecycle Node is a managed node that follows a well-defined state machine (Unconfigured, Inactive, Active, Finalized). This management works alongside QoS to ensure predictable system startup and error recovery.

State Management: Transitions include configure, activate, deactivate, and cleanup.
QoS Coordination: Publishers/Subscribers within a lifecycle node are typically created during configure and activated during activate. This prevents a node from receiving data (governed by QoS) before it is fully initialized.
System Robustness: Combined with liveliness QoS, lifecycle states allow the system to detect when a critical component has failed and entered an error state, triggering safe shutdown or recovery procedures.
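
The primary states and transitions named above can be captured in a small transition table. This is a simplified sketch in plain Python, not the `rclcpp_lifecycle` or `rclpy` lifecycle API: the intermediate transition states (such as Configuring or Activating) and error handling are omitted, and `apply_transition` is a hypothetical helper.

```python
# (current primary state, transition) -> next primary state
TRANSITIONS = {
    ("Unconfigured", "configure"): "Inactive",
    ("Inactive", "activate"): "Active",
    ("Active", "deactivate"): "Inactive",
    ("Inactive", "cleanup"): "Unconfigured",
    ("Unconfigured", "shutdown"): "Finalized",
    ("Inactive", "shutdown"): "Finalized",
    ("Active", "shutdown"): "Finalized",
}


def apply_transition(state, transition):
    """Return the next state, or raise if the transition is not allowed from this state."""
    try:
        return TRANSITIONS[(state, transition)]
    except KeyError:
        raise ValueError(f"cannot '{transition}' from state '{state}'") from None


state = "Unconfigured"
for step in ("configure", "activate", "deactivate", "cleanup"):
    state = apply_transition(state, step)
    print(f"{step} -> {state}")
```

Rejecting invalid transitions, such as activating an unconfigured node, is what keeps a node from handling data before its publishers and subscribers exist.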

ROS 2 Domain ID

A ROS 2 Domain ID is a network segmentation integer (set via the ROS_DOMAIN_ID environment variable) that isolates ROS 2 graphs. It is a foundational network-level isolation mechanism, distinct from but complementary to QoS.

Network Segmentation: Nodes with different Domain IDs cannot discover or communicate with each other, even on the same physical network.
Use Case: Used to run multiple independent robot systems on the same network without interference.
Relation to QoS: While Domain ID provides hard isolation, QoS policies manage the quality of communication within a domain. For example, within domain 0, you can have a high-frequency, BEST_EFFORT sensor stream and a low-frequency, RELIABLE command stream coexisting.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
