Inferensys

Glossary

ROS 2 Quality of Service (QoS)

ROS 2 Quality of Service (QoS) is a set of configurable policies that govern the behavior and guarantees of communication between nodes, allowing tuning for systems ranging from best-effort to strict real-time.
ROBOT OPERATING SYSTEM (ROS)

What is ROS 2 Quality of Service (QoS)?

ROS 2 Quality of Service (QoS) is a configurable set of policies that govern the behavior of asynchronous communication between nodes via topics, services, and actions. These policies, implemented by the underlying Data Distribution Service (DDS) middleware, allow developers to explicitly define the reliability, durability, and timing guarantees of data exchange. This enables fine-tuning for diverse robotic applications, from loss-tolerant sensor streams to mission-critical control commands requiring strict real-time delivery.

Key QoS policies include Reliability (guaranteed vs. best-effort delivery), Durability (transient-local vs. volatile storage for late-joining subscribers), and Deadline (the maximum expected period between successive messages). Additional policies such as Liveliness, Lifespan, and History with its Depth setting provide further control over failure detection, data freshness, and resource usage. By selecting appropriate QoS profiles, system integrators can trade off network bandwidth, latency, and robustness, which is critical for the predictable performance required in embodied intelligence systems.

ROS 2 COMMUNICATION TUNING

Core QoS Policies

ROS 2 Quality of Service (QoS) policies are configurable parameters that govern the behavior of communication between nodes. These policies allow developers to precisely tune the trade-offs between reliability, resource usage, and timeliness for different system components.

01

Reliability

The Reliability policy determines whether the middleware guarantees message delivery. It offers two settings:

  • RELIABLE: Ensures all messages are delivered, using retransmission protocols similar to TCP. This is critical for command and control or safety-critical data but adds latency and overhead.
  • BEST_EFFORT: Attempts to deliver messages but may drop them under network congestion or overload, similar to UDP. This is suitable for high-frequency sensor data (e.g., camera images) where occasional loss is acceptable to maintain low latency and high throughput.

Reliability settings must be compatible for a connection to form: a RELIABLE subscriber will not match a BEST_EFFORT publisher, whereas a BEST_EFFORT subscriber can still receive from a RELIABLE publisher.
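The difference between the two settings can be illustrated with a small simulation. This is a minimal sketch in plain Python, not ROS 2 code: the functions `send_best_effort`, `send_reliable`, and the lossy channel `drops_every_third` are hypothetical names introduced here, and real middleware retransmission is considerably more involved.

```python
from typing import Callable, List

# A channel is modelled as a function: is_lost(message, attempt) -> bool.
LossModel = Callable[[int, int], bool]


def send_best_effort(messages: List[int], is_lost: LossModel) -> List[int]:
    """Send each message exactly once; anything the channel drops is gone."""
    return [m for m in messages if not is_lost(m, 0)]


def send_reliable(messages: List[int], is_lost: LossModel,
                  max_retries: int = 5) -> List[int]:
    """Resend each message until it gets through or retries run out."""
    delivered = []
    for m in messages:
        for attempt in range(max_retries + 1):
            if not is_lost(m, attempt):
                delivered.append(m)
                break
    return delivered


def drops_every_third(message: int, attempt: int) -> bool:
    """Hypothetical channel: the first attempt of every third message is lost."""
    return message % 3 == 0 and attempt == 0


msgs = list(range(1, 7))
print(send_best_effort(msgs, drops_every_third))  # [1, 2, 4, 5]
print(send_reliable(msgs, drops_every_third))     # [1, 2, 3, 4, 5, 6]
```

The reliable sender delivers everything, but only by spending extra attempts on messages 3 and 6, which is the latency and overhead cost described above.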
02

Durability

The Durability policy controls whether messages are stored for late-joining subscribers. Its two modes are:

  • VOLATILE: Messages are not retained. A subscriber joining after a message is published will never receive it. This is the default and minimizes memory usage.
  • TRANSIENT_LOCAL: The publisher retains its most recent messages (how many is set by the depth of the History policy) and delivers them to subscribers that connect later. The subscriber must also request TRANSIENT_LOCAL to receive them.

This is essential for state data, such as a robot's operational mode or a fixed transform, where a new node needs the latest known value immediately. For example, static transforms on /tf_static are published with TRANSIENT_LOCAL durability so that a newly launched navigation node receives them as soon as it connects.
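The late-joiner behavior can be sketched with a toy model. The class `ModelPublisher` below is a hypothetical stand-in written in plain Python, not the ROS 2 publisher API; it models only durability and assumes in-process delivery with no loss.

```python
from collections import deque
from typing import Deque, List


class ModelPublisher:
    """Toy publisher that models only the Durability policy."""

    def __init__(self, transient_local: bool, depth: int = 1) -> None:
        self.transient_local = transient_local
        # Retained samples; only filled when durability is TRANSIENT_LOCAL.
        self.history: Deque[str] = deque(maxlen=depth)
        self.inboxes: List[List[str]] = []

    def publish(self, msg: str) -> None:
        if self.transient_local:
            self.history.append(msg)
        for inbox in self.inboxes:
            inbox.append(msg)

    def subscribe(self, transient_local: bool) -> List[str]:
        # A subscriber asking for more durability than is offered cannot match.
        if transient_local and not self.transient_local:
            raise ValueError("incompatible durability: publisher is VOLATILE")
        inbox: List[str] = []
        if transient_local:
            inbox.extend(self.history)  # replay retained samples to the late joiner
        self.inboxes.append(inbox)
        return inbox


latched = ModelPublisher(transient_local=True, depth=1)
latched.publish("mode=IDLE")
latched.publish("mode=ACTIVE")
late_joiner = latched.subscribe(transient_local=True)
print(late_joiner)  # ['mode=ACTIVE']

volatile = ModelPublisher(transient_local=False)
volatile.publish("mode=ACTIVE")
print(volatile.subscribe(transient_local=False))  # []
```

With a depth of 1 the late joiner sees only the newest state, which is usually what a "latched" state topic wants.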
03

Deadline

The Deadline policy defines the maximum expected period between successive messages on a topic. It is specified as a duration (e.g., 100ms).

  • The publisher promises to send messages at least this frequently.
  • The subscriber expects to receive messages at least this frequently.

If the deadline is missed (no message arrives within the period), a deadline-missed event is raised on the publisher and on the subscriber, and a registered callback can take corrective action such as logging a warning or entering a safe state. The policy does not make delivery faster; it makes timing violations observable, which is what allows a real-time system to detect stalled sensor streams or slow control loops.
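The deadline check itself is simple arithmetic on message arrival times. The following sketch is plain Python with a hypothetical helper `missed_deadlines`; it inspects a recorded list of arrival times after the fact, whereas real middleware raises the event from a timer as soon as the period elapses.

```python
from typing import List, Tuple


def missed_deadlines(arrivals_ms: List[int],
                     deadline_ms: int) -> List[Tuple[int, int]]:
    """Return (previous, current) arrival pairs whose gap exceeds the deadline."""
    return [
        (prev, cur)
        for prev, cur in zip(arrivals_ms, arrivals_ms[1:])
        if cur - prev > deadline_ms
    ]


# Hypothetical arrival timestamps in milliseconds for a 10 Hz topic.
arrivals = [0, 100, 200, 450, 550]
print(missed_deadlines(arrivals, deadline_ms=100))  # [(200, 450)]
```

Here the 250 ms gap between the third and fourth messages is the only violation of the 100 ms deadline.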
04

Liveliness

The Liveliness policy provides a mechanism to detect if a publishing node has failed or become unresponsive. It has two components:

  • Liveliness Lease Duration: The maximum time a publisher can go without asserting its liveliness.
  • Liveliness Kind: AUTOMATIC (asserted by the middleware whenever the node publishes) or MANUAL_BY_TOPIC (the node must assert it explicitly through the publisher API).

If a publisher fails to assert its liveliness within the lease duration, it is considered "not alive" and its subscribers are notified of the change in liveliness status. This is crucial for system health monitoring, ensuring that a failed perception node does not cause a planner to operate on dangerously stale data.
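The lease mechanism amounts to comparing the time since the last assertion against the lease duration. The class `LivelinessMonitor` below is a hypothetical plain-Python model of that check, with time passed in explicitly as integer milliseconds; it is not the ROS 2 event API.

```python
from typing import Optional


class LivelinessMonitor:
    """Toy model of a liveliness lease for a single publisher."""

    def __init__(self, lease_duration_ms: int) -> None:
        self.lease_duration_ms = lease_duration_ms
        self.last_assert_ms: Optional[int] = None

    def assert_alive(self, now_ms: int) -> None:
        """Record an assertion (a publication, or a manual assert call)."""
        self.last_assert_ms = now_ms

    def is_alive(self, now_ms: int) -> bool:
        """Alive only if an assertion happened within the lease duration."""
        if self.last_assert_ms is None:
            return False
        return now_ms - self.last_assert_ms <= self.lease_duration_ms


monitor = LivelinessMonitor(lease_duration_ms=500)
monitor.assert_alive(now_ms=0)
print(monitor.is_alive(now_ms=400))  # True
print(monitor.is_alive(now_ms=501))  # False
```

A planner could poll a check like this, or react to the corresponding event, before trusting the latest perception output.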
05

History & Depth

The History policy dictates how messages are stored in the queue when the processing node cannot keep up. It works in tandem with a Depth value.

  • KEEP_LAST: Maintains a circular buffer of the last N messages (where N is the depth). When the queue is full, the oldest message is discarded to make room for the new one.
  • KEEP_ALL: Attempts to store all messages until they are processed, subject to the resource limits of the underlying middleware. Memory use can grow substantially if the subscriber is slower than the publisher.

With KEEP_LAST, the Depth setting is the main lever for managing resource usage. For a high-frequency LiDAR topic, a small depth (e.g., 1-5) ensures the system always processes the most recent scans, dropping older ones to prevent backlog and latency spikes.
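KEEP_LAST behaves like a bounded queue that evicts its oldest entry. A minimal sketch using Python's `collections.deque`, where the helper name `keep_last` and the integer scan IDs are illustrative assumptions rather than anything from ROS 2:

```python
from collections import deque
from typing import Iterable, List


def keep_last(messages: Iterable[int], depth: int) -> List[int]:
    """Queue messages with KEEP_LAST semantics and return what remains."""
    queue = deque(maxlen=depth)  # appending to a full deque drops the oldest item
    for msg in messages:
        queue.append(msg)
    return list(queue)


# Seven scans arrive before the subscriber gets a chance to process any.
print(keep_last(range(1, 8), depth=3))  # [5, 6, 7]
print(keep_last(range(1, 8), depth=1))  # [7]
```

A depth of 1 means the subscriber only ever sees the newest scan, which minimizes latency at the cost of skipping intermediate data.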
06

Common QoS Profiles

ROS 2 provides predefined QoS Profiles that bundle policies for common use cases, simplifying configuration.

  • SENSOR_DATA: Uses BEST_EFFORT reliability, VOLATILE durability, and a KEEP_LAST history with a depth of 5. Optimized for high-bandwidth, loss-tolerant streams like camera images or point clouds.
  • SERVICES_DEFAULT: Uses RELIABLE reliability and VOLATILE durability. The standard profile for request-response services where every message must be delivered.
  • PARAMETER_EVENTS: Uses RELIABLE reliability and VOLATILE durability. Ensures parameter update notifications are reliably delivered.
  • SYSTEM_DEFAULT: Defers every policy to the defaults of the underlying middleware implementation, so the effective values depend on the middleware in use.

Developers can also create custom profiles to match specific subsystem requirements, such as a CONTROL profile with strict deadlines and transient-local durability for actuator commands.
COMMUNICATION POLICY

How QoS Matching Works in ROS 2

QoS Matching is the compatibility check in ROS 2 that determines whether a publisher and a subscriber can communicate, based on their configured Quality of Service policies.

QoS Matching is a decentralized, peer-to-peer check that occurs when a ROS 2 publisher and subscriber discover each other on a topic. The publisher declares the QoS it offers and the subscriber declares the QoS it requests, covering policies such as reliability, durability, and deadline. The underlying Data Distribution Service (DDS) middleware checks compatibility policy by policy. A connection is formed only if, for every policy, the quality offered by the publisher meets or exceeds the quality requested by the subscriber. Neither side's settings are adjusted to make a match; incompatible endpoints simply do not exchange data. This keeps communication semantics predictable, because a subscriber never receives weaker guarantees than it requested.

The matching logic is strict and policy-specific. For Reliability, a RELIABLE subscriber cannot connect to a BEST_EFFORT publisher, but the reverse is allowed. For Durability, a TRANSIENT_LOCAL subscriber requires a TRANSIENT_LOCAL publisher, while a VOLATILE subscriber matches either kind. For Deadline, the period offered by the publisher must be no longer than the period requested by the subscriber, and Liveliness lease durations follow the same rule. Once endpoints are matched, deadline and liveliness violations at runtime trigger user-defined event callbacks. This fine-grained control allows developers to architect systems with mixed criticality, from best-effort logging to time-critical control loops, within a single communication framework.
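The offered-versus-requested rule can be written as a small compatibility function. This is a simplified model in plain Python covering only reliability, durability, and deadline; the `Qos` dataclass and `incompatibilities` function are hypothetical names, liveliness is omitted for brevity, and the actual check is performed inside the middleware.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1      # higher value = stronger guarantee


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class Qos:
    reliability: Reliability = Reliability.RELIABLE
    durability: Durability = Durability.VOLATILE
    deadline_ms: Optional[int] = None  # None means no deadline is set


def incompatibilities(offered: Qos, requested: Qos) -> List[str]:
    """Return the policies for which the publisher offers less than requested."""
    problems = []
    if offered.reliability < requested.reliability:
        problems.append("reliability")
    if offered.durability < requested.durability:
        problems.append("durability")
    # A requested deadline needs an offered deadline that is at least as tight.
    if requested.deadline_ms is not None and (
        offered.deadline_ms is None or offered.deadline_ms > requested.deadline_ms
    ):
        problems.append("deadline")
    return problems


sensor_pub = Qos(reliability=Reliability.BEST_EFFORT)
strict_sub = Qos(reliability=Reliability.RELIABLE)
print(incompatibilities(sensor_pub, strict_sub))  # ['reliability']
print(incompatibilities(Qos(), sensor_pub))       # []
```

An empty list means the endpoints would match; any entry names a policy where the subscriber asked for more than the publisher offers.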

PRESET CONFIGURATIONS

Common QoS Profiles and Use Cases

This table compares the standard ROS 2 QoS profiles, detailing their policy configurations and the typical robotic system scenarios for which they are optimized.

| QoS Profile | Reliability | Durability | Deadline | Liveliness | Depth | Primary Use Case |
|---|---|---|---|---|---|---|
| SENSOR_DATA | BEST_EFFORT | VOLATILE | Default (unset) | AUTOMATIC | 5 | High-frequency sensor streams (e.g., camera images, LiDAR scans) where occasional data loss is acceptable. |
| PARAMETERS | RELIABLE | VOLATILE | Default (unset) | AUTOMATIC | 1000 | Dynamic parameter updates where the latest value must be received, but history is not required. |
| SERVICES_DEFAULT | RELIABLE | VOLATILE | Default (unset) | AUTOMATIC | 10 | Synchronous service calls requiring guaranteed request/reply delivery. |
| SYSTEM_DEFAULT | Middleware default | Middleware default | Middleware default | Middleware default | Middleware default | Cases where every policy should follow the defaults of the underlying middleware. |
| PARAMETER_EVENTS | RELIABLE | VOLATILE | Default (unset) | AUTOMATIC | 1000 | Parameter change notifications that must be delivered reliably; the deep queue absorbs bursts of updates. |
| ACTIONS_DEFAULT | RELIABLE | VOLATILE | Default (unset) | AUTOMATIC | 10 | Action servers and clients for long-running, interruptible tasks with feedback. |
| DEFAULT | RELIABLE | VOLATILE | Default (unset) | AUTOMATIC | 10 | The fallback profile when no other is specified; balances reliability and performance. |

ROS 2 QUALITY OF SERVICE (QOS)

Frequently Asked Questions

ROS 2 Quality of Service (QoS) is a configurable set of policies that govern the behavior of communication between nodes, allowing developers to tune data exchange for systems ranging from best-effort to strict real-time. This FAQ addresses common questions about its purpose, configuration, and practical use.

ROS 2 Quality of Service (QoS) is a configurable set of policies that govern the behavior of data exchange between nodes, allowing precise tuning for systems ranging from best-effort to strict real-time. It is critically important because robotic systems have heterogeneous communication needs; a sensor streaming high-frequency LiDAR data has different reliability and latency requirements than a sporadic logging service. Where ROS 1 offered only a choice of transport (TCPROS by default, with UDPROS as an option), ROS 2 uses a declarative, per-endpoint model that supports predictable performance, limits data overload, and helps critical control loops receive timely updates. Because publisher and subscriber QoS profiles must be compatible before any data flows, a subscriber never silently receives weaker delivery guarantees than it requested; incompatible endpoints simply do not connect.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.