Joint attention is the coordinated, triadic focus of two or more agents on a single object or event, facilitated by gestural, verbal, or gaze-based cues to establish a shared frame of reference. In artificial intelligence and multi-agent systems, it is a critical mechanism for enabling collaborative task execution and efficient communication, as it allows agents to align their perceptual states and infer shared goals without explicit instruction. This process is foundational for social learning and underpins more complex Theory of Mind capabilities.
