Object-centric representation is a learning paradigm in which a model decomposes a complex scene or input into a structured set of discrete entities, or 'objects,' each with its own independent latent representation. It contrasts with monolithic representations, which encode the entire scene as a single undifferentiated vector or pixel grid, by explicitly modeling compositionality: the idea that a whole is composed of reusable, interacting parts. This structured abstraction is fundamental to world model learning, because keeping a separate state for each object lets a model reason more efficiently about object permanence, physical interactions, and long-horizon planning in dynamic environments.
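
As a rough illustration of the contrast between a monolithic and an object-centric encoding, the Python sketch below renders two toy scenes that differ only in the position of one object and compares how much of each representation changes. It is a simplification under stated assumptions: the object masks are given rather than learned (a real object-centric model must discover the decomposition itself), and the helper names `render` and `encode_slots`, as well as the four-number slot layout, are hypothetical choices made for this example.

```python
import numpy as np

def render(shape, objects):
    """Draw non-overlapping squares; each object is (top, left, size, intensity)."""
    image = np.zeros(shape)
    masks = np.zeros((len(objects),) + shape, dtype=bool)
    for k, (top, left, size, intensity) in enumerate(objects):
        masks[k, top:top + size, left:left + size] = True
        image[masks[k]] = intensity
    return image, masks

def encode_slots(image, masks):
    """Hypothetical slot encoder: one latent per object,
    laid out as (centroid row, centroid col, area, mean intensity).
    Assumes the masks are provided; real models learn them."""
    slots = []
    for mask in masks:
        rows, cols = np.nonzero(mask)
        slots.append([rows.mean(), cols.mean(), float(rows.size), image[mask].mean()])
    return np.array(slots)

# Two scenes that differ only in where the second object sits.
scene_a = [(1, 1, 2, 0.5), (5, 4, 3, 1.0)]
scene_b = [(1, 1, 2, 0.5), (4, 2, 3, 1.0)]

img_a, masks_a = render((8, 8), scene_a)
img_b, masks_b = render((8, 8), scene_b)

slots_a = encode_slots(img_a, masks_a)
slots_b = encode_slots(img_b, masks_b)

# Object-centric view: which slots changed between the two scenes?
changed_slots = np.any(slots_a != slots_b, axis=1)

# Monolithic view: how many entries of the flattened pixel vector changed?
changed_pixels = int(np.sum(img_a.reshape(-1) != img_b.reshape(-1)))

print("slots changed:", changed_slots)
print("pixels changed:", changed_pixels)
```

In this toy setting, moving one object alters only that object's slot, while the flattened pixel vector changes in many entries at once. That locality is the property that makes per-object latents convenient for reasoning about individual entities and their interactions.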
