Camera-only perception is fundamentally incomplete. A camera provides rich 2D texture and color data, but it does not directly measure the 3D geometry, material properties, or physical dynamics that a machine needs to interact with the world. The resulting perception system is brittle: it breaks under occlusion, poor lighting, or featureless surfaces.
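
One way to see why a single image does not pin down 3D geometry is the standard pinhole projection, sketched below. This is only an illustration under assumed values: the `project` helper, the focal length, and the principal point are hypothetical and do not describe any particular camera. It shows that two points at different depths along the same viewing ray land on the same pixel, so depth cannot be read from that pixel alone.

```python
import math

def project(point_xyz, focal_px=500.0, cx=320.0, cy=240.0):
    """Project a 3D point in the camera frame onto the image plane.

    Ideal pinhole model: u = f * x / z + cx, v = f * y / z + cy.
    The intrinsics (focal_px, cx, cy) are illustrative assumptions.
    """
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Two points on the same viewing ray: the second is five times farther away.
near = (0.2, -0.1, 1.0)
far = tuple(5.0 * c for c in near)

pixel_near = project(near)
pixel_far = project(far)

# Both points map to the same pixel, so the image alone cannot tell them apart.
same_pixel = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(pixel_near, pixel_far)
)
print(pixel_near, pixel_far, same_pixel)
```

Scaling a point by any positive factor leaves `x / z` and `y / z` unchanged, which is why the depth drops out. Recovering it needs extra information, such as a second viewpoint, motion, a learned prior, or a sensor that measures range directly.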