General-purpose vision models, such as those trained on COCO or ImageNet, perform poorly on construction debris because they are optimized to recognize discrete, well-defined objects in curated photos, not the amorphous, overlapping piles of material found on a messy site. The semantic gap between a labeled 'sofa' and an unlabeled heap of rebar, concrete, and wood reflects a fundamental limitation of their training data distribution.
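
One part of this gap, the label space, can be made concrete with a minimal sketch. It checks a small debris label set against the 80 COCO detection categories. The names `DEBRIS_CLASSES` and `label_coverage` are hypothetical, introduced only for illustration; the three debris labels are the materials named above, not an established taxonomy.

```python
# The 80 object categories used for COCO detection.
COCO_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard",
    "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork",
    "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
    "couch", "potted plant", "bed", "dining table", "toilet", "tv",
    "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
    "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
    "scissors", "teddy bear", "hair drier", "toothbrush",
}

# Hypothetical target labels for a debris model (assumption: taken from the
# materials mentioned in the text, not from any published label set).
DEBRIS_CLASSES = ["rebar", "concrete", "wood"]


def label_coverage(target_labels, known_labels):
    """Split target labels into those a pretrained label set covers and those it lacks."""
    covered = [t for t in target_labels if t in known_labels]
    missing = [t for t in target_labels if t not in known_labels]
    return covered, missing


covered, missing = label_coverage(DEBRIS_CLASSES, COCO_CLASSES)
print(f"covered by COCO: {covered}")
print(f"missing from COCO: {missing}")
```

A label check like this captures only the vocabulary side of the problem. Even if a debris class were added, the model would still have learned from images of separate, cleanly bounded objects rather than mixed piles.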