RLHF fails for physical tasks because its core premise, cheap and safe trial and error, is a fantasy in the real world. A single failed trial can damage million-dollar equipment, and unsafe exploration risks catastrophic harm, which makes online RLHF a non-starter for industrial robotics.
