In model-based reinforcement learning (MBRL), an agent learns a dynamics model to predict environmental transitions. The agent's policy is then optimized to maximize reward within this simulated model. Co-adaptation occurs when the policy exploits the model's idiosyncratic errors, learning behaviors that are highly effective in the flawed simulation but fail catastrophically in the real environment. Because planning is driven entirely by the model's predictions, this creates a deceptive feedback loop in which the policy becomes increasingly specialized to an inaccurate world.
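A minimal sketch of this failure mode, under assumed toy dynamics invented for illustration: the true world is a 1D state with reward peaked at the origin, and the learned model contains one idiosyncratic error (it wrongly predicts that large positive actions teleport the agent to the origin). A greedy policy optimized against the model latches onto exactly that error, scoring well in simulation and poorly in reality. All names and dynamics here are hypothetical, not from any specific MBRL implementation.

```python
import numpy as np

# Hypothetical 1D toy world: state x in [-2, 2], reward = -x^2 (best at origin).
# True dynamics: x' = clip(x + a). The learned model is accurate everywhere
# EXCEPT for one idiosyncratic error on large positive actions.

def true_step(x, a):
    return np.clip(x + a, -2.0, 2.0)

def model_step(x, a):
    if a > 1.0:
        return 0.0  # flawed prediction: "big actions teleport to the origin"
    return np.clip(x + a, -2.0, 2.0)

def reward(x):
    return -x ** 2

def best_action(step_fn, x, actions):
    # Greedy one-step "policy": pick the action with highest predicted reward.
    return max(actions, key=lambda a: reward(step_fn(x, a)))

actions = np.linspace(-1.5, 1.5, 31)
x0 = 2.0

# Optimizing against the model selects the action that exploits its error.
a_model = best_action(model_step, x0, actions)
print("chosen action:", a_model)                       # a large positive action
print("predicted reward:", reward(model_step(x0, a_model)))  # looks optimal
print("real reward:", reward(true_step(x0, a_model)))        # fails in reality
```

Here the model-optimal action earns a predicted reward of 0.0 (the maximum possible) but the worst achievable real reward, while an honest action such as a = -1.5 would have moved the agent toward the origin. If the model were retrained only on data from this exploiting policy, the loop would continue.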
