Sample efficiency quantifies the number of real-world interactions an agent requires to learn a high-performing policy. A highly sample-efficient algorithm minimizes expensive, risky, or time-consuming interactions with the actual environment, which is a primary claimed advantage of model-based reinforcement learning (MBRL) over model-free methods. It is typically measured via the learning curve, which plots policy performance against the number of environment steps or episodes taken; a common summary statistic is the number of steps needed to first reach a target performance level.
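As a minimal sketch of how such a comparison might be made in practice, the snippet below computes a steps-to-threshold statistic from logged learning curves. The logging format (a list of `(env_steps, avg_return)` pairs) and the example numbers are purely illustrative assumptions, not data from any real experiment.

```python
def steps_to_threshold(curve, target):
    """Return the first environment-step count at which the recorded
    average return reaches `target`, or None if it is never reached.

    `curve` is a list of (env_steps, avg_return) pairs, assumed to be
    sorted by env_steps (a hypothetical logging format).
    """
    for env_steps, avg_return in curve:
        if avg_return >= target:
            return env_steps
    return None


# Hypothetical learning curves for two agents on the same task.
model_based = [(1_000, 10.0), (5_000, 55.0), (10_000, 90.0)]
model_free = [(10_000, 12.0), (50_000, 60.0), (100_000, 91.0)]

# The more sample-efficient agent reaches the target return in
# fewer environment steps.
print(steps_to_threshold(model_based, 50.0))  # 5000
print(steps_to_threshold(model_free, 50.0))   # 50000
```

Comparing full learning curves is more informative than a single threshold, since one agent may learn faster early on yet plateau at lower final performance.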
