Inverse Reinforcement Learning (IRL) is a machine learning technique for deducing the reward function that an agent is optimizing, given observations of its behavior or policy. Unlike standard reinforcement learning (RL), which seeks an optimal policy for a known reward, IRL solves the inverse problem: it infers the latent objectives that explain demonstrated behavior. This is foundational for preference modeling and learning human intent from demonstration data, such as in robotics or autonomous driving.
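The inverse problem can be made concrete with a minimal sketch of maximum-entropy IRL on a toy chain MDP. Everything here (the 5-state world, the one-hot state features, the right-moving expert, the learning rate) is a hypothetical setup for illustration, not a reference implementation: the learner observes expert trajectories, then adjusts a per-state reward vector so that the soft-optimal policy under that reward visits states with the same frequency as the expert.

```python
import numpy as np

# Hypothetical toy MDP: a chain of N states, actions 0 = left, 1 = right,
# deterministic moves with reflecting boundaries.
N = 5
GAMMA = 0.9
HORIZON = 8

def step(s, a):
    return max(0, s - 1) if a == 0 else min(N - 1, s + 1)

# Simulated expert: always moves right (its latent goal is state N-1).
def expert_trajectory(start):
    traj = [start]
    for _ in range(HORIZON):
        traj.append(step(traj[-1], 1))
    return traj

demos = [expert_trajectory(s0) for s0 in range(N)]

# Empirical state-visitation frequencies of the demonstrations.
mu_expert = np.zeros(N)
for traj in demos:
    for s in traj:
        mu_expert[s] += 1
mu_expert /= mu_expert.sum()

def soft_value_iteration(w):
    """Soft (max-ent) Bellman backups under reward w; returns P(a|s)."""
    V = np.zeros(N)
    for _ in range(100):
        Q = np.array([[w[s] + GAMMA * V[step(s, a)] for a in (0, 1)]
                      for s in range(N)])
        m = Q.max(axis=1)                          # stable log-sum-exp
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    return np.exp(Q - V[:, None])                  # softmax policy

def expected_visits(policy):
    """Forward pass: expected state-visitation under the learner's policy."""
    d = np.full(N, 1.0 / N)                        # uniform start, like demos
    total = d.copy()
    for _ in range(HORIZON):
        d_next = np.zeros(N)
        for s in range(N):
            for a in (0, 1):
                d_next[step(s, a)] += d[s] * policy[s, a]
        d = d_next
        total += d
    return total / total.sum()

# Max-ent IRL gradient ascent: with one-hot state features, the likelihood
# gradient is (expert visitation) - (learner visitation).
w = np.zeros(N)
for _ in range(200):
    grad = mu_expert - expected_visits(soft_value_iteration(w))
    w += 0.5 * grad

print(int(np.argmax(w)))   # the recovered reward peaks at the expert's goal
```

The key loop is exactly the inference described above: behavior (visitation statistics) is the evidence, and the reward vector `w` is the latent quantity adjusted until the induced policy explains that behavior. Real systems replace the one-hot features with learned features and the exact forward pass with samples, but the structure is the same.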
