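Invariant Risk Minimization (IRM) is a learning framework that seeks a data representation on top of which the same predictor is simultaneously optimal in every one of several distinct training environments. The goal is to force a model to rely on causal features, whose relationship to the target stays stable across environments, while ignoring spurious, environment-specific correlations. Formally, IRM is a bi-level constrained optimization problem: it minimizes the sum of the per-environment risks over a representation Φ and a classifier w, subject to w minimizing the risk of every training environment given Φ. Because this constraint is hard to optimize directly, the practical variant IRMv1 replaces it with a penalty on the squared gradient of each environment's risk with respect to a fixed scalar classifier w = 1.0. The penalty is zero only when that classifier is already optimal in each environment. Enforcing this invariance is intended to improve out-of-distribution (OOD) generalization.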
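The IRMv1 penalty can be made concrete with a small sketch. It assumes squared-error loss, treats the predictor's output as the representation Φ(x), and multiplies it by a dummy scalar classifier w fixed at 1.0. The gradient with respect to w is derived by hand rather than with an autodiff framework. The helper names (`risk_and_scale_grad`, `irmv1_objective`, `make_env`) and the toy data are hypothetical and chosen only for illustration. In the toy data, feature 0 is causal (y equals it exactly) and feature 1 is spurious, because its ratio to y changes between the two environments.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy format: an environment is a list of (features, target) pairs.
Environment = List[Tuple[Sequence[float], float]]
Predictor = Callable[[Sequence[float]], float]


def risk_and_scale_grad(predict: Predictor, env: Environment) -> Tuple[float, float]:
    """Mean squared error of w * predict(x) and its derivative with respect to
    the dummy scalar classifier w, both evaluated at w = 1.0."""
    risk = 0.0
    grad = 0.0
    for x, y in env:
        f = predict(x)
        residual = f - y               # (w * f - y) at w = 1.0
        risk += residual ** 2
        grad += 2.0 * residual * f     # d/dw (w * f - y)^2 at w = 1.0
    n = len(env)
    return risk / n, grad / n


def irmv1_objective(predict: Predictor, envs: List[Environment],
                    penalty_weight: float) -> Tuple[float, float, float]:
    """Return (objective, summed risk, summed IRMv1 penalty) over environments."""
    total_risk = 0.0
    total_penalty = 0.0
    for env in envs:
        risk, grad = risk_and_scale_grad(predict, env)
        total_risk += risk
        total_penalty += grad ** 2     # zero iff w = 1.0 is optimal in this env
    return total_risk + penalty_weight * total_penalty, total_risk, total_penalty


def make_env(spurious_scale: float) -> Environment:
    """Toy data: feature 0 is causal (y = x_c); feature 1 is spurious,
    x_s = spurious_scale * y, with a scale that differs per environment."""
    ys = [1.0, -1.0, 2.0, -2.0]
    return [((y, spurious_scale * y), y) for y in ys]


envs = [make_env(2.0), make_env(0.5)]
causal = irmv1_objective(lambda x: x[0], envs, penalty_weight=1.0)
spurious = irmv1_objective(lambda x: x[1], envs, penalty_weight=1.0)
print("causal   (objective, risk, penalty):", causal)
print("spurious (objective, risk, penalty):", spurious)
```

The predictor that reads the causal feature receives zero penalty, because w = 1.0 is already optimal in both environments. The predictor that reads the spurious feature would need a different rescaling in each environment, so its penalty is large. In a real model, Φ would be a trainable network optimized with automatic differentiation, and `penalty_weight` would trade pooled risk against invariance.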
