Self-play is a reinforcement learning training paradigm in which an agent learns by competing against successive versions of itself, creating an automatic curriculum of progressively more challenging opponents. This bootstrapping process, central to algorithms like AlphaZero, allows the agent's policy to improve without external human data, as it discovers novel strategies through exploration. The paradigm is defined by a closed loop in which the learner's current policy generates training data by playing against its immediate past versions or a historical pool of frozen policies.
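The closed loop described above can be sketched in a few lines. The following is a minimal, illustrative example only, not AlphaZero: the toy game (rock-paper-scissors), the function names, and the simple payoff-following update rule are all assumptions chosen to keep the sketch self-contained. The key structural elements from the paragraph are present: a current policy, a historical pool of frozen snapshots, opponents sampled from that pool, and periodic snapshotting of the learner back into the pool.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def sample(policy):
    """Draw one action from a probability vector over ACTIONS."""
    return random.choices(ACTIONS, weights=policy, k=1)[0]

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def self_play(iterations=300, games=200, snapshot_every=50, lr=0.1, seed=0):
    """Toy self-play loop: learn against a pool of past snapshots of oneself."""
    random.seed(seed)
    policy = [1 / 3, 1 / 3, 1 / 3]   # current (learner) policy
    pool = [list(policy)]            # historical pool of frozen past policies
    for step in range(1, iterations + 1):
        opponent = random.choice(pool)      # play against a past version of self
        returns = [0.0, 0.0, 0.0]
        for _ in range(games):              # estimate each action's payoff
            b = sample(opponent)
            for i, a in enumerate(ACTIONS):
                returns[i] += payoff(a, b) / games
        # Move probability mass toward higher-payoff actions, then renormalize.
        policy = [max(1e-6, p + lr * r) for p, r in zip(policy, returns)]
        total = sum(policy)
        policy = [p / total for p in policy]
        if step % snapshot_every == 0:      # freeze a copy into the pool
            pool.append(list(policy))
    return policy, pool
```

Sampling opponents from the whole historical pool, rather than only the latest snapshot, is a common stabilization choice: it prevents the learner from overfitting to (and cycling against) its single most recent self.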
