Actor-critic methods are a hybrid reinforcement learning architecture that combines a policy function (the actor), which selects actions, with a value function (the critic), which evaluates them, yielding more stable, lower-variance learning than pure policy gradient methods. The actor proposes actions according to the current policy; the critic produces a temporal-difference (TD) error signal that measures whether the chosen action turned out better or worse than expected, and this error drives the updates of both networks.
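The update loop above can be sketched in a minimal one-step actor-critic. This is an illustrative toy, not a production implementation: it assumes a single-state, two-action episodic problem (a hypothetical setup where action 0 always pays reward 1 and action 1 pays 0), a softmax actor parameterized by logits, and a scalar critic value estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

logits = np.zeros(2)   # actor parameters: softmax policy over two actions
v = 0.0                # critic: estimated value of the (single) state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(logits)
    a = rng.choice(2, p=pi)            # actor selects an action
    reward = 1.0 if a == 0 else 0.0    # toy environment: action 0 is better
    # One-step episodic TD error (no next state, so delta = r - V(s)):
    td_error = reward - v
    v += alpha_critic * td_error       # critic update toward observed return
    # Actor update: gradient of log softmax is (one_hot(a) - pi),
    # scaled by the critic's TD error
    logits += alpha_actor * td_error * (np.eye(2)[a] - pi)

print(softmax(logits))  # probability mass concentrates on the rewarding action
```

Note how the TD error plays both roles described above: a positive error (action better than expected) raises the critic's estimate and reinforces the action's probability; a negative error does the opposite.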
