Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic reinforcement learning algorithm designed for environments with continuous action spaces. It combines a deterministic policy gradient with key innovations from Deep Q-Networks (DQN), including a replay buffer and target networks, to enable stable learning from high-dimensional sensory inputs. The algorithm learns a deterministic policy (the actor) that maps states directly to actions and a Q-function (the critic) that evaluates the quality of those actions.




