Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm that maximizes expected return plus a policy-entropy bonus, with a temperature coefficient setting the trade-off between the two. This maximum-entropy RL framework rewards stochasticity, so the policy explores more broadly and tends to be more robust. Because SAC is off-policy, it can also reuse past experience, which helps sample efficiency, particularly in the continuous action spaces common in robotics. It stabilizes training with a replay buffer, separate target networks, and automatic adjustment of the temperature.
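
To make the pieces above concrete, here is a minimal sketch of three quantities that typically appear in a SAC update: the entropy-regularized TD target, the soft update of target-network parameters, and the temperature loss. The function names, the default values of gamma and tau, and the use of two target critics combined by a minimum are assumptions borrowed from common SAC implementations, not details stated in the text. Scalars and plain lists stand in for network outputs and parameters.

```python
import math


def soft_td_target(reward, done, q1_next, q2_next, logp_next, alpha, gamma=0.99):
    """Entropy-regularized TD target for one transition (hypothetical helper)."""
    # Assumption: two target critics combined with a minimum, as in common
    # SAC implementations; the text above only mentions target networks.
    min_q_next = min(q1_next, q2_next)
    # Soft value of the next state: Q-value plus entropy bonus (-alpha * log pi).
    soft_v_next = min_q_next - alpha * logp_next
    # No bootstrapping from the next state when the episode has ended.
    return reward + gamma * (1.0 - float(done)) * soft_v_next


def polyak_update(target_params, online_params, tau=0.005):
    """Move target-network parameters a small step toward the online ones."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]


def temperature_loss(log_alpha, logps, target_entropy):
    """Mean of -alpha * (log pi + target_entropy) over sampled actions.

    Minimizing this raises alpha when the policy's entropy estimate is
    below target_entropy and lowers it when the estimate is above.
    """
    alpha = math.exp(log_alpha)  # parameterize via log(alpha) to keep alpha > 0
    return sum(-alpha * (lp + target_entropy) for lp in logps) / len(logps)


if __name__ == "__main__":
    y = soft_td_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0,
                       logp_next=-1.0, alpha=0.2, gamma=0.9)
    print("soft TD target:", y)
    print("updated target params:", polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
    print("temperature loss:", temperature_loss(0.0, [-1.0, -3.0], target_entropy=2.0))
```

In a full agent these values would be computed on minibatches drawn from the replay buffer, with the critics, policy, and temperature each updated by gradient descent. The sketch leaves out the networks and optimizer so the arithmetic stays visible.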




