Multi-Agent Reinforcement Learning (MARL) is a subfield of machine learning in which multiple autonomous agents learn to make sequential decisions by interacting with a shared environment, and with each other, to maximize cumulative reward. Unlike single-agent RL, each MARL agent must account for the presence and actions of other learning entities, which makes the environment non-stationary from any single agent's perspective. The core mechanism is a loop: each agent observes a (potentially partial) state of the environment, selects an action according to its policy, and receives a reward that depends on the joint action of all agents. Over time, using algorithms such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) or QMIX, agents learn policies, whether independent, cooperative, or competitive, that define how to act in this interactive setting. The fundamental challenge is the moving-target problem: an agent's optimal policy shifts as the other agents learn, so standard single-agent convergence guarantees break down and specialized stabilization techniques are needed.
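The observe-act-reward loop above can be made concrete with a minimal sketch: two independent tabular Q-learners repeatedly play a hypothetical two-action coordination game in which the shared reward depends on the joint action. The game, class names, and hyperparameters here are illustrative assumptions, not a standard benchmark; each agent treats the other as part of the environment, which is exactly the simplification that produces non-stationarity.

```python
import random

# Hypothetical coordination game: reward depends on the JOINT action.
# Both agents receive +1 when they choose the same action, 0 otherwise.
PAYOFF = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}

class IndependentQLearner:
    """Stateless tabular Q-learner that ignores the other agent,
    treating it as part of the environment (the MARL simplification
    that makes the environment non-stationary from its perspective)."""

    def __init__(self, n_actions=2, alpha=0.1, epsilon=0.1):
        self.q = [0.0] * n_actions  # one Q-value per action
        self.alpha = alpha          # learning rate
        self.epsilon = epsilon      # exploration probability

    def act(self):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action, reward):
        # Bandit-style update: Q(a) <- Q(a) + alpha * (r - Q(a)).
        self.q[action] += self.alpha * (reward - self.q[action])

random.seed(0)
agents = [IndependentQLearner(), IndependentQLearner()]
for step in range(5000):
    actions = tuple(agent.act() for agent in agents)   # joint action
    reward = PAYOFF[actions]                           # shared reward
    for agent, a in zip(agents, actions):
        agent.update(a, reward)                        # independent updates

# Greedy action of each agent after training; in this cooperative game
# the learners should settle on a matching pair.
best = [max(range(2), key=lambda a: agent.q[a]) for agent in agents]
print(best)
```

Note that each agent's reward statistics for a fixed action drift as the other agent's policy changes; in this simple cooperative game that drift is benign, but in general it is the moving-target problem that methods like MADDPG (centralized critics) and QMIX (value factorization) are designed to tame.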