Thompson Sampling is a Bayesian algorithm for managing the exploration-exploitation trade-off in sequential decision problems such as the multi-armed bandit. On each round, it samples from the posterior probability distribution over which action is optimal, plays the sampled action, and then updates its beliefs based on the observed reward. By treating the decision problem as one of Bayesian inference, this approach naturally balances trying uncertain actions (exploration) against favoring actions currently believed to be best (exploitation). It is mathematically equivalent to probability matching: each action is selected with exactly the probability that it is optimal under the current posterior.
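The loop described above can be sketched for the Bernoulli bandit, where a Beta prior on each arm's success rate makes the posterior update a simple count increment. The function name, arm probabilities, and round count below are illustrative choices, not part of any standard API:

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling on a toy multi-armed bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) uniform prior per arm: alpha counts observed successes
    # plus one, beta counts observed failures plus one.
    alpha = [1] * k
    beta = [1] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Exploration/exploitation in one step: draw a plausible success
        # rate for each arm from its posterior, then play the arm whose
        # sampled rate is highest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return alpha, beta, total_reward
```

As an arm's posterior concentrates, its sampled rates vary less, so clearly inferior arms are drawn less and less often while genuinely uncertain arms still get occasional plays.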
