Economics and Computation Series
Strategically Efficient Exploration for Multi-Agent Reinforcement Learning
1st December 2021, 13:00
Robert Loftin
TU Delft
Abstract
As a basis for exploration, the principle of optimism under uncertainty has led to a number of important theoretical and empirical results in sample-efficient reinforcement learning. In this talk, we discuss the role of optimistic exploration in multi-agent reinforcement learning and address potential issues that arise when applying optimism to RL in zero-sum games. We show that the direct application of optimism can lead to highly inefficient exploration in such games, where "cooperative" exploration focuses on outcomes that are unrealistic in "competitive" play. We then introduce a notion of "strategically efficient" exploration and demonstrate, both theoretically and empirically, that strategically efficient learning algorithms can significantly outperform their optimistic counterparts, while retaining the same worst-case sample complexity guarantees.
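To illustrate the principle the abstract starts from, here is a minimal sketch (not the speaker's algorithm) of optimism under uncertainty in its simplest setting: UCB1 on a two-armed bandit, where each arm's value estimate is inflated by a confidence bonus that shrinks with its visit count. The arm means, seed, and horizon are illustrative assumptions.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` steps; return how often each arm was pulled."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # try every arm once first
        else:
            # Optimism under uncertainty: act on the empirical mean
            # plus a confidence bonus that decays with the visit count.
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts

random.seed(0)
means = [0.3, 0.7]  # arm 1 is better; assumed values for the demo
counts = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0, 2, 2000)
print(counts)  # the better arm is pulled far more often
```

In a zero-sum game, an opponent controls part of the outcome, so an optimistic bonus of this kind can keep steering play toward joint outcomes a rational adversary would never allow, which is the inefficiency the talk examines.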
Additional Materials
Maintained by Nicos Protopapas