Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Paper Title

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Authors

Stefanos Leonardos, Georgios Piliouras

Abstract

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs, and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measuring the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions in which the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection, with both positive and negative (and potentially unbounded) effects on system performance.
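The claim that the number of equilibria can change as the exploration parameter varies can be illustrated with a small numerical sketch. It is not the authors' model or code. It assumes a logit (Boltzmann) form of QRE with a temperature standing in for the exploration parameter, and a hypothetical symmetric 2x2 coordination game whose payoffs `a` and `b` are chosen only for illustration. It counts the symmetric QRE by locating sign changes of the fixed-point residual on a grid.

```python
import numpy as np


def logit_response(x, temperature, a=2.0, b=1.0):
    """Logit probability of playing action 1 when the opponent plays it with probability x.

    Hypothetical coordination game: both players get `a` if both play action 1,
    `b` if both play action 2, and 0 otherwise.
    """
    payoff_gap = a * x - b * (1.0 - x)  # expected payoff of action 1 minus action 2
    return 1.0 / (1.0 + np.exp(-payoff_gap / temperature))


def count_symmetric_qre(temperature, a=2.0, b=1.0, grid_size=20001):
    """Count symmetric logit QRE, i.e. solutions of x = logit_response(x)."""
    x = np.linspace(0.0, 1.0, grid_size)
    residual = logit_response(x, temperature, a, b) - x
    # Each sign change of the residual between neighbouring grid points brackets one fixed point.
    return int(np.sum(residual[:-1] * residual[1:] < 0))


if __name__ == "__main__":
    for temperature in (0.2, 2.0):
        n = count_symmetric_qre(temperature)
        print(f"temperature={temperature}: {n} symmetric QRE")
```

In this toy game a low temperature (little exploration) yields three symmetric QRE, while a high temperature leaves a single one, so somewhere in between two equilibria merge and vanish. The grid-based count is only a heuristic: it can miss tangential roots that occur exactly at the bifurcation point.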
