Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Paper Title

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Authors

Skoulakis, Stratis; Fiez, Tanner; Sim, Ryann; Piliouras, Georgios; Ratliff, Lillian

Abstract

The predominant paradigm in evolutionary game theory and more generally online learning in games is based on a clear distinction between a population of dynamic agents that interact given a fixed, static game. In this paper, we move away from the artificial divide between dynamic agents and static games, to introduce and analyze a large class of competitive settings where both the agents and the games they play evolve strategically over time. We focus on arguably the most archetypal game-theoretic setting -- zero-sum games (as well as network generalizations) -- and the most studied evolutionary learning dynamic -- replicator, the continuous-time analogue of multiplicative weights. Populations of agents compete against each other in a zero-sum competition that itself evolves adversarially to the current population mixture. Remarkably, despite the chaotic coevolution of agents and games, we prove that the system exhibits a number of regularities. First, the system has conservation laws of an information-theoretic flavor that couple the behavior of all agents and games. Secondly, the system is Poincaré recurrent, with effectively all possible initializations of agents and games lying on recurrent orbits that come arbitrarily close to their initial conditions infinitely often. Thirdly, the time-average agent behavior and utility converge to the Nash equilibrium values of the time-average game. Finally, we provide a polynomial time algorithm to efficiently predict this time-average behavior for any such coevolving network game.
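
The regularities listed in the abstract can be illustrated in the simplest static special case. The sketch below is not the paper's coevolving model or its algorithm: it assumes a fixed game (Matching Pennies), where the replicator dynamics are known to conserve the sum of KL divergences from the interior Nash equilibrium to the current strategies. All names (`replicator_field`, `rk4_step`, `kl_to_nash`, `simulate`), the step size, the time horizon, and the initial conditions are illustrative choices, and fourth-order Runge-Kutta is used here only as a convenient integrator.

```python
import math

# Matching Pennies: the row player receives x^T A y, the column player -x^T A y.
# Assumption: a fixed (static) game, unlike the coevolving games in the paper.
A = [[1.0, -1.0], [-1.0, 1.0]]
NASH = (0.5, 0.5)  # interior Nash equilibrium strategy for both players


def replicator_field(x, y):
    """Return (dx/dt, dy/dt) under replicator dynamics for the game A."""
    Ay = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
    ATx = [sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
    value = sum(x[i] * Ay[i] for i in range(2))  # x^T A y
    dx = [x[i] * (Ay[i] - value) for i in range(2)]
    dy = [y[j] * (-ATx[j] + value) for j in range(2)]
    return dx, dy


def rk4_step(x, y, dt):
    """Advance (x, y) by one classical fourth-order Runge-Kutta step."""
    def shifted(p, d, h):
        return [p[k] + h * d[k] for k in range(2)]

    def combine(p, k1, k2, k3, k4):
        return [p[k] + dt / 6.0 * (k1[k] + 2 * k2[k] + 2 * k3[k] + k4[k])
                for k in range(2)]

    k1x, k1y = replicator_field(x, y)
    k2x, k2y = replicator_field(shifted(x, k1x, dt / 2), shifted(y, k1y, dt / 2))
    k3x, k3y = replicator_field(shifted(x, k2x, dt / 2), shifted(y, k2y, dt / 2))
    k4x, k4y = replicator_field(shifted(x, k3x, dt), shifted(y, k3y, dt))
    return combine(x, k1x, k2x, k3x, k4x), combine(y, k1y, k2y, k3y, k4y)


def kl_to_nash(x, y):
    """KL(x* || x) + KL(y* || y), the conserved quantity in the static case."""
    kl_x = sum(NASH[k] * math.log(NASH[k] / x[k]) for k in range(2))
    kl_y = sum(NASH[k] * math.log(NASH[k] / y[k]) for k in range(2))
    return kl_x + kl_y


def simulate(x0, y0, horizon=200.0, dt=0.01):
    """Integrate the dynamics and collect the three diagnostics."""
    x, y = list(x0), list(y0)
    steps = int(round(horizon / dt))
    start = kl_to_nash(x, y)
    max_drift = 0.0              # largest deviation of the conserved quantity
    sum_x, sum_y = [0.0, 0.0], [0.0, 0.0]
    min_return = float("inf")    # closest return to the start after t > 2
    for n in range(1, steps + 1):
        x, y = rk4_step(x, y, dt)
        max_drift = max(max_drift, abs(kl_to_nash(x, y) - start))
        for k in range(2):
            sum_x[k] += x[k]
            sum_y[k] += y[k]
        if n * dt > 2.0:
            min_return = min(min_return, math.hypot(x[0] - x0[0], y[0] - y0[0]))
    return {
        "max_drift": max_drift,
        "avg_x": [s / steps for s in sum_x],
        "avg_y": [s / steps for s in sum_y],
        "min_return": min_return,
    }


if __name__ == "__main__":
    result = simulate((0.8, 0.2), (0.3, 0.7))
    for name, value in result.items():
        print(name, value)
```

In this static setting the strategies themselves keep cycling rather than converging, while the KL sum stays essentially constant, the trajectory passes close to its starting point again, and the time averages approach the equilibrium (0.5, 0.5). The paper's contribution is to show that analogous properties hold when the game itself evolves; that coupling is not modeled here.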
