Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Title

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Authors

Yan, Rui; Zhang, Weixian; Deng, Ruiliang; Duan, Xiaoming; Shi, Zongying; Zhong, Yisheng

Abstract

Artificial intelligence and robotics competitions often follow a game paradigm in which each player privately commits a strategy to a game system, which simulates the game using the collected joint strategy and then returns payoffs to the players. This paper considers strategy commitment for two-player symmetric games, in which the players' strategy spaces are identical and their payoffs are symmetric. First, we introduce two digraph-based metrics, grounded in sink equilibrium, for meta-level strategy evaluation in two-agent reinforcement learning. The metrics rank the strategies of a single player and determine the set of strategies that are preferred for private commitment. Then, to find the preferred strategies under these metrics, we propose two variants of the classical self-play learning algorithm, called strictly best-response self-play and weakly better-response self-play. By modeling the learning processes as walks over joint-strategy response digraphs, we prove that the strategies learned by the two variants are preferred under the two metrics, respectively. We also identify the strategies preferred under both metrics and establish a connection between the adjacency matrices induced by one metric and one variant. Finally, simulations are provided to illustrate the results.
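
The idea of treating self-play as a walk over a response digraph can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a finite symmetric matrix game in which `payoff[i][j]` is the payoff for playing strategy `i` against strategy `j`, it uses plain best responses rather than the strictly best-response or weakly better-response variants named in the abstract, and it builds the digraph over single-player strategies rather than joint strategies. All function names are hypothetical.

```python
def best_responses(payoff, opponent):
    """Return every strategy that maximizes the payoff against `opponent`."""
    values = [row[opponent] for row in payoff]
    top = max(values)
    return [s for s, v in enumerate(values) if v == top]


def response_digraph(payoff):
    """Edge s -> r whenever r is a best response to s (assumed construction)."""
    return {s: set(best_responses(payoff, s)) for s in range(len(payoff))}


def reachable(graph, start):
    """Nodes reachable from `start` by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_strategies(graph):
    """Strategies in sink components: everything they reach can reach them back."""
    reach = {s: reachable(graph, s) | {s} for s in graph}
    return {s for s in graph if all(s in reach[t] for t in reach[s])}


def self_play(payoff, start, steps):
    """Walk the digraph by repeatedly best-responding to the current strategy."""
    walk = [start]
    for _ in range(steps):
        # Ties are broken by the lowest index, an arbitrary choice for this sketch.
        walk.append(min(best_responses(payoff, walk[-1])))
    return walk


# Rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors): the walk cycles.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
# Prisoner's dilemma (0 = cooperate, 1 = defect): the walk settles on defect.
PD = [[3, 0], [5, 1]]

print(self_play(RPS, 0, 4), sink_strategies(response_digraph(RPS)))
print(self_play(PD, 0, 3), sink_strategies(response_digraph(PD)))
```

In rock-paper-scissors the walk never settles and every strategy lies in the sink component, whereas in the prisoner's dilemma the walk is absorbed by the single sink strategy. A sink-based evaluation separates these two cases.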
