Tackling Asymmetric and Circular Sequential Social Dilemmas with Reinforcement Learning and Graph-based Tit-for-Tat

Paper title

Tackling Asymmetric and Circular Sequential Social Dilemmas with Reinforcement Learning and Graph-based Tit-for-Tat

Authors

Tangui Le Gléau, Xavier Marjou, Tayeb Lemlouma, Benoit Radier

Abstract

In many societal and industrial interactions, participants generally prefer their pure self-interest at the expense of global welfare. Known as social dilemmas, this category of non-cooperative games covers situations where multiple actors should all cooperate to achieve the best outcome, but greed and fear lead to a worse, self-interested outcome. Recently, the emergence of Deep Reinforcement Learning (RL) has revived interest in social dilemmas with the introduction of the Sequential Social Dilemma (SSD). Cooperative agents mixing RL policies and Tit-for-Tat (TFT) strategies have successfully addressed some non-optimal Nash equilibrium issues. However, this paradigm requires symmetrical and direct cooperation between actors, conditions that are not met when mutual cooperation becomes asymmetric and is possible only in a circular way involving at least a third actor. To tackle this issue, this paper extends SSD with the Circular Sequential Social Dilemma (CSSD), a new kind of Markov game that better generalizes the diversity of cooperation between agents. Secondly, to address such circular and asymmetric cooperation, we propose a candidate solution based on RL policies and a graph-based TFT. We conducted experiments on a simple multi-player grid world that offers adaptable cooperation structures. Our work confirmed that the graph-based approach helps address circular situations by encouraging self-interested agents to reach mutual cooperation.
