RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning

Paper Title


RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning

Authors

Chen, Hao; Yang, Guangkai; Zhang, Junge; Yin, Qiyue; Huang, Kaiqi

Abstract


In recent years, reinforcement learning has faced several challenges in the multi-agent domain, such as the credit assignment problem. Value function factorization has emerged as a promising way to handle credit assignment under the centralized training with decentralized execution (CTDE) paradigm. However, existing value function factorization methods cannot handle ad-hoc cooperation, that is, adapting to new teammate configurations at test time. Specifically, these methods do not explicitly exploit the relationships between agents and cannot adapt to inputs of varying size. To address these limitations, we propose a novel method, Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA uses a graph-based relation encoder to encode the topological structure between agents. Furthermore, RACA employs an attention-based observation abstraction mechanism that generalizes to an arbitrary number of teammates with a fixed number of parameters. Experiments show that our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
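The abstract states that the observation abstraction generalizes to any number of teammates with a fixed parameter count. The sketch below illustrates why attention pooling has that property; it is not RACA's published architecture. The class name `AttentionAbstraction`, the single-head scaled dot-product form, the Gaussian initialization, and the dimensions used are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) weight matrix by a d_in-dimensional vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionAbstraction:
    """Hypothetical single-head attention pooling over a variable number of teammates.

    The parameter count depends only on feat_dim and embed_dim,
    never on how many teammates are observed.
    """

    def __init__(self, feat_dim: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init() -> Matrix:
            # Small random weights, shape (embed_dim x feat_dim); illustrative only.
            return [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                    for _ in range(embed_dim)]

        self.w_q, self.w_k, self.w_v = init(), init(), init()
        self.embed_dim = embed_dim

    def num_params(self) -> int:
        """Total weights in the query, key and value projections."""
        return 3 * len(self.w_q) * len(self.w_q[0])

    def __call__(self, own: Vector, teammates: List[Vector]) -> Vector:
        if not teammates:
            # No teammates visible: return a zero summary of the same size.
            return [0.0] * self.embed_dim
        q = matvec(self.w_q, own)                        # query from own features
        keys = [matvec(self.w_k, t) for t in teammates]  # one key per teammate
        values = [matvec(self.w_v, t) for t in teammates]
        scale = math.sqrt(self.embed_dim)
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale
                           for k in keys])
        # Weighted sum of values: the output has embed_dim entries for any team size.
        return [sum(w * v[i] for w, v in zip(weights, values))
                for i in range(self.embed_dim)]
```

With `feat_dim=4` and `embed_dim=8`, calling the module with 2, 5, or 9 teammates returns an 8-dimensional vector each time, while `num_params()` stays at 96. Because the pooling is a softmax-weighted sum, the result also does not depend on the order in which teammates are listed.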
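A graph-based relation encoder can likewise share one set of weights across all agents, so it also accepts teams of any size. The following is a generic one-round, mean-aggregation message-passing layer, assuming (hypothetically) that an edge links two agents within a `sight_range` of each other. `RelationEncoderLayer`, `build_adjacency`, and this aggregation rule are illustrative stand-ins, not the encoder described in the paper.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) weight matrix by a d_in-dimensional vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def build_adjacency(positions: List[Tuple[float, float]],
                    sight_range: float) -> List[List[int]]:
    """Hypothetical graph: j is a neighbour of i if it lies within sight_range."""
    n = len(positions)
    return [[j for j in range(n)
             if j != i and math.dist(positions[i], positions[j]) <= sight_range]
            for i in range(n)]


class RelationEncoderLayer:
    """One round of mean-aggregation message passing with shared weights."""

    def __init__(self, feat_dim: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init() -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                    for _ in range(embed_dim)]

        self.w_self = init()  # transforms an agent's own features
        self.w_nbr = init()   # transforms the mean of its neighbours' features
        self.feat_dim = feat_dim

    def __call__(self, features: List[Vector],
                 neighbours: List[List[int]]) -> List[Vector]:
        out = []
        for i, h in enumerate(features):
            nbrs = neighbours[i]
            if nbrs:
                agg = [sum(features[j][d] for j in nbrs) / len(nbrs)
                       for d in range(self.feat_dim)]
            else:
                agg = [0.0] * self.feat_dim  # isolated agent: no messages
            pre = [a + b for a, b in zip(matvec(self.w_self, h),
                                         matvec(self.w_nbr, agg))]
            out.append([max(0.0, x) for x in pre])  # ReLU
        return out
```

Because the same `w_self` and `w_nbr` are applied to every agent, adding or removing teammates changes only the graph, not the number of parameters.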
