Paper Title

QGNN: Value Function Factorisation with Graph Neural Networks

Authors

Ryan Kortvelesy, Amanda Prorok

Abstract

In multi-agent reinforcement learning, the use of a global objective is a powerful tool for incentivising cooperation. Unfortunately, it is not sample-efficient to train individual agents with a global reward, because it does not necessarily correlate with an agent's individual actions. This problem can be solved by factorising the global value function into local value functions. Early work in this domain performed factorisation by conditioning local value functions purely on local information. Recently, it has been shown that providing both local information and an encoding of the global state can promote cooperative behaviour. In this paper, we propose QGNN, the first value factorisation method to use a graph neural network (GNN) based model. The multi-layer message passing architecture of QGNN provides more representational complexity than models in prior work, allowing it to produce a more effective factorisation. QGNN also introduces a permutation invariant mixer which is able to match the performance of other methods, even with significantly fewer parameters. We evaluate our method against several baselines, including QMIX-Att, GraphMIX, QMIX, VDN, and hybrid architectures. Our experiments include StarCraft, the standard benchmark for credit assignment; Estimate Game, a custom environment that explicitly models inter-agent dependencies; and Coalition Structure Generation, a foundational problem with real-world applications. The results show that QGNN consistently outperforms state-of-the-art value factorisation baselines.
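As a rough illustration of the core idea, the sketch below shows how per-agent Q-values can be mixed into a joint value by sum-aggregated message passing over an agent graph, so that the result is invariant to agent ordering. This is a minimal toy with randomly initialised weights, not the paper's actual architecture: the `GNNMixer` class, its layer sizes, and the sum readout are all illustrative assumptions, and QMIX-style monotonicity constraints are omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class GNNMixer:
    """Hypothetical sketch of a permutation-invariant GNN value mixer.

    Per-agent Q-values are embedded, refined by rounds of message
    passing (messages summed over graph neighbours), and read out
    with a sum, so Q_tot does not depend on agent ordering.
    """

    def __init__(self, d=8, rounds=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(size=(1, d)) * 0.1                       # embed scalar Q_i
        self.W_msg = [rng.normal(size=(d, d)) * 0.1 for _ in range(rounds)]
        self.W_upd = [rng.normal(size=(2 * d, d)) * 0.1 for _ in range(rounds)]
        self.w_out = rng.normal(size=(d,)) * 0.1                        # readout weights

    def __call__(self, q_values, adj):
        # q_values: (n,) per-agent Q-values; adj: (n, n) 0/1 adjacency matrix
        h = relu(np.asarray(q_values)[:, None] @ self.W_in)             # (n, d) node embeddings
        for W_m, W_u in zip(self.W_msg, self.W_upd):
            msgs = adj @ relu(h @ W_m)                                  # sum over neighbours
            h = relu(np.concatenate([h, msgs], axis=1) @ W_u)           # node update
        return float(h.sum(axis=0) @ self.w_out)                       # sum readout -> Q_tot


# Permuting the agents leaves Q_tot unchanged (fully connected graph,
# so the adjacency is unaffected by the relabelling).
mixer = GNNMixer()
q = np.array([1.0, -0.5, 2.0])
adj = np.ones((3, 3)) - np.eye(3)
print(abs(mixer(q, adj) - mixer(q[[2, 0, 1]], adj)) < 1e-9)
```

Sum aggregation and a sum readout are what make the sketch permutation invariant; any symmetric aggregator (mean, max) would preserve this property.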
