Paper Title

Non-Linear Coordination Graphs

Authors

Yipeng Kang, Tonghan Wang, Xiaoran Wu, Qianlan Yang, Chongjie Zhang

Abstract

Value decomposition multi-agent reinforcement learning methods learn the global value function as a mixing of each agent's individual utility functions. Coordination graphs (CGs) represent a higher-order decomposition by incorporating pairwise payoff functions and thus are supposed to have a more powerful representational capacity. However, CGs decompose the global value function linearly over local value functions, severely limiting the complexity of the value function class that can be represented. In this paper, we propose the first non-linear coordination graph by extending CG value decomposition beyond the linear case. One major challenge is to conduct greedy action selection in this new function class, to which commonly adopted DCOP algorithms are no longer applicable. We study how to solve this problem when mixing networks with LeakyReLU activation are used. An enumeration method with a global optimality guarantee is proposed and motivates an efficient iterative optimization method with a local optimality guarantee. We find that our method can achieve superior performance on challenging multi-agent coordination tasks like MACO.
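
For context, the contrast the abstract draws can be written out explicitly. The notation below is a reconstruction from the abstract rather than the paper's own formulation: a standard (linear) coordination graph sums per-agent utilities q_i and pairwise payoffs q_ij over the graph, while the proposed non-linear variant feeds the same local terms through a mixing network f_theta with LeakyReLU activations. The exact inputs and conditioning of f_theta are an assumption here.

```latex
% Linear coordination-graph decomposition (per-agent utilities + pairwise payoffs)
Q_{\mathrm{CG}}(\boldsymbol{\tau}, \mathbf{a})
  = \sum_{i \in \mathcal{V}} q_i(\tau_i, a_i)
  + \sum_{(i,j) \in \mathcal{E}} q_{ij}(\tau_i, \tau_j, a_i, a_j)

% Non-linear extension: the same local terms mixed by a LeakyReLU network f_\theta
% (the precise arguments of f_\theta are an assumption, not taken from the paper)
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a})
  = f_\theta\big(\{ q_i(\tau_i, a_i) \}_{i \in \mathcal{V}},\;
                 \{ q_{ij}(\tau_i, \tau_j, a_i, a_j) \}_{(i,j) \in \mathcal{E}}\big)
```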

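The abstract's point about tractable greedy action selection rests on LeakyReLU keeping the mixer piecewise linear in the local values. The sketch below is a minimal, hypothetical illustration of such a mixer in PyTorch; the class name `NonLinearCGMixer` and the single-hidden-layer shape are assumptions for illustration, not the authors' architecture or training procedure.

```python
# A minimal sketch (not the paper's implementation) of a non-linear mixing
# network over coordination-graph terms, assuming a single hidden layer with
# LeakyReLU activations, in the spirit of the abstract.
import torch
import torch.nn as nn


class NonLinearCGMixer(nn.Module):
    """Mixes per-agent utilities q_i and pairwise payoffs q_ij into Q_tot.

    Hypothetical architecture: one hidden LeakyReLU layer keeps the mixer
    piecewise linear in its inputs, so on each activation region Q_tot is
    linear in the local q-terms.
    """

    def __init__(self, n_terms: int, hidden_dim: int = 32,
                 negative_slope: float = 0.01):
        super().__init__()
        self.fc1 = nn.Linear(n_terms, hidden_dim)
        self.act = nn.LeakyReLU(negative_slope)
        self.fc2 = nn.Linear(hidden_dim, 1)

    def forward(self, local_values: torch.Tensor) -> torch.Tensor:
        # local_values: (batch, n_terms) = concatenated [q_1..q_n, q_ij terms]
        return self.fc2(self.act(self.fc1(local_values)))


# Usage: 3 agents on a fully connected graph -> 3 utilities + 3 pairwise payoffs.
mixer = NonLinearCGMixer(n_terms=6)
q_terms = torch.randn(4, 6)   # a batch of chosen-action local values
q_tot = mixer(q_terms)        # (4, 1) global action values
```

Because the affine layers wrap a piecewise-linear activation, fixing the sign pattern of the hidden units makes Q_tot linear in the local q-terms, which is presumably what lets the enumeration and iterative methods mentioned in the abstract fall back on standard DCOP-style maximization over the graph.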