Variational Autoencoders for Opponent Modeling in Multi-Agent Systems

Paper Title

Variational Autoencoders for Opponent Modeling in Multi-Agent Systems

Authors

Georgios Papoudakis, Stefano V. Albrecht

Abstract

Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment. In this work, we are interested in controlling one agent in a multi-agent system and successfully learning to interact with the other agents, which have fixed policies. Modeling the behavior of other agents (opponents) is essential to understanding the interactions of the agents in the system. By taking advantage of recent advances in unsupervised learning, we propose modeling opponents using variational autoencoders. Additionally, many existing methods in the literature assume that the opponent models have access to the opponents' observations and actions during both training and execution. To eliminate this assumption, we propose a modification that attempts to identify the underlying opponent model using only local information of our agent, such as its observations, actions, and rewards. The experiments indicate that our opponent modeling methods achieve equal or greater episodic returns in reinforcement learning tasks compared with another modeling method.
