Paper Title
Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Multi-agent reinforcement learning (MARL) can model many real-world applications. However, many MARL approaches rely on epsilon-greedy exploration, which may discourage visiting advantageous states in hard scenarios. In this paper, we propose a new approach, QMIX(SEG), for tackling MARL. It makes use of the value-function factorization method QMIX to train per-agent policies, together with a novel Semantic Epsilon Greedy (SEG) exploration strategy. SEG is a simple extension of the conventional epsilon-greedy exploration strategy, yet it is experimentally shown to greatly improve the performance of MARL. We first cluster actions into groups of actions with similar effects and then use these groups in a bi-level epsilon-greedy exploration hierarchy for action selection. We argue that SEG facilitates semantic exploration by exploring in the space of groups of actions, which carry richer semantic meanings than atomic actions. Experiments show that QMIX(SEG) largely outperforms QMIX and achieves performance competitive with current state-of-the-art MARL approaches on the StarCraft Multi-Agent Challenge (SMAC) benchmark.
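The abstract describes SEG as a bi-level epsilon-greedy hierarchy: first pick a group of semantically similar actions, then pick an atomic action within that group. Below is a minimal sketch of what such a selection rule might look like. The group assignments, Q-values, and the two epsilon parameters are hypothetical placeholders for illustration; the paper's actual clustering procedure and implementation details are not given in the abstract.

```python
import random

# Hypothetical sketch of bi-level (semantic) epsilon-greedy action selection.
# `q_values` maps each atomic action to its estimated Q-value, and `groups`
# is a precomputed clustering of actions into groups with similar effects.
# Both are assumptions for illustration, not the authors' implementation.

def seg_select_action(q_values, groups, eps_group, eps_action):
    """Pick a group at the top level, then an atomic action within it."""
    # Top level: epsilon-greedy over groups, scoring each group by the
    # Q-value of its best member.
    if random.random() < eps_group:
        group = random.choice(groups)
    else:
        group = max(groups, key=lambda g: max(q_values[a] for a in g))
    # Bottom level: epsilon-greedy over the atomic actions in the chosen group.
    if random.random() < eps_action:
        return random.choice(group)
    return max(group, key=lambda a: q_values[a])

# Example usage with two hypothetical action groups (e.g., "move" vs. "attack").
q = {0: 0.1, 1: 0.4, 2: 0.9, 3: 0.3}
action = seg_select_action(q, groups=[[0, 1], [2, 3]], eps_group=0.2, eps_action=0.1)
```

Under this reading, a random draw at the top level lands on a whole group of related actions rather than a single atomic action, which is one way the claimed "semantic exploration" could bias exploration toward behaviorally distinct choices.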