Paper Title

Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments

Paper Authors

Alessandro Sestini, Alexander Kuhnle, Andrew D. Bagdanov

Paper Abstract

Deep Reinforcement Learning achieves very good results in domains where reward functions can be manually engineered. At the same time, there is growing interest within the community in using games based on Procedural Content Generation (PCG) as benchmark environments, since this type of environment is well suited to studying overfitting and generalization of agents under domain shift. Inverse Reinforcement Learning (IRL) can instead extrapolate reward functions from expert demonstrations, with good results even on high-dimensional problems; however, there are no examples of applying these techniques to procedurally generated environments. This is mostly due to the number of demonstrations needed to find a good reward model. We propose a technique based on Adversarial Inverse Reinforcement Learning which can significantly decrease the need for expert demonstrations in PCG games. Through the use of an environment with a limited set of initial seed levels, plus some modifications to stabilize training, we show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain. We demonstrate the effectiveness of our technique on two procedural environments, MiniGrid and DeepCrawl, for a variety of tasks.
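
The abstract describes restricting reward learning to a small fixed pool of seed levels, while the fully procedural distribution is only used afterwards. Below is a minimal, hypothetical Python sketch of that seed-pool idea; make_level, the pool size, and the sampling helpers are illustrative assumptions, not the paper's actual code.

import random

# Hypothetical stand-in for a PCG level builder (e.g. a MiniGrid or DeepCrawl
# level generator); illustrative only, not the paper's implementation.
def make_level(seed):
    rng = random.Random(seed)
    return {"seed": seed, "layout": [rng.randint(0, 3) for _ in range(16)]}

SEED_POOL = list(range(10))  # limited set of initial seed levels for reward learning

def sample_training_level():
    # Reward-learning (AIRL) episodes are drawn only from the fixed seed pool.
    return make_level(random.choice(SEED_POOL))

def sample_evaluation_level():
    # Evaluation episodes come from the fully procedural distribution.
    return make_level(random.randrange(2**31))

Consistent with the abstract, the intent of the restriction is that a reward model fit on the limited pool still generalizes when agents are later trained and evaluated on unrestricted procedural levels.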
