Reusable Options through Gradient-based Meta Learning

Paper Title

Reusable Options through Gradient-based Meta Learning

Authors

David Kuric, Herke van Hoof

Abstract

Hierarchical methods in reinforcement learning have the potential to reduce the number of decisions that an agent needs to make when learning new tasks. However, finding reusable, useful temporal abstractions that facilitate fast learning remains a challenging problem. Recently, several deep learning approaches have been proposed to learn such temporal abstractions in the form of options in an end-to-end manner. In this work, we point out several shortcomings of these methods and discuss their potential negative consequences. Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem. This allows us to formulate an objective that explicitly incentivizes options that allow a higher-level decision maker to adjust to different tasks in a few steps. Experimentally, we show that our method is able to learn transferable components that accelerate learning, and that it performs better than prior methods developed for this setting. Additionally, we perform ablations to quantify the impact of using gradient-based meta-learning as well as the other proposed changes.
