Paper Title
Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts
Paper Authors
Paper Abstract
In reinforcement learning, agents that consider the context, or current state, when selecting source policies for transfer have been shown to outperform context-free approaches. However, none of the existing approaches transfer knowledge contextually from model-based learners to a model-free learner. This could be useful, for instance, when source policies are intentionally learned on diverse simulations with plentiful data but transferred to a real-world setting with limited data. In this paper, we assume knowledge of estimated source task dynamics and policies, and that source and target tasks share common sub-goals but differ in dynamics. We introduce a novel deep mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics that match the target dynamics, using state trajectories collected from the target task. The mixture model is easy to interpret, demonstrates robustness to estimation errors in dynamics, and is compatible with most learning algorithms. We then show how this model can be incorporated into standard policy reuse frameworks, and demonstrate its effectiveness on benchmarks from OpenAI Gym.
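To make the mixture formulation concrete, the sketch below illustrates one way to learn a state-dependent belief over K estimated source dynamics models: a small gating network maps a target state to mixture weights, trained by maximizing the likelihood of observed target transitions under the weighted mixture. This is a minimal, assumption-laden sketch, not the authors' implementation; the names (`GatingNetwork`, `mixture_nll`), the network architecture, and the use of PyTorch are illustrative choices.

```python
# A minimal sketch (not the paper's code) of a state-dependent mixture
# over K estimated source dynamics models, fit from target-task data.
import torch
import torch.nn as nn


class GatingNetwork(nn.Module):
    """Maps a state to a belief (simplex weights) over K source dynamics."""

    def __init__(self, state_dim, num_sources, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_sources),
        )

    def forward(self, state):
        return torch.softmax(self.net(state), dim=-1)  # shape (..., K)


def mixture_nll(gate, source_likelihoods, states):
    """Negative log-likelihood of target transitions under the mixture.

    source_likelihoods: tensor of shape (batch, K) whose (i, k) entry is
    the probability the k-th estimated source model assigns to the
    observed target transition (s_i, a_i, s'_i).
    """
    weights = gate(states)                               # (batch, K)
    mix = (weights * source_likelihoods).sum(dim=-1)     # (batch,)
    return -torch.log(mix + 1e-8).mean()


if __name__ == "__main__":
    # Hypothetical usage with stand-in data in place of real transitions.
    state_dim, num_sources, batch = 4, 3, 32
    gate = GatingNetwork(state_dim, num_sources)
    opt = torch.optim.Adam(gate.parameters(), lr=1e-3)

    states = torch.randn(batch, state_dim)               # stand-in states
    source_likelihoods = torch.rand(batch, num_sources)  # stand-in likelihoods

    for _ in range(100):
        loss = mixture_nll(gate, source_likelihoods, states)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Under this reading, the learned weights provide a per-state belief over which source task's dynamics best explain the target, which a policy reuse scheme could then use to decide which source policy to follow in each state.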