Paper Title
A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning
Paper Authors
Paper Abstract
While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture, so that the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
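To make the CRP prior concrete, the following is a minimal sketch (not the paper's implementation) of how a Chinese restaurant process partitions a stream of tasks into clusters, instantiating a new cluster on demand with probability proportional to a concentration parameter `alpha`. The function name and interface are illustrative assumptions.

```python
import numpy as np

def crp_assignments(n_tasks, alpha, seed=None):
    """Sample a partition of n_tasks tasks from a CRP prior.

    The first task opens cluster 0. Each later task i joins an existing
    cluster k with probability n_k / (i + alpha), where n_k is the number
    of tasks already in cluster k, or opens a new cluster with probability
    alpha / (i + alpha). This is the "rich get richer" dynamic that lets
    the mixture grow its number of components as new tasks arrive.
    """
    rng = np.random.default_rng(seed)
    assignments = []
    counts = []  # tasks per cluster
    for i in range(n_tasks):
        if i == 0:
            assignments.append(0)
            counts.append(1)
            continue
        # Unnormalized probabilities: existing cluster sizes, then alpha
        # for a brand-new cluster; the normalizer is i + alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):  # new mixture component instantiated
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments
```

In the full method, this prior would be combined with the likelihood of each task under the existing task models (the E-step of the EM procedure) rather than sampled blindly; larger `alpha` biases the process toward spawning more clusters.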