Paper Title

Lipschitz Lifelong Reinforcement Learning

Authors

Lecarpentier, Erwan, Abel, David, Asadi, Kavosh, Jinnai, Yuu, Rachelson, Emmanuel, Littman, Michael L.

Abstract

We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. Further, we show the method to experience no negative transfer with high probability. We illustrate the benefits of the method in Lifelong RL experiments.
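The abstract's core idea can be sketched concretely: if the optimal value functions are Lipschitz continuous with respect to a metric over the task space, then Q*-values learned on one task, shifted by the Lipschitz constant times the task distance, give valid upper bounds on Q* for a nearby task, which is what enables value transfer without negative transfer. The following is a minimal illustrative sketch of that bound, not the paper's actual algorithm; all names (`lipschitz_transfer`, `vmax`, the example numbers) are assumptions chosen for illustration.

```python
# Hedged sketch: if Q* is L-Lipschitz w.r.t. a metric d between MDPs,
# then Q*_target(s, a) <= Q*_source(s, a) + L * d(M_source, M_target).
# Capping at vmax (e.g. r_max / (1 - gamma)) keeps the bound trivial-safe,
# so transferred bounds are never worse than no transfer at all.

def lipschitz_transfer(q_source, task_distance, lipschitz_const, vmax):
    """Transfer upper bounds on Q* from a source task to a nearby target task.

    q_source: dict mapping (state, action) -> Q*-value in the source task
    task_distance: pseudo-metric value d(M_source, M_target) between the MDPs
    lipschitz_const: Lipschitz constant L of Q* w.r.t. the task metric
    vmax: trivial upper bound on any value in the target task
    """
    return {
        sa: min(q + lipschitz_const * task_distance, vmax)
        for sa, q in q_source.items()
    }

# Example with hypothetical numbers: two tasks at distance 0.1, L = 2.0.
q_source = {("s0", "a0"): 9.95, ("s0", "a1"): 4.0}
bounds = lipschitz_transfer(q_source, task_distance=0.1,
                            lipschitz_const=2.0, vmax=10.0)
# The shifted bound 9.95 + 0.2 exceeds vmax, so it is capped at 10.0,
# while the bound for ("s0", "a1") becomes 4.2.
```

In the paper's setting, such transferred bounds initialize an optimistic PAC-MDP learner on each new task, which is why the convergence rate improves while negative transfer is avoided with high probability.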
