Paper title

A Game Theoretic Framework for Model Based Reinforcement Learning

Authors

Aravind Rajeswaran, Igor Mordatch, Vikash Kumar

Abstract

Model-based reinforcement learning (MBRL) has recently gained immense interest due to its potential for sample efficiency and ability to incorporate off-policy data. However, designing stable and efficient MBRL algorithms using rich function approximators has remained challenging. To help expose the practical challenges in MBRL and simplify algorithm design from the lens of abstraction, we develop a new framework that casts MBRL as a game between: (1) a policy player, which attempts to maximize rewards under the learned model; (2) a model player, which attempts to fit the real-world data collected by the policy player. For algorithm development, we construct a Stackelberg game between the two players, and show that it can be solved with approximate bi-level optimization. This gives rise to two natural families of algorithms for MBRL based on which player is chosen as the leader in the Stackelberg game. Together, they encapsulate, unify, and generalize many previous MBRL algorithms. Furthermore, our framework is consistent with and provides a clear basis for heuristics known to be important in practice from prior works. Finally, through experiments we validate that our proposed algorithms are highly sample efficient, match the asymptotic performance of model-free policy gradient, and scale gracefully to high-dimensional tasks like dexterous hand manipulation. Additional details and code can be obtained from the project page at https://sites.google.com/view/mbrl-game
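
The two-player structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional reward function, the local linear model, the step size, and every function name below are hypothetical choices made only for illustration. The sketch assumes the arrangement in which the model player fully refits to the data gathered by the current policy, while the policy player takes only a small improvement step under the fitted model.

```python
import random


def true_reward(action):
    # Hypothetical "real world", unknown to both players; best action is 2.0.
    return -(action - 2.0) ** 2


def collect_data(policy_mean, n_samples, noise_std, rng):
    # The policy player acts in the real world around its current mean action.
    actions = [rng.gauss(policy_mean, noise_std) for _ in range(n_samples)]
    return [(a, true_reward(a)) for a in actions]


def fit_model(data):
    # Model player: least-squares fit reward ~ slope * action + intercept,
    # using only the data collected by the current policy.
    n = len(data)
    mean_a = sum(a for a, _ in data) / n
    mean_r = sum(r for _, r in data) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in data)
    var = sum((a - mean_a) ** 2 for a, _ in data)
    slope = cov / var
    return slope, mean_r - slope * mean_a


def improve_policy(policy_mean, model, step_size):
    # Policy player: one small gradient step on the reward predicted by the
    # learned model (the gradient of a linear model is its slope).
    slope, _ = model
    return policy_mean + step_size * slope


def run_game(n_rounds=200, seed=0):
    rng = random.Random(seed)
    policy_mean = -1.0  # arbitrary starting policy
    for _ in range(n_rounds):
        data = collect_data(policy_mean, n_samples=64, noise_std=0.1, rng=rng)
        model = fit_model(data)  # model responds to the current policy's data
        policy_mean = improve_policy(policy_mean, model, step_size=0.1)
    return policy_mean


if __name__ == "__main__":
    print(run_game())
```

In this toy setting the linear model is accurate only near the actions the policy currently takes, which is why the policy step is kept small: a large step would move the policy into a region where the model has seen no data.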
