Paper title
Taylor Expansion Policy Optimization
Paper authors
Abstract
In this work, we investigate the application of Taylor expansions in reinforcement learning. In particular, we propose Taylor expansion policy optimization, a policy optimization formalism that generalizes prior work (e.g., TRPO) as a first-order special case. We also show that Taylor expansions intimately relate to off-policy evaluation. Finally, we show that this new formulation entails modifications which improve the performance of several state-of-the-art distributed algorithms.