Paper Title

The Importance of Pessimism in Fixed-Dataset Policy Optimization

Authors

Jacob Buckman, Carles Gelada, Marc G. Bellemare

Abstract

We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal, we may need the dataset to be informative of the value of every policy. To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world. We show why pessimistic algorithms can achieve good performance even when the dataset is not informative of every policy, and derive families of algorithms which follow this principle. These theoretical findings are validated by experiments on a tabular gridworld, and deep learning experiments on four MinAtar environments.
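The contrast the abstract draws between naive and pessimistic selection can be illustrated with a minimal sketch. It assumes a one-step (bandit) setting and a count-based penalty of the form alpha / sqrt(n); both are simplifications chosen for illustration and are not taken from the paper, and the names `naive_choice`, `pessimistic_choice`, and `alpha` are hypothetical.

```python
import math


def empirical_means(dataset, num_actions):
    """Return per-action (mean reward, count) from (action, reward) pairs."""
    totals = [0.0] * num_actions
    counts = [0] * num_actions
    for action, reward in dataset:
        totals[action] += reward
        counts[action] += 1
    means = [totals[a] / counts[a] if counts[a] > 0 else 0.0
             for a in range(num_actions)]
    return means, counts


def naive_choice(dataset, num_actions):
    """Pick the action with the highest empirical mean, ignoring uncertainty."""
    means, _ = empirical_means(dataset, num_actions)
    return max(range(num_actions), key=lambda a: means[a])


def pessimistic_choice(dataset, num_actions, alpha=1.0):
    """Pick the action with the highest lower bound on its value.

    The penalty alpha / sqrt(count) is an assumed, illustrative uncertainty
    term; actions never seen in the dataset get a lower bound of -inf.
    """
    means, counts = empirical_means(dataset, num_actions)

    def lower_bound(a):
        if counts[a] == 0:
            return float("-inf")
        return means[a] - alpha / math.sqrt(counts[a])

    return max(range(num_actions), key=lower_bound)


# Hypothetical fixed dataset: action 0 is seen 100 times (mean reward 0.6),
# action 1 is seen once and happened to return 1.0.
dataset = [(0, 1.0)] * 60 + [(0, 0.0)] * 40 + [(1, 1.0)]

print("naive:", naive_choice(dataset, 2))
print("pessimistic:", pessimistic_choice(dataset, 2))
```

In this toy dataset the naive rule prefers action 1 on the strength of a single lucky sample, while the penalized rule prefers the well-covered action 0. This mirrors the abstract's point that overestimation hurts naive methods when the dataset is not informative about every policy.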
