Paper Title
A Unifying View of Optimism in Episodic Reinforcement Learning
Paper Authors
Paper Abstract
The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper, we provide a general framework for designing, analyzing, and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms have typically been thought of as distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis and value-optimistic algorithms being easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms that admits a computationally efficient dynamic-programming implementation as well as a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
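To make the notion of "value-optimistic dynamic programming" concrete, the sketch below shows what such an update can look like in the tabular episodic setting, in the style of UCBVI-type methods: optimism enters as an additive exploration bonus during backward induction over the empirical model. The bonus form, the constant `c`, and all variable names are illustrative assumptions for this sketch, not the specific algorithm or constants from the paper.

```python
import numpy as np

def optimistic_value_iteration(r_hat, p_hat, counts, H, delta=0.05, c=1.0):
    """Value-optimistic backward induction for a tabular episodic MDP.

    A minimal UCBVI-style sketch (not the paper's exact algorithm):
    optimism is injected via an additive bonus on the empirical model.

    r_hat:  (S, A) estimated mean rewards in [0, 1]
    p_hat:  (S, A, S) estimated transition probabilities
    counts: (S, A) visit counts N(s, a)
    H:      horizon length
    """
    S, A = r_hat.shape
    V = np.zeros((H + 1, S))          # V[H] = 0: terminal values
    Q = np.zeros((H, S, A))
    # Hoeffding-style exploration bonus; shrinks as N(s, a) grows.
    bonus = c * H * np.sqrt(np.log(S * A * H / delta) / np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):    # backward induction over stages
        # Optimistic Bellman backup, clipped at the trivial upper bound H.
        Q[h] = np.minimum(H, r_hat + bonus + p_hat @ V[h + 1])
        V[h] = Q[h].max(axis=1)       # greedy value under optimistic Q
    return Q, V
```

Acting greedily with respect to the returned `Q[h]` at each stage yields an optimistic exploration policy; the framework in the paper shows how updates of this shape arise as the Lagrangian dual of constructing an optimistic MDP.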