Title
Approximate Midpoint Policy Iteration for Linear Quadratic Control
Authors
Abstract
We present a midpoint policy iteration algorithm for solving linear quadratic optimal control problems in both model-based and model-free settings. The algorithm is a variation of Newton's method, and we show that in the model-based setting it converges cubically, faster than standard policy iteration and policy gradient algorithms, which converge quadratically and linearly, respectively. We also show that the algorithm can be implemented approximately without knowledge of the dynamics model by using least-squares estimates of the state-action value function from trajectory data, from which policy improvements can be obtained. With sufficient trajectory data, the policy iterates converge cubically to approximately optimal policies, using the same sample budget as approximate standard policy iteration. Numerical experiments demonstrate the effectiveness of the proposed algorithms.
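The model-based algorithm is described as a variation of Newton's method. The following minimal NumPy sketch illustrates that idea under stated assumptions. It assumes discrete-time dynamics x_{t+1} = A x_t + B u_t with stage cost x'Qx + u'Ru and a known stabilizing initial gain K0, and it applies the classical midpoint Newton update to the Riccati operator. First, a standard policy-iteration (Newton) step produces P_half. Next, the gain is recomputed at the midpoint (P + P_half)/2. Finally, the Newton correction is solved using that midpoint linearization. This is a reconstruction of the general technique, not necessarily the paper's exact update. The helper names (`solve_dlyap`, `gain`, `riccati_residual`, `midpoint_policy_iteration`) and the fallback to a plain Newton step when the midpoint gain is not stabilizing are hypothetical illustrative choices.

```python
import numpy as np

# Sketch only: helper names are hypothetical, and the update is the classical
# midpoint Newton method applied to the discrete-time Riccati operator.


def solve_dlyap(Acl, W):
    """Solve X = Acl^T X Acl + W via vectorization (fine for small n only)."""
    n = Acl.shape[0]
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    X = np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)  # remove round-off asymmetry


def gain(P, A, B, R):
    """Greedy (policy-improvement) gain K = -(R + B^T P B)^{-1} B^T P A."""
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def riccati_residual(P, A, B, Q, R):
    """Riccati operator; it vanishes at solutions of the ARE, including P*."""
    K = gain(P, A, B, R)
    Acl = A + B @ K
    return Q + K.T @ R @ K + Acl.T @ P @ Acl - P


def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))


def midpoint_policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=20):
    # Policy evaluation of the initial stabilizing gain K0.
    P = solve_dlyap(A + B @ K0, Q + K0.T @ R @ K0)
    for _ in range(max_iter):
        res = riccati_residual(P, A, B, Q, R)
        if np.linalg.norm(res) < tol:
            break
        # Standard policy iteration (= Newton step): evaluate the greedy gain.
        K = gain(P, A, B, R)
        P_half = solve_dlyap(A + B @ K, Q + K.T @ R @ K)
        # Midpoint: linearize the Riccati operator at (P + P_half) / 2.
        A_mid = A + B @ gain(0.5 * (P + P_half), A, B, R)
        if spectral_radius(A_mid) >= 1.0:
            P = P_half  # assumed safeguard: fall back to the plain Newton step
            continue
        # Newton correction H from A_mid^T H A_mid - H = -res.
        P = P + solve_dlyap(A_mid, res)
    return P, gain(P, A, B, R)


# Double integrator; K0 is stabilizing (closed-loop spectral radius ~0.71).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
K0 = np.array([[-1.0, -1.5]])
P, K = midpoint_policy_iteration(A, B, Q, R, K0)
print("Riccati residual:", np.linalg.norm(riccati_residual(P, A, B, Q, R)))
print("optimal gain:", K)
```

The Kronecker-product Lyapunov solver keeps the sketch dependency-free but costs O(n^6). For larger state dimensions, a dedicated solver such as `scipy.linalg.solve_discrete_lyapunov` (called with `Acl.T`) would be the usual choice. The model-free variant described in the abstract would replace the two Lyapunov solves with least-squares estimates of the state-action value function from trajectory data. That variant is not shown here.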