Paper Title
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Paper Authors
Paper Abstract
We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning. Compared to classical iterative methods such as Fitted Q-Iteration---whose performance loss incurs quadratic dependence on horizon---these methods estimate (some forms of) the Bellman error and enjoy linear-in-horizon error propagation, a property established for the first time for algorithms that rely solely on batch data and output stationary policies. One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous "double sampling" difficulty in Bellman error estimation, and does not use any squared losses. Our analyses reveal its distinct characteristics and potential advantages compared to classical algorithms.
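For context, here is a minimal sketch of the estimation problem behind the "double sampling" difficulty mentioned above, using standard notation that does not appear in the abstract (reward $r$, discount factor $\gamma$, transition kernel $P$, data distribution $\mu$). The average squared Bellman error of a candidate $Q$ is
$$ \mathcal{E}(Q) \;=\; \mathbb{E}_{(s,a)\sim\mu}\Big[\big(Q(s,a) - r(s,a) - \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[\max_{a'} Q(s',a')]\big)^2\Big], $$
whereas the naive plug-in estimate formed from a single transition $(s,a,r,s')$,
$$ \big(Q(s,a) - r - \gamma \max_{a'} Q(s',a')\big)^2, $$
has expectation $\mathcal{E}(Q) + \gamma^2\,\mathbb{E}_{(s,a)\sim\mu}\big[\mathrm{Var}_{s'\sim P(\cdot\mid s,a)}[\max_{a'} Q(s',a')]\big]$: the bias equals the variance of the bootstrapped target, and removing it naively requires two independent next-state samples from the same $(s,a)$. This is the obstacle that the importance-weighting correction described in the abstract is designed to circumvent without resorting to squared losses.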