Title

Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes

Authors

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Abstract

Stochastic Approximation (SA) is a popular approach for solving fixed-point equations where the information is corrupted by noise. In this paper, we consider an SA involving a contraction mapping with respect to an arbitrary norm, and show its finite-sample error bounds while using different stepsizes. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function. Our result is applicable in Reinforcement Learning (RL). In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning. Moreover, we also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning. Importantly, our construction results in only a logarithmic dependence of the convergence bound on the size of the state-space.
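The abstract describes SA as a noisy fixed-point iteration driven by a contraction mapping. A minimal numeric sketch of that recursion, $x_{k+1} = x_k + \alpha_k (F(x_k) - x_k + w_k)$, is below; the specific operator `F`, the stepsize schedule, and the noise model are illustrative assumptions, not taken from the paper.

```python
import random

def F(x):
    # A 0.5-contraction (illustrative choice) with fixed point x* = 2.0.
    return 0.5 * x + 1.0

def run_sa(num_steps=20000, seed=0):
    """Run the SA recursion x_{k+1} = x_k + alpha_k * (F(x_k) - x_k + w_k)."""
    rng = random.Random(seed)
    x = 0.0
    for k in range(num_steps):
        alpha = 1.0 / (k + 1)        # diminishing stepsize (one common choice)
        noise = rng.gauss(0.0, 0.1)  # zero-mean observation noise w_k
        x += alpha * (F(x) - x + noise)
    return x

x_final = run_sa()
```

Because `F` contracts toward its fixed point and the noise is zero-mean, the iterates drift toward $x^* = 2$ despite the corrupted observations; the paper's contribution is quantifying how fast this happens, in an arbitrary norm, via a Moreau-envelope Lyapunov function.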
