Title
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
Authors
Abstract
Deep Q-learning based algorithms have been applied successfully to many decision-making problems, while their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require any assumptions such as a linear or low-rank structure on the MDP.
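To make the algorithmic setting concrete, the following is a minimal sketch of Fitted Q-Iteration with a two-layer ReLU Q-network on a toy MDP. Everything here (the random 5-state MDP, the hidden width, the learning rate, and the plain gradient-descent inner fit) is an illustrative assumption; in particular, the inner regression below uses ordinary gradient descent on the squared Bellman error, not the convex reformulation the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (illustrative only): 5 states, 2 actions, random dynamics and rewards.
n_states, n_actions, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = dist over s'
R = rng.uniform(size=(n_states, n_actions))

# Two-layer ReLU network Q(s, a): one-hot (s, a) input -> ReLU hidden layer -> scalar.
hidden = 32
d = n_states + n_actions
W1 = rng.normal(scale=0.5, size=(d, hidden))
w2 = rng.normal(scale=0.5, size=hidden)

def features(s, a):
    x = np.zeros(d)
    x[s] = 1.0
    x[n_states + a] = 1.0
    return x

def q_value(s, a):
    return np.maximum(features(s, a) @ W1, 0.0) @ w2

# Fitted Q-Iteration: repeatedly regress the network onto Bellman targets.
for _ in range(30):
    batch = [(s, a) for s in range(n_states) for a in range(n_actions)]
    targets = []
    for s, a in batch:
        # One sampled next state per (s, a) from the generative model.
        s_next = rng.choice(n_states, p=P[s, a])
        targets.append(R[s, a] + gamma * max(q_value(s_next, b) for b in range(n_actions)))
    # Inner fit: gradient descent on the squared Bellman error
    # (the paper instead solves a convex program for this step).
    for _ in range(50):
        gW1 = np.zeros_like(W1)
        gw2 = np.zeros_like(w2)
        for (s, a), y in zip(batch, targets):
            x = features(s, a)
            h = np.maximum(x @ W1, 0.0)
            err = h @ w2 - y
            gw2 += err * h
            gW1 += err * np.outer(x, (x @ W1 > 0) * w2)
        W1 -= 0.05 * gW1 / len(batch)
        w2 -= 0.05 * gw2 / len(batch)

# Greedy policy induced by the fitted Q-function.
greedy = [int(np.argmax([q_value(s, a) for a in range(n_actions)])) for s in range(n_states)]
print(greedy)
```

Each outer iteration freezes the current network to form targets $r + \gamma \max_b Q(s', b)$ and then fits a fresh regression to them; the paper's contribution is showing that when this fit is done via a convex program, the overall procedure needs only $\tilde{\mathcal{O}}(1/\epsilon^{2})$ samples.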