Paper Title

Approximate Q-Learning for Controlled Diffusion Processes and its Near Optimality

Paper Authors

Erhan Bayraktar, Ali Devran Kara

Paper Abstract

We study a Q-learning algorithm for continuous-time stochastic control problems. The proposed algorithm uses the sampled state process, discretizing the state and control action spaces under piecewise-constant control processes. We show that the algorithm converges to the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound on the approximation error for the optimal value function of the continuous-time control problem. Furthermore, we present provable upper bounds on the performance loss of the learned control process compared to the optimal admissible control process of the original problem. The error upper bounds are functions of the time and space discretization parameters, and they reveal the effect of each level of approximation: (i) approximation of the continuous-time control problem by an MDP, (ii) use of piecewise-constant control processes, and (iii) space discretization. Finally, we state a time complexity bound for the proposed algorithm as a function of the time and space discretization parameters.
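
To make the recipe concrete, below is a minimal sketch (not the authors' implementation) of tabular Q-learning for a one-dimensional controlled diffusion, following the abstract's three approximation steps: controls held piecewise constant over intervals of length h, a finite state grid, and a finite action set. The drift, diffusion coefficient, running cost, discount rate, and all grid and learning parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch: tabular Q-learning for a 1-D controlled diffusion, per the
# abstract's recipe of (i) approximating the continuous-time problem by a
# finite MDP, (ii) piecewise-constant controls held over intervals of
# length h, and (iii) state/action space discretization.
# Dynamics, cost, discount rate, and all numeric parameters are assumptions.

rng = np.random.default_rng(0)

h = 0.1                       # control holding time (time discretization)
dt = 0.01                     # Euler-Maruyama step for the sampled state process
beta = 1.0                    # discount rate of the continuous-time cost
gamma = np.exp(-beta * h)     # discount factor of the induced finite MDP

x_grid = np.linspace(-2.0, 2.0, 41)   # finite state grid (space discretization)
u_set = np.array([-1.0, 0.0, 1.0])    # finite control action set
sigma = 0.5                           # assumed constant diffusion coefficient

def drift(x, u):
    """Assumed drift b(x, u) in dX_t = b(X_t, U_t) dt + sigma dW_t."""
    return -x + u

def cost(x, u):
    """Assumed running cost c(x, u); the objective is its discounted integral."""
    return x ** 2 + 0.1 * u ** 2

def nearest(x):
    """Quantize a sampled state to its nearest grid index."""
    return int(np.argmin(np.abs(x_grid - x)))

Q = np.zeros((len(x_grid), len(u_set)))
alpha = 0.1                   # constant learning rate (a decaying schedule also works)

x = 0.0
for _ in range(100_000):
    i = nearest(x)
    a = int(rng.integers(len(u_set)))   # uniform exploration over actions
    u = u_set[a]
    # Hold u constant on [t, t + h]; simulate the sampled state process and
    # accumulate the discounted running cost, which plays the role of the
    # stage cost of the induced finite MDP.
    r, disc, y = 0.0, 1.0, x
    for _ in range(int(round(h / dt))):
        r += disc * cost(y, u) * dt
        y += drift(y, u) * dt + sigma * np.sqrt(dt) * rng.normal()
        disc *= np.exp(-beta * dt)
    j = nearest(np.clip(y, x_grid[0], x_grid[-1]))
    # Q-learning step toward the optimality equation of the finite MDP
    # (minimization, since the objective is a cost).
    Q[i, a] += alpha * (r + gamma * Q[j].min() - Q[i, a])
    x = x_grid[j]

print("approximate optimal value at x = 0:", Q[nearest(0.0)].min())
```

Per the abstract, shrinking h and the grid spacing tightens the error bounds, at the price of a larger state-action table and longer runs, which is what the stated time complexity bound quantifies.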
