Paper Title
Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
Paper Authors
Paper Abstract
The choice of a system's control frequency has a significant impact on the ability of reinforcement learning algorithms to learn a high-performing policy. In this paper, we introduce the notion of action persistence, which consists of repeating an action for a fixed number of decision steps and thereby modifies the control frequency. We start by analyzing how action persistence affects the performance of the optimal policy, and then we present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), which extends FQI with the goal of learning the optimal value function at a given persistence. After providing a theoretical study of PFQI and a heuristic approach to identify the optimal persistence, we present an experimental campaign on benchmark domains to show the advantages of action persistence and to demonstrate the effectiveness of our persistence selection method.
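To make the notion of action persistence concrete, the following is a minimal sketch of repeating one action for k base decision steps. It is not the paper's PFQI algorithm. The toy environment `CountingEnv`, the helper `persistent_step`, the `(state, reward, done)` step interface, and the choice to accumulate rewards with a per-step discount `gamma` are all assumptions made for illustration.

```python
class CountingEnv:
    """Hypothetical toy environment: the state is a step counter and the
    reward equals the action taken. Used only to exercise the helper below."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done


def persistent_step(env, action, k, gamma=1.0):
    """Repeat `action` for up to k base steps (assumed interface).

    Returns the last observed state, the discounted sum of the rewards
    collected while the action was held, the done flag, and the number of
    base steps actually executed (fewer than k if the episode ends early).
    """
    if k < 1:
        raise ValueError("persistence k must be at least 1")
    total, discount = 0.0, 1.0
    state, done, steps = None, False, 0
    for _ in range(k):
        state, reward, done = env.step(action)
        total += discount * reward
        discount *= gamma
        steps += 1
        if done:
            break
    return state, total, done, steps
```

With persistence k, the agent makes one decision every k base steps, so a larger k corresponds to a lower effective control frequency. For example, calling `persistent_step(env, 1, k=4)` on a freshly reset `CountingEnv` advances the environment by four base steps under a single decision.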