Paper Title
$\mathrm{SO}(2)$-Equivariant Reinforcement Learning
Paper Authors
Paper Abstract
Equivariant neural networks enforce symmetry within the structure of their convolutional layers, resulting in a substantial improvement in sample efficiency when learning an equivariant or invariant function. Such models are applicable to robotic manipulation learning, which can often be formulated as a rotationally symmetric problem. This paper studies equivariant model architectures in the context of $Q$-learning and actor-critic reinforcement learning. We identify equivariant and invariant characteristics of the optimal $Q$-function and the optimal policy and propose equivariant DQN and SAC algorithms that leverage this structure. We present experiments that demonstrate that our equivariant versions of DQN and SAC can be significantly more sample efficient than competing algorithms on an important class of robotic manipulation problems.
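To make the claimed $Q$-function symmetry concrete, here is a minimal PyTorch sketch of the constraint $Q(g \cdot s, g \cdot a) = Q(s, a)$ for the discrete rotation subgroup $C_4 \subset \mathrm{SO}(2)$. It is an illustration under stated assumptions, not the paper's implementation: the class name `C4EquivariantQNet`, the wrapped `base` network, and the assumption that the four action channels index gripper rotations of 0/90/180/270 degrees are all hypothetical, and equivariance is obtained here by symmetrizing over the group rather than by building it into the layers.

```python
import torch
import torch.nn as nn

class C4EquivariantQNet(nn.Module):
    """Wraps an arbitrary CNN so that Q(g.s) = g.Q(s) for g in C4.

    States are image observations; the 4 action channels are assumed
    (hypothetically) to index gripper rotations of 0, 90, 180, and 270
    degrees, so a rotation g acts on actions by cyclically shifting the
    channel index.
    """

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base  # any network mapping (B, C, H, W) -> (B, 4)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Symmetrize: Q(s)[a] = (1/4) * sum_k base(rot_{-k} s)[a - k]
        qs = []
        for k in range(4):
            rotated = torch.rot90(obs, k=-k, dims=(-2, -1))  # g^{-1} . s
            q = self.base(rotated)                           # (B, 4)
            qs.append(torch.roll(q, shifts=k, dims=-1))      # g . Q-values
        return torch.stack(qs, dim=0).mean(dim=0)

# Equivariance check: rotating the state permutes the action values.
base = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 8 * 8, 4),
)
qnet = C4EquivariantQNet(base)
s = torch.randn(2, 1, 8, 8)
lhs = qnet(torch.rot90(s, k=1, dims=(-2, -1)))  # Q(g . s)
rhs = torch.roll(qnet(s), shifts=1, dims=-1)    # g . Q(s)
assert torch.allclose(lhs, rhs, atol=1e-5)
```

Note that the paper builds the symmetry directly into equivariant convolutional layers rather than symmetrizing a conventional network at evaluation time; this wrapper only illustrates the functional constraint that the proposed equivariant DQN satisfies.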