Uncovering Instabilities in Variational-Quantum Deep Q-Networks

Paper Title

Uncovering Instabilities in Variational-Quantum Deep Q-Networks

Authors

Maja Franz, Lucas Wolf, Maniraman Periyasamy, Christian Ufrecht, Daniel D. Scherer, Axel Plinge, Christopher Mutschler, Wolfgang Mauerer

Abstract

Deep Reinforcement Learning (RL) has advanced considerably over the past decade. At the same time, state-of-the-art RL algorithms require a large computational budget in terms of training time to converge. Recent work has started to approach this problem through the lens of quantum computing, which promises theoretical speed-ups for several traditionally hard tasks. In this work, we examine a class of hybrid quantum-classical RL algorithms that we collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show that VQ-DQN approaches are subject to instabilities that cause the learned policy to diverge, study the extent to which this affects the reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Additionally, and in contrast to most existing work on quantum reinforcement learning, we execute RL algorithms on an actual quantum processing unit (an IBM Quantum Device) and investigate differences in behaviour between simulated and physical quantum systems that suffer from implementation deficiencies. Our experiments show that, contrary to claims in the literature, it cannot be conclusively decided whether known quantum approaches, even if simulated without physical imperfections, can provide an advantage over classical approaches. Finally, we provide a robust, universal and well-tested implementation of VQ-DQN as a reproducible testbed for future experiments.
