Paper Title
Realizing a deep reinforcement learning agent discovering real-time feedback control strategies for a quantum system
Paper Authors
Paper Abstract
To realize the full potential of quantum technologies, finding good strategies to control quantum information processing devices in real time becomes increasingly important. Usually these strategies require a precise understanding of the device itself, which is generally not available. Model-free reinforcement learning circumvents this need by discovering control strategies from scratch without relying on an accurate description of the quantum system. Furthermore, important tasks like state preparation, gate teleportation and error correction need feedback at time scales much shorter than the coherence time, which for superconducting circuits is in the microsecond range. Developing and training a deep reinforcement learning agent able to operate in this real-time feedback regime has been an open challenge. Here, we have implemented such an agent in the form of a latency-optimized deep neural network on a field-programmable gate array (FPGA). We demonstrate its use to efficiently initialize a superconducting qubit into a target state. To train the agent, we use model-free reinforcement learning that is based solely on measurement data. We study the agent's performance for strong and weak measurements, and for three-level readout, and compare with simple strategies based on thresholding. This demonstration motivates further research towards adoption of reinforcement learning for real-time feedback control of quantum devices and more generally any physical system requiring learnable low-latency feedback control.
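
The abstract describes training a feedback policy by model-free reinforcement learning driven solely by measurement data, and comparing it against simple thresholding strategies. As an illustration of that idea only (not the authors' FPGA implementation), the following is a minimal sketch of a policy-gradient (REINFORCE) agent that learns, from simulated noisy readout signals, when to apply a pi-pulse to initialize a qubit into the ground state. The toy readout model, the two-parameter logistic policy, and all hyperparameters are assumptions chosen for readability.

```python
# Illustrative sketch only: a model-free REINFORCE agent for qubit
# initialization from noisy readout, plus a thresholding baseline.
# The environment model and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def measure(state, snr=2.0):
    """Weak dispersive readout: Gaussian signal whose mean depends on the state."""
    return rng.normal(loc=snr * (1.0 if state == 1 else -1.0), scale=1.0)

def step(state, action, t1_flip=0.02):
    """Apply a pi-pulse if action == 1, then allow weak energy relaxation."""
    if action == 1:
        state = 1 - state
    if state == 1 and rng.random() < t1_flip:
        state = 0  # relaxation toward the ground state
    return state

def flip_prob(theta, signal):
    """Logistic policy: probability of applying a pi-pulse given the signal."""
    z = theta[0] + theta[1] * signal
    return 1.0 / (1.0 + np.exp(-z))

# REINFORCE training: reward 1 if the episode ends in the ground state
# (the target), 0 otherwise; no model of the qubit is used anywhere.
theta = np.zeros(2)
lr = 0.1
for episode in range(5000):
    state = rng.integers(0, 2)           # unknown initial (thermal) state
    grads = []
    for _ in range(3):                   # a few feedback rounds per episode
        s = measure(state)
        p = flip_prob(theta, s)
        a = int(rng.random() < p)
        grads.append((a - p) * np.array([1.0, s]))  # grad of log pi(a|s)
        state = step(state, a)
    reward = 1.0 if state == 0 else 0.0
    for g in grads:
        theta += lr * reward * g         # vanilla REINFORCE update

# Baseline for comparison: flip if and only if the signal crosses zero.
def evaluate(policy_fn, trials=2000):
    wins = 0
    for _ in range(trials):
        state = rng.integers(0, 2)
        for _ in range(3):
            state = step(state, policy_fn(measure(state)))
        wins += (state == 0)
    return wins / trials

print("learned policy:", evaluate(lambda s: int(rng.random() < flip_prob(theta, s))))
print("thresholding  :", evaluate(lambda s: int(s > 0.0)))
```

In the experiment described by the abstract, the policy is instead a latency-optimized deep neural network evaluated on an FPGA inside the sub-microsecond feedback loop; the sketch substitutes a two-parameter logistic policy purely to keep the example self-contained and runnable.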