Paper Title
Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning
Paper Authors
Paper Abstract
Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay memory (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated reinforcement learning (FRL), and policy distillation (PD).
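The abstract describes the core ProxRM mechanism (clustering actual states into proxy states and locally averaging policies per cluster) and the MixFRD variant (mixup interpolation of ProxRM entries). The following is a minimal illustrative sketch of these two steps, not the authors' implementation: the function names (`build_proxrm`, `mixup_proxrm`), the use of k-means for state clustering, and the state/action dimensions are all assumptions made for illustration.

```python
# Illustrative sketch of ProxRM construction and mixup interpolation,
# assuming k-means clustering and toy Cartpole-like dimensions.
import numpy as np
from sklearn.cluster import KMeans


def build_proxrm(states, policies, n_proxy_states=8, seed=0):
    """Cluster actual states into proxy states and average policies per cluster.

    states:   (N, state_dim) array of observed states
    policies: (N, n_actions) array of the agent's action distributions
    Returns (proxy_states, proxy_policies), each with n_proxy_states rows.
    """
    km = KMeans(n_clusters=n_proxy_states, n_init=10, random_state=seed).fit(states)
    proxy_states = km.cluster_centers_
    # Locally average the policy within each proxy-state cluster.
    proxy_policies = np.stack(
        [policies[km.labels_ == k].mean(axis=0) for k in range(n_proxy_states)]
    )
    return proxy_states, proxy_policies


def mixup_proxrm(proxy_states, proxy_policies, alpha=0.4, n_samples=32, seed=0):
    """Interpolate ProxRM entries with mixup: convex combinations of random pairs."""
    rng = np.random.default_rng(seed)
    n = proxy_states.shape[0]
    i, j = rng.integers(0, n, n_samples), rng.integers(0, n, n_samples)
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    mixed_states = lam * proxy_states[i] + (1 - lam) * proxy_states[j]
    mixed_policies = lam * proxy_policies[i] + (1 - lam) * proxy_policies[j]
    return mixed_states, mixed_policies


if __name__ == "__main__":
    # Toy data: 4-dimensional states (as in Cartpole) and 2 discrete actions.
    rng = np.random.default_rng(0)
    states = rng.normal(size=(500, 4))
    logits = rng.normal(size=(500, 2))
    policies = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    ps, pp = build_proxrm(states, policies)
    ms, mp = mixup_proxrm(ps, pp)
    print(ps.shape, pp.shape, ms.shape, mp.shape)
```

In this sketch, agents would exchange only the small `(proxy_states, proxy_policies)` pairs (or their mixup-interpolated versions) instead of full replay memories, which is the communication-saving and privacy-preserving idea the abstract outlines.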