Paper Title
Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
Paper Authors
Paper Abstract
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents. Leveraging the powerful Stochastic Gradient Langevin Dynamics, we present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our algorithm consistently outperforms existing baselines, in terms of generalization across different training and testing conditions, on several MuJoCo environments. Our experiments also show that, even for objective functions that entirely ignore potential environmental shifts, our sampling approach remains highly robust in comparison to standard RL algorithms.
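For context, here is a minimal sketch of the SGLD-style two-player update the abstract alludes to, under standard assumptions; the notation ($\theta$ for the agent's policy parameters, $\phi$ for the adversary's, expected return $J$, step size $\epsilon_t$, inverse temperature $\beta$) is illustrative and not taken from the paper. The usual gradient ascent/descent steps are perturbed with scaled Gaussian noise, so the iterates sample from a distribution concentrated around the saddle points of $J$ rather than converging to a single point:

$$
\theta_{t+1} = \theta_t + \epsilon_t \,\nabla_\theta J(\theta_t, \phi_t) + \sqrt{2\epsilon_t/\beta}\;\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I)
$$
$$
\phi_{t+1} = \phi_t - \epsilon_t \,\nabla_\phi J(\theta_t, \phi_t) + \sqrt{2\epsilon_t/\beta}\;\zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, I)
$$

Taking $\beta \to \infty$ removes the injected noise and recovers the deterministic two-player policy gradient method of which, per the abstract, the proposed algorithm is a sampling variant.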