Paper Title
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
Authors
Abstract
Sim-to-real transfer trains RL agents in simulated environments and then deploys them in the real world. It has been widely used in practice because collecting samples in simulation is often cheaper, safer, and much faster than collecting them in the real world. Despite its empirical success, the theoretical foundation of sim-to-real transfer is much less understood. In this paper, we study sim-to-real transfer in continuous domains with partial observations, where both the simulated and the real-world environments are modeled by linear quadratic Gaussian (LQG) systems. We show that a popular robust adversarial training algorithm is capable of learning, from the simulated environment, a policy that is competitive with the optimal policy in the real-world environment. To achieve our results, we design a new algorithm for infinite-horizon average-cost LQGs and establish a regret bound that depends on the intrinsic complexity of the model class. Our algorithm crucially relies on a novel history clipping scheme, which might be of independent interest.
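To make the setting concrete, here is a minimal sketch, assuming a scalar LQG system (x_{t+1} = a x_t + b u_t + w_t, y_t = c x_t + v_t, per-step cost q x_t^2 + r u_t^2, Gaussian noise). The controller is the standard LQG controller: a Kalman filter estimates the hidden state from observations, and an LQR gain acts on that estimate. The empirical average cost approximates the infinite-horizon average cost. The min-max selection over a small finite model class is only a generic illustration of robust training (pick the policy with the best worst-case cost across candidate simulators). It is not the paper's algorithm, and the history clipping scheme is not modeled. The names `lqr_gain`, `average_cost`, `robust_choice` and all parameter values are hypothetical.

```python
import random


def lqr_gain(a, b, q, r, iters=500):
    """Scalar discrete-time Riccati iteration; returns K such that u = -K * x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def average_cost(true_model, design_model, horizon=300, seed=0):
    """Empirical average cost of the LQG controller built from `design_model`
    when it runs on the (hypothetical) real system `true_model`.

    Each model is a dict with keys a, b, c, q, r, sw2, sv2:
      x_{t+1} = a x_t + b u_t + w_t,   w_t ~ N(0, sw2)
      y_t     = c x_t + v_t,           v_t ~ N(0, sv2)
      cost_t  = q x_t^2 + r u_t^2
    """
    rng = random.Random(seed)  # same seed -> common random numbers across runs
    t, d = true_model, design_model
    k = lqr_gain(d["a"], d["b"], d["q"], d["r"])
    x = 0.0                    # true state; x_0 = 0 is known to the controller
    x_pred, s_pred = 0.0, 0.0  # prior mean and variance of the state estimate
    total = 0.0
    for _ in range(horizon):
        # Only a noisy observation of x is available (partial observation).
        y = t["c"] * x + rng.gauss(0.0, t["sv2"] ** 0.5)
        # Kalman update, using the design model's parameters.
        gain = s_pred * d["c"] / (d["c"] ** 2 * s_pred + d["sv2"])
        x_filt = x_pred + gain * (y - d["c"] * x_pred)
        s_filt = (1.0 - gain * d["c"]) * s_pred
        u = -k * x_filt        # certainty-equivalent LQR control on the estimate
        total += t["q"] * x * x + t["r"] * u * u
        x = t["a"] * x + t["b"] * u + rng.gauss(0.0, t["sw2"] ** 0.5)
        # Kalman predict step.
        x_pred = d["a"] * x_filt + d["b"] * u
        s_pred = d["a"] ** 2 * s_filt + d["sw2"]
    return total / horizon


def robust_choice(model_class, horizon=300, seed=0):
    """Return (worst-case cost, design model) minimizing the worst-case
    average cost over all models in the class (a min-max selection)."""
    scores = []
    for design in model_class:
        worst = max(average_cost(m, design, horizon, seed) for m in model_class)
        scores.append((worst, design))
    return min(scores, key=lambda s: s[0])


# Hypothetical model class: the simulators differ only in the dynamics coefficient a.
base = dict(b=1.0, c=1.0, q=1.0, r=1.0, sw2=1.0, sv2=0.5)
models = [dict(base, a=a) for a in (0.9, 1.1, 1.3)]
worst_cost, chosen = robust_choice(models)
print("chosen a =", chosen["a"], "worst-case average cost =", worst_cost)
```

Reusing the same random seed for every (design, true model) pair gives every candidate policy the same noise sequence, so differences in cost reflect the policies rather than sampling luck. The regret studied for average-cost LQGs compares cumulative cost against T times the optimal average cost; this sketch only estimates average costs and does not compute a regret bound.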