Paper Title

Bayesian Generational Population-Based Training

Paper Authors

Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne

Paper Abstract

Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which has led to impressive performance in several large-scale settings. In this paper, we introduce two new innovations in PBT-style methods. First, we employ trust-region-based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on the fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly. Code is available at https://github.com/xingchenwan/bgpbt.
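
To make the PBT-style loop the abstract builds on concrete, below is a minimal, self-contained Python sketch of the generic exploit/explore cycle over a toy mixed (continuous + categorical) search space. It is not the authors' implementation: the objective, the hyperparameter names (log_lr, batch_size), and the perturbation rule are hypothetical, and BG-PBT's distinguishing pieces, the trust-region Bayesian-optimization suggestions and the generational architecture changes, are only indicated by comments.

```python
# Toy sketch of a PBT-style exploit/explore loop (not the authors' code).
import random

POPULATION_SIZE = 8
NUM_STEPS = 20

def sample_hypers():
    """Draw a random point from a toy mixed (continuous + discrete) search space."""
    return {"log_lr": random.uniform(-5.0, -2.0),
            "batch_size": random.choice([64, 128, 256, 512])}

def train_and_eval(hypers, prev_score):
    """Stand-in for one interval of RL training; returns a noisy toy 'return'."""
    lr_term = -abs(hypers["log_lr"] - (-3.5))           # prefer log10(lr) near -3.5
    bs_term = -abs(hypers["batch_size"] - 256) / 256.0  # prefer batch size near 256
    return 0.5 * prev_score + lr_term + bs_term + random.gauss(0.0, 0.05)

population = [{"hypers": sample_hypers(), "score": 0.0} for _ in range(POPULATION_SIZE)]

for step in range(NUM_STEPS):
    # Train every member for one interval and record its performance.
    for member in population:
        member["score"] = train_and_eval(member["hypers"], member["score"])

    # Rank the population; the bottom quartile copies a top-quartile member.
    population.sort(key=lambda m: m["score"], reverse=True)
    quarter = max(1, POPULATION_SIZE // 4)
    top, bottom = population[:quarter], population[-quarter:]
    for loser in bottom:
        winner = random.choice(top)
        # Exploit: inherit the winner's hyperparameters (network weights omitted here).
        loser["hypers"] = dict(winner["hypers"])
        loser["score"] = winner["score"]
        # Explore: random perturbation in this toy; BG-PBT instead queries a
        # trust-region Bayesian-optimization surrogate fitted to observed
        # (configuration, return) pairs, and periodically starts a new generation
        # with a fresh architecture.
        loser["hypers"]["log_lr"] += random.gauss(0.0, 0.2)
        loser["hypers"]["batch_size"] = random.choice([64, 128, 256, 512])

print("best toy score:", round(max(m["score"] for m in population), 3))
```

The random explore step is the piece the paper replaces: a trust-region BO surrogate proposes new configurations in the mixed space, which is what lets the method cover high-dimensional hyperparameter and architecture choices within a single training run.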
