Reinforcement Learning with Automated Auxiliary Loss Search

Paper title

Reinforcement Learning with Automated Auxiliary Loss Search

Authors

Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li

Abstract

A good state representation is crucial to solving complicated reinforcement learning (RL) challenges. Many recent works focus on designing auxiliary losses for learning informative representations. Unfortunately, these handcrafted objectives rely heavily on expert knowledge and may be sub-optimal. In this paper, we propose a principled and universal method for learning better representations with auxiliary loss functions, named Automated Auxiliary Loss Search (A2LS), which automatically searches for top-performing auxiliary loss functions for RL. Specifically, based on the collected trajectory data, we define a general auxiliary loss space of size $7.5 \times 10^{20}$ and explore the space with an efficient evolutionary search strategy. Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization ability to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance.
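
The abstract describes exploring a discrete space of auxiliary losses with an evolutionary search. The following is a minimal sketch of that general idea, not the authors' implementation. The candidate encoding (a bit mask plus a similarity operator), the names `random_candidate`, `mutate`, `toy_fitness`, and `evolutionary_search`, and all hyperparameters are assumptions made for illustration. The toy space holds only 2^12 × 3 = 12,288 candidates, far fewer than the 7.5 × 10^20 reported above, and the fitness function is a placeholder where a real search would train and evaluate an RL agent.

```python
import random

# Hypothetical toy search space (NOT the paper's): each candidate is a bit
# mask over trajectory elements plus one similarity operator.
NUM_ELEMENTS = 12
OPERATORS = ("mse", "inner_product", "cosine")


def random_candidate(rng):
    """Sample a random (mask, operator) pair."""
    mask = tuple(rng.randint(0, 1) for _ in range(NUM_ELEMENTS))
    return mask, rng.choice(OPERATORS)


def mutate(candidate, rng):
    """Flip one mask bit and occasionally resample the operator."""
    mask, op = candidate
    i = rng.randrange(NUM_ELEMENTS)
    mask = mask[:i] + (1 - mask[i],) + mask[i + 1:]
    if rng.random() < 0.2:
        op = rng.choice(OPERATORS)
    return mask, op


def toy_fitness(candidate):
    """Placeholder score. A real search would train an RL agent with the
    candidate auxiliary loss and return its evaluation performance."""
    mask, op = candidate
    return sum(mask[::2]) - sum(mask[1::2]) + (1 if op == "mse" else 0)


def evolutionary_search(fitness, generations=30, population_size=20,
                        top_k=5, seed=0):
    """Keep the top_k candidates each generation and refill by mutation."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    for _ in range(generations):
        survivors = sorted(population, key=fitness, reverse=True)[:top_k]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - top_k)]
        population = survivors + children
    return max(population, key=fitness)


best = evolutionary_search(toy_fitness)
print(best, toy_fitness(best))
```

Because the survivors are carried over unchanged, the best score in the population never decreases from one generation to the next. In a realistic setting the fitness call dominates the cost, so the population size and number of generations would be chosen around the available training budget.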
