Paper Title

Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Authors

Dunion, Mhairi, McInroe, Trevor, Luck, Kevin Sebastian, Hanna, Josiah P., Albrecht, Stefano V.

Abstract

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, our experiments also show that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
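The abstract describes the general pattern of training an RL image encoder jointly with a self-supervised auxiliary loss on consecutive observations, but it does not specify the TED objective itself. Below is a minimal, hedged PyTorch sketch of that general pattern only; the encoder architecture, the temporal-smoothness placeholder loss, and the 0.1 auxiliary weight are illustrative assumptions and are not the paper's method.

```python
# Hedged sketch (not the paper's code): an RL image encoder trained with a
# generic self-supervised auxiliary loss on consecutive observations
# (o_t, o_{t+1}), mirroring the high-level setup the abstract describes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small CNN mapping 84x84 RGB observations to a latent vector (assumed sizes)."""
    def __init__(self, latent_dim: int = 50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, 3, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

def auxiliary_temporal_loss(z_t, z_tp1):
    """Placeholder auxiliary loss on consecutive latents (temporal smoothness).

    The actual TED objective, which enforces a disentangled latent structure,
    is defined in the paper and differs from this stand-in.
    """
    return F.mse_loss(z_tp1, z_t)

# Usage: add the auxiliary loss to whatever RL loss the base algorithm
# (e.g. an actor-critic) computes on the same latents.
encoder = Encoder()
optimiser = torch.optim.Adam(encoder.parameters(), lr=1e-3)

obs_t = torch.rand(8, 3, 84, 84)    # batch of observations o_t
obs_tp1 = torch.rand(8, 3, 84, 84)  # next observations o_{t+1}

z_t, z_tp1 = encoder(obs_t), encoder(obs_tp1)
rl_loss = z_t.pow(2).mean()                     # stand-in for the RL loss
aux_loss = auxiliary_temporal_loss(z_t, z_tp1)  # auxiliary task on (z_t, z_{t+1})
optimiser.zero_grad()
(rl_loss + 0.1 * aux_loss).backward()
optimiser.step()
```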
