Title
Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation
Authors
Abstract
Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning. Recently, there has been renewed interest in bisimulation metrics as a tool to address this issue: these metrics measure behavioural similarity between states, and can therefore be used to learn representations that are, in principle, invariant to irrelevant distractions. An accurate, unbiased, and scalable estimation of these metrics has proved elusive in continuous state and action scenarios. We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states, and can be estimated without bias in continuous state and action spaces. We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS), even when added on top of data augmentation techniques.
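To make "behavioural similarity between states" concrete, here is a minimal sketch of a standard on-policy bisimulation-style metric on a toy, finite, deterministic system. It is not the entangled bisimulation estimator proposed in this work, whose details are not given in the abstract. The function name `bisimulation_distance`, the toy rewards, and the transition table are hypothetical and chosen only for illustration. Under the stated assumption of deterministic transitions, the Wasserstein term between the two next-state distributions reduces to the distance between the two successor states, so the metric is the fixed point of d(s, t) = |r(s) - r(t)| + gamma * d(next(s), next(t)).

```python
import numpy as np

def bisimulation_distance(rewards, next_state, gamma=0.9, tol=1e-8, max_iter=1000):
    """Hypothetical sketch: on-policy bisimulation-style metric, deterministic dynamics.

    rewards[i]    : reward obtained in state i under a fixed policy
    next_state[i] : index of the (assumed deterministic) successor of state i
    With Dirac next-state distributions, the Wasserstein-1 term between
    successors reduces to d[next(i), next(j)].
    """
    rewards = np.asarray(rewards, dtype=float)
    next_state = np.asarray(next_state)
    # Pairwise immediate-reward differences |r(i) - r(j)|.
    reward_gap = np.abs(rewards[:, None] - rewards[None, :])
    d = np.zeros_like(reward_gap)
    for _ in range(max_iter):
        # Entry (i, j) of the indexed matrix is d[next_state[i], next_state[j]].
        d_new = reward_gap + gamma * d[np.ix_(next_state, next_state)]
        # The update is a gamma-contraction, so iterating converges for gamma < 1.
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

# Toy system (assumed): states 0 and 1 are the same task state rendered with
# different backgrounds; states 2 and 3 are likewise a distractor-only pair.
rewards = [1.0, 1.0, 0.0, 0.0]
next_state = [2, 3, 0, 1]
d = bisimulation_distance(rewards, next_state, gamma=0.9)
print(d[0, 1])  # 0.0: the distractor-only pair is behaviourally identical
print(d[0, 2])  # ≈ 10.0, i.e. 1 / (1 - gamma)
```

States that differ only in task-irrelevant appearance receive distance zero, which is the property that makes such metrics useful for learning distraction-invariant representations. In continuous state and action spaces this exact iteration is not available, which is the estimation problem the abstract refers to.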