Paper Title
Learning Behaviors with Uncertain Human Feedback
Paper Authors
Paper Abstract
Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases where the optimal action is not obvious to the trainer. For example, the reward of a sub-optimal action can be stochastic and sometimes exceed that of the optimal action, which is common in games and the real world. Trainers are therefore likely to provide positive feedback for sub-optimal actions, negative feedback for the optimal action, or even no feedback at all in confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider the uncertainty in the learning environment and human feedback. To address this challenge, we introduce a novel feedback model that accounts for the uncertainty of human feedback. However, this makes the computation in the EM algorithm intractable. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with gradient descent. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.
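As a rough illustration of the algorithmic idea described in the abstract, the sketch below runs an EM-style loop in which the expectation step has no closed form and is instead approximated by a few gradient steps on a surrogate objective, followed by a maximization step that re-estimates the hidden feedback-model parameter. All names (`approximate_em`, `surrogate_objective`, `grad_theta`), the toy objective, and the synthetic data are assumptions made purely for illustration; they are not the authors' feedback model or implementation.

```python
import numpy as np

def surrogate_objective(theta, feedback_param, data):
    # Toy stand-in for Q(theta): the expected complete-data log-likelihood
    # under the current estimate of the hidden feedback model (illustrative only).
    return -np.sum((data - theta * feedback_param) ** 2)

def grad_theta(theta, feedback_param, data, eps=1e-5):
    # Numerical gradient of the surrogate objective with respect to theta.
    return (surrogate_objective(theta + eps, feedback_param, data)
            - surrogate_objective(theta - eps, feedback_param, data)) / (2 * eps)

def approximate_em(data, n_em_iters=50, n_grad_steps=20, lr=1e-3):
    theta = 0.0           # behavior parameter being learned
    feedback_param = 1.0  # hidden feedback-model parameter (e.g., trainer reliability)
    for _ in range(n_em_iters):
        # Approximate E-step: no closed form, so take a few gradient steps
        # on the surrogate objective instead of computing it exactly.
        for _ in range(n_grad_steps):
            theta += lr * grad_theta(theta, feedback_param, data)
        # M-step: re-estimate the feedback-model parameter given theta
        # (a toy least-squares update, again purely illustrative).
        feedback_param = float(np.mean(data)) / (theta + 1e-8)
    return theta, feedback_param

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_feedback = rng.normal(loc=2.0, scale=0.5, size=100)  # synthetic feedback signal
    print(approximate_em(toy_feedback))
```

The design point this sketch tries to capture is that when the posterior over the hidden feedback model cannot be integrated analytically, the E-step can be replaced by an iterative optimizer, trading exactness for tractability while keeping the alternating EM structure.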