Paper Title

Role of reward shaping in object-goal navigation

Authors

Srirangan Madhavan, Anwesan Pal, Henrik I. Christensen

Abstract

Deep reinforcement learning approaches have recently become popular for visual navigation tasks in the computer vision and robotics communities. In most cases, the reward function has a binary structure, i.e., a large positive reward is provided when the agent reaches the goal state, and a negative step penalty is assigned for every other state in the environment. Such a sparse signal makes the learning process challenging, especially in large environments, where a long sequence of actions must be taken to reach the target. We introduce a reward shaping mechanism that gradually adjusts the reward signal based on the distance to the goal. Detailed experiments conducted using the AI2-THOR simulation environment demonstrate the efficacy of the proposed approach for object-goal navigation tasks.
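A common way to realize distance-based reward shaping of the kind the abstract describes is potential-based shaping, where the sparse reward is augmented by the change in a potential function tied to the goal distance. The sketch below is illustrative only, assuming a potential-based formulation with made-up constants; it is not the paper's exact mechanism.

```python
# Illustrative sketch of distance-based (potential-based) reward shaping.
# Moving closer to the goal yields a positive shaping bonus on top of the
# sparse base reward; the constants below are assumptions, not the paper's.

GOAL_REWARD = 5.0      # large positive reward at the goal (assumed value)
STEP_PENALTY = -0.01   # step penalty for every other state (assumed value)
GAMMA = 0.99           # discount factor (assumed value)


def potential(dist_to_goal: float) -> float:
    """Potential function: higher (less negative) when closer to the goal."""
    return -dist_to_goal


def shaped_reward(prev_dist: float, new_dist: float, reached_goal: bool) -> float:
    """Sparse base reward plus the potential-based shaping term
    gamma * phi(s') - phi(s)."""
    base = GOAL_REWARD if reached_goal else STEP_PENALTY
    shaping = GAMMA * potential(new_dist) - potential(prev_dist)
    return base + shaping
```

With this formulation, a step that reduces the goal distance receives a higher reward than one that increases it, which densifies the learning signal while potential-based shaping is known to preserve the optimal policy.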
