Paper Title
Embodied Visual Active Learning for Semantic Segmentation
Paper Authors
Paper Abstract
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding by actively selecting views for which to request annotation. While accurate on some benchmarks, today's deep visual recognition pipelines tend not to generalize well in certain real-world scenarios, or for unusual viewpoints. Robotic perception, in turn, requires the ability to refine recognition capabilities for the conditions in which the mobile system operates, including cluttered indoor environments or poor illumination. This motivates the proposed task, where an agent is placed in a novel environment with the objective of improving its visual recognition capability. To study embodied visual active learning, we develop a battery of agents - both learnt and pre-specified - with different levels of knowledge of the environment. The agents are equipped with a semantic segmentation network and seek to acquire informative views, move and explore in order to propagate annotations in the neighbourhood of those views, then refine the underlying segmentation network by online retraining. The trainable method uses deep reinforcement learning with a reward function that balances two competing objectives: task performance, represented as visual recognition accuracy, which requires exploring the environment, and the necessary amount of annotated data requested during active exploration. We extensively evaluate the proposed models using the photorealistic Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.
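To make the reward structure concrete, below is a minimal sketch of one way a per-step reward could balance the two competing objectives the abstract names. The additive form, the fixed annotation_penalty weight, and the use of segmentation accuracy gain as the task-performance signal are illustrative assumptions, not the paper's exact formulation.

def step_reward(acc_before: float,
                acc_after: float,
                requested_annotation: bool,
                annotation_penalty: float = 0.1) -> float:
    """One-step reward sketch (assumed form, not the paper's exact reward):
    recognition improvement minus a fixed penalty for each annotation
    request made during active exploration."""
    improvement = acc_after - acc_before  # task-performance term, e.g. accuracy gain after online retraining
    cost = annotation_penalty if requested_annotation else 0.0  # annotation-budget term
    return improvement - cost

# Example: the agent requests an annotation and retrains online, raising
# segmentation accuracy on held-out views from 0.40 to 0.55.
print(step_reward(0.40, 0.55, requested_annotation=True))  # ~0.05

Under such a reward, requesting an annotation only pays off when the resulting accuracy improvement exceeds the penalty, which is one way to encode the trade-off between recognition performance and the amount of annotated data requested.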