Paper Title

Leveraging commonsense for object localisation in partial scenes

Authors

Giuliari, Francesco, Skenderi, Geri, Cristani, Marco, Del Bue, Alessio, Wang, Yiming

Abstract

We propose an end-to-end solution to address the problem of object localisation in partial scenes, where we aim to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We propose a novel scene representation to facilitate geometric reasoning, the Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph that is enriched with additional concept nodes from a commonsense knowledge base. Specifically, the nodes of the D-SCG represent the scene objects and the edges encode their relative positions. Each object node is then connected via different commonsense relationships to a set of concept nodes. With the proposed graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism. The network first predicts the relative positions between the target object and each visible object by learning a rich representation of the objects, aggregating both the object nodes and the concept nodes in the D-SCG. These relative positions are then merged to obtain the final position. We evaluate our method on Partial ScanNet, improving the state of the art by 5.9% in localisation accuracy at an 8x faster training speed.
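To make the abstract's two main ideas more concrete, below is a minimal sketch in plain Python/NumPy, not the authors' implementation: a toy D-SCG container holding object nodes, commonsense concept nodes, and directed spatial edges, plus a naive merge of per-object relative-position predictions into a single target estimate. All names (`DSCG`, `merge_relative_predictions`) are hypothetical, and the simple averaging merge stands in for the learned attentional GNN described in the paper.

```python
# Hypothetical sketch of (i) a D-SCG-like graph structure and (ii) merging
# "target is at offset d_i from visible object i" predictions into one
# absolute position. Not the authors' code; the merge in the paper is learned.
from dataclasses import dataclass, field
import numpy as np


@dataclass
class DSCG:
    """Toy container: object nodes, concept nodes, and commonsense edges."""
    object_positions: dict[str, np.ndarray] = field(default_factory=dict)
    concept_nodes: set[str] = field(default_factory=set)
    # (head, relation, tail) edges, e.g. ("chair", "AtLocation", "office")
    commonsense_edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str, position) -> None:
        self.object_positions[name] = np.asarray(position, dtype=float)

    def add_concept(self, obj: str, relation: str, concept: str) -> None:
        self.concept_nodes.add(concept)
        self.commonsense_edges.append((obj, relation, concept))

    def spatial_edges(self) -> dict[tuple[str, str], np.ndarray]:
        """Directed edges between object nodes hold their relative positions."""
        names = list(self.object_positions)
        return {
            (a, b): self.object_positions[b] - self.object_positions[a]
            for a in names for b in names if a != b
        }


def merge_relative_predictions(graph: DSCG, predicted_offsets: dict) -> np.ndarray:
    """Average the absolute positions implied by each per-object offset."""
    estimates = [
        graph.object_positions[name] + np.asarray(offset, dtype=float)
        for name, offset in predicted_offsets.items()
    ]
    return np.mean(estimates, axis=0)


if __name__ == "__main__":
    g = DSCG()
    g.add_object("desk", [2.0, 1.0, 0.0])
    g.add_object("chair", [2.5, 1.5, 0.0])
    g.add_concept("desk", "AtLocation", "office")
    g.add_concept("monitor", "AtLocation", "desk")

    # Pretend a network predicted these offsets for the unseen "monitor".
    offsets = {"desk": [0.0, 0.0, 0.8], "chair": [-0.5, -0.5, 0.8]}
    print(merge_relative_predictions(g, offsets))  # -> [2.  1.  0.8]
```

In this toy example the two offsets agree, so the average is exact; in the paper the per-object predictions come from attentional message passing over both object and concept nodes before being merged.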
