Paper Title

Relationformer: A Unified Framework for Image-to-Graph Generation

Paper Authors

Suprosanna Shit, Rajat Koner, Bastian Wittmann, Johannes Paetzold, Ivan Ezhov, Hongwei Li, Jiazhen Pan, Sahand Sharifzadeh, Georgios Kaissis, Volker Tresp, Bjoern Menze

Paper Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationships, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation has been addressed with a two-stage approach consisting of object detection followed by separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among objects to learn an object-relation representation jointly. In addition to the existing [obj]-tokens, we propose a novel learnable token, namely the [rln]-token. Together with the [obj]-tokens, the [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with pair-wise [obj]-tokens, the [rln]-token enables computationally efficient relation prediction. We achieve state-of-the-art performance on multiple diverse, multi-domain datasets, demonstrating the effectiveness and generalizability of our approach.
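
To make the token interplay concrete, here is a minimal PyTorch sketch of how a pair-wise relation head could combine two [obj]-tokens with the shared [rln]-token. The class name `RelationHead`, the layer sizes, and the concatenate-then-MLP scheme are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Pair-wise relation classifier in the spirit of Relationformer.

    Illustrative sketch only: layer sizes and the concatenation scheme
    are assumptions, not the authors' exact implementation.
    """

    def __init__(self, d_model: int = 256, num_rel_classes: int = 2):
        super().__init__()
        # Score each (subject, object) pair from the pair of [obj]-tokens
        # concatenated with the shared [rln]-token.
        self.mlp = nn.Sequential(
            nn.Linear(3 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_rel_classes),
        )

    def forward(self, obj_tokens: torch.Tensor, rln_token: torch.Tensor) -> torch.Tensor:
        # obj_tokens: (B, N, D) decoder outputs for the N [obj]-tokens
        # rln_token:  (B, D)    decoder output for the single [rln]-token
        B, N, D = obj_tokens.shape
        subj = obj_tokens.unsqueeze(2).expand(B, N, N, D)      # subject token of each pair
        obj = obj_tokens.unsqueeze(1).expand(B, N, N, D)       # object token of each pair
        rln = rln_token[:, None, None, :].expand(B, N, N, D)   # shared relation context
        pair = torch.cat([subj, obj, rln], dim=-1)             # (B, N, N, 3*D)
        return self.mlp(pair)                                  # (B, N, N, num_rel_classes)

# Example: 2 images, 20 [obj]-tokens each, binary edge prediction (e.g., road graphs).
head = RelationHead(d_model=256, num_rel_classes=2)
scores = head(torch.randn(2, 20, 256), torch.randn(2, 256))
print(scores.shape)  # torch.Size([2, 20, 20, 2])
```

The pairwise enumeration is still O(N²) in the number of [obj]-tokens, but the relation context comes from a single shared [rln]-token learned jointly with the objects, rather than from a separate second-stage network; this is one plausible reading of the abstract's claim of computationally efficient relation prediction.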
