Paper Title
Fine-Grained Scene Graph Generation with Data Transfer
Paper Authors
Paper Abstract
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets from images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including the long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to a few frequent but uninformative predicates (e.g., on, at), which limits the practical application of these models in downstream tasks. To address these problems, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large-scale SGG with 1,807 predicate classes. IETrans tries to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. By training on the enhanced dataset, a Neural Motif model doubles its macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
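The abstract contrasts macro and micro performance: when a model collapses to frequent predicates, overall (micro) recall can stay high while mean per-predicate (macro) recall drops. The sketch below is a simplified, hypothetical illustration of that effect, not code from the paper's repository; function and variable names are invented, and it uses exact triplet matching rather than the ranked Recall@K / mean Recall@K protocol used in SGG evaluation.

```python
# Minimal sketch (assumption, not the paper's evaluation code): why micro recall can
# look fine while macro (mean per-predicate) recall collapses when a model predicts
# only frequent predicates such as "on". Triplets are (subject, predicate, object).
from collections import defaultdict

def micro_and_macro_recall(gt_triplets, pred_triplets):
    """Compute overall recall and mean per-predicate recall with exact matching."""
    pred_set = set(pred_triplets)
    per_pred_hits = defaultdict(int)   # correctly recovered triplets per GT predicate
    per_pred_total = defaultdict(int)  # ground-truth triplet counts per predicate
    for s, p, o in gt_triplets:
        per_pred_total[p] += 1
        if (s, p, o) in pred_set:
            per_pred_hits[p] += 1
    micro = sum(per_pred_hits.values()) / len(gt_triplets)
    macro = sum(per_pred_hits[p] / per_pred_total[p] for p in per_pred_total) / len(per_pred_total)
    return micro, macro

# Toy example: ground truth contains both a frequent predicate ("on") and a
# fine-grained one ("riding"), but the model only ever outputs "on".
gt = [("man", "riding", "horse"), ("cup", "on", "table"), ("book", "on", "table")]
pred = [("man", "on", "horse"), ("cup", "on", "table"), ("book", "on", "table")]
print(micro_and_macro_recall(gt, pred))  # micro ≈ 0.67, macro = 0.5
```

In this toy case the collapsed predictions still recover two of three triplets (decent micro recall) but score zero on the fine-grained predicate, halving the macro score, which is the failure mode the enhanced IETrans training data is meant to mitigate.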