Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

Paper Title

Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

Authors

Wang, Wenbin; Wang, Ruiping; Shan, Shiguang; Chen, Xilin

Abstract

A scene graph aims to faithfully reveal how humans perceive image content. When humans analyze a scene, they usually prefer to describe the image gist first, namely the major objects and key relations in a scene graph. This inherent perceptual habit implies that human preference follows a hierarchical structure during scene parsing. We therefore argue that a desirable scene graph should also be hierarchically constructed, and we introduce a new scheme for modeling scene graphs. Concretely, a scene is represented by a human-mimetic Hierarchical Entity Tree (HET) consisting of a series of image regions. To generate a scene graph based on the HET, we parse it with a Hybrid Long Short-Term Memory (Hybrid-LSTM), which specifically encodes hierarchy and sibling context to capture the structured information embedded in the HET. To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) that dynamically adjusts their rankings by learning to capture humans' subjective perceptual habits from objective entity saliency and size. Experiments indicate that our method not only achieves state-of-the-art performance for scene graph generation, but also excels at mining image-specific relations, which play an important role in serving downstream tasks.
