Title

Bi-directional Object-context Prioritization Learning for Saliency Ranking

Authors

Xin Tian, Ke Xu, Xin Yang, Lin Du, Baocai Yin, Rynson W. H. Lau

Abstract

The saliency ranking task was recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with strong semantics (e.g., humans), resulting in unrealistic saliency ranking. We observe that spatial attention works concurrently with object-based attention in the human visual recognition system. During the recognition process, the human spatial attention mechanism would move, engage, and disengage from region to region (i.e., context to context). This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. Our model includes two novel modules: (1) a selective object saliency (SOS) module that models object-based attention via inferring the semantic representation of the salient object, and (2) an object-context-object relation (OCOR) module that allocates saliency ranks to objects by jointly modeling the object-context and context-object interactions of the salient objects. Extensive experiments show that our approach outperforms existing state-of-the-art methods. Our code and pretrained model are available at https://github.com/GrassBro/OCOR.
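The object-context-object interaction described in the abstract can be approximated with generic cross-attention layers. The sketch below is a minimal, hypothetical PyTorch illustration of a bi-directional object-to-context and context-to-object exchange followed by a per-object ranking score; all class names, dimensions, and the scoring head are assumptions made for illustration and are not taken from the authors' released code (see the linked repository for the actual implementation).

```python
import torch
import torch.nn as nn


class ObjectContextObjectSketch(nn.Module):
    """Illustrative sketch of a bi-directional object<->context interaction.

    Assumption: object and context features are pre-extracted 256-d vectors;
    this is not the paper's OCOR module, only a generic approximation of it.
    """

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # object -> context: each object queries the spatial context regions
        self.obj_to_ctx = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # context -> object: each context region queries the objects
        self.ctx_to_obj = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        self.rank_head = nn.Linear(dim, 1)  # one saliency score per object

    def forward(self, obj_feats, ctx_feats):
        # obj_feats: (B, N_obj, dim) per-object embeddings (e.g., from a detector)
        # ctx_feats: (B, N_ctx, dim) flattened region/context embeddings
        obj_from_ctx, _ = self.obj_to_ctx(obj_feats, ctx_feats, ctx_feats)
        ctx_from_obj, _ = self.ctx_to_obj(ctx_feats, obj_feats, obj_feats)
        # route the object-aware context back to the objects (the second hop
        # of the object-context-object interaction)
        obj_refined, _ = self.obj_to_ctx(obj_from_ctx, ctx_from_obj, ctx_from_obj)
        fused = self.fuse(torch.cat([obj_feats, obj_refined], dim=-1))
        return self.rank_head(fused).squeeze(-1)  # (B, N_obj) saliency scores


# Toy usage: 5 candidate objects, a 7x7 context grid, 256-d features.
if __name__ == "__main__":
    model = ObjectContextObjectSketch()
    scores = model(torch.randn(2, 5, 256), torch.randn(2, 49, 256))
    ranks = scores.argsort(dim=-1, descending=True)  # higher score = more salient
    print(scores.shape, ranks)
```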
