Paper Title


Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

Paper Authors

Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji

Abstract

Referring expression comprehension (REC) and segmentation (RES) are two highly-related tasks, which both aim at identifying the referent according to a natural language expression. In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve a joint learning of REC and RES for the first time. In MCN, RES can help REC to achieve better language-vision alignment, while REC can help RES to better locate the referent. In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). Specifically, CEM enables REC and RES to focus on similar visual regions by maximizing the consistency energy between two tasks. ASNLS suppresses the response of unrelated regions in RES based on the prediction of REC. To validate our model, we conduct extensive experiments on three benchmark datasets of REC and RES, i.e., RefCOCO, RefCOCO+ and RefCOCOg. The experimental results report the significant performance gains of MCN over all existing methods, i.e., up to +7.13% for REC and +11.50% for RES over SOTA, which well confirm the validity of our model for joint REC and RES learning.
