Paper Title

Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

Paper Authors

Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen

Paper Abstract

Multimodal named entity recognition and relation extraction (MNER and MRE) are fundamental and crucial branches of information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images are incorporated into texts. To address these issues, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming at more effective and robust performance. Specifically, we regard visual representations as a pluggable visual prefix that guides the textual representation toward error-insensitive prediction decisions. We further propose a dynamic gated aggregation strategy that fuses hierarchical multi-scale visual features into the visual prefix. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. Code is available at https://github.com/zjunlp/HVPNeT.
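
The two ideas the abstract names can be illustrated with a minimal PyTorch sketch: (1) a dynamic gate that aggregates visual features from several backbone scales, and (2) prepending the aggregated features as a visual prefix to the keys/values of textual attention. This is not the authors' implementation (see the linked repository for that); the module names, shapes, pooling choice, and gating details below are assumptions for illustration only.

```python
# Illustrative sketch of hierarchical visual-prefix fusion; all names,
# shapes, and hyperparameters are assumptions, not HVPNeT's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGatedAggregation(nn.Module):
    """Fuse visual feature maps from several backbone stages with a
    learned, input-dependent gate (one scalar weight per scale)."""
    def __init__(self, num_scales: int, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_scales)  # scores one weight per scale

    def forward(self, scale_feats):
        # scale_feats: list of (batch, num_patches, dim), one per pyramid level
        stacked = torch.stack(scale_feats, dim=1)        # (B, S, P, D)
        summary = stacked.mean(dim=(1, 2))               # (B, D) pooled summary
        weights = F.softmax(self.gate(summary), dim=-1)  # (B, S) dynamic gate
        return (weights[:, :, None, None] * stacked).sum(dim=1)  # (B, P, D)

class VisualPrefixAttention(nn.Module):
    """Textual attention whose key/value sequence is extended with a
    pluggable visual prefix, so image features guide the text without
    being mixed into the token embeddings themselves."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text, visual_prefix):
        # text: (B, T, D); visual_prefix: (B, P, D)
        kv = torch.cat([visual_prefix, text], dim=1)     # prefix keys/values
        out, _ = self.attn(query=text, key=kv, value=kv)
        return out

# Usage: fuse three pyramid levels, then let the text attend over the prefix.
B, P, T, D = 2, 49, 16, 768
feats = [torch.randn(B, P, D) for _ in range(3)]
prefix = DynamicGatedAggregation(num_scales=3, dim=D)(feats)
hidden = VisualPrefixAttention(dim=D)(torch.randn(B, T, D), prefix)
print(hidden.shape)  # torch.Size([2, 16, 768])
```

Keeping the visual features as a prefix on the keys/values, rather than adding them to the token embeddings, is what makes the prefix "pluggable": the text encoder's input is untouched, so irrelevant images can simply receive little attention.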
