Paper Title
Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Paper Authors
Paper Abstract
Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations remains unsettled. In this paper, we propose \textbf{D}ependency-Enhanced \textbf{A}daptive \textbf{F}usion \textbf{A}ttention (\textbf{DAFA}), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, \textbf{\emph{(i)}} DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights; \textbf{\emph{(ii)}} it then adopts an adaptive fusion module to integrate the obtained dependency information with the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it to BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching tasks.
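Since the paper's implementation is not shown here, the following is a minimal sketch (not the authors' code) of the general idea the abstract describes: attention scores calibrated by a dependency matrix, then adaptively fused with the original semantic attention via a learned gate. All names (DependencyCalibratedAttention, fusion_gate, dep_matrix) and the specific fusion formula are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyCalibratedAttention(nn.Module):
    """Single-head attention whose weights are calibrated by a dependency
    matrix and adaptively fused with the original semantic attention.
    A hypothetical sketch of the mechanism, not the paper's DAFA module."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(hidden_dim, hidden_dim)
        self.k_proj = nn.Linear(hidden_dim, hidden_dim)
        self.v_proj = nn.Linear(hidden_dim, hidden_dim)
        # Gate deciding, per token, how much dependency signal to mix in.
        self.fusion_gate = nn.Linear(2 * hidden_dim, 1)
        self.scale = hidden_dim ** -0.5

    def forward(self, x: torch.Tensor, dep_matrix: torch.Tensor) -> torch.Tensor:
        # x:          (batch, seq_len, hidden_dim) token representations
        # dep_matrix: (batch, seq_len, seq_len); assumed 1.0 where a dependency
        #             arc connects tokens i and j, 0.0 otherwise
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale

        # Original semantic attention.
        sem_attn = F.softmax(scores, dim=-1)
        # Dependency-calibrated attention: restrict scores to dependency arcs.
        dep_scores = scores.masked_fill(dep_matrix == 0, float("-inf"))
        dep_attn = F.softmax(dep_scores, dim=-1)
        # Rows with no arcs yield NaNs after softmax; fall back to semantic.
        dep_attn = torch.where(torch.isnan(dep_attn), sem_attn, dep_attn)

        sem_out = torch.matmul(sem_attn, v)
        dep_out = torch.matmul(dep_attn, v)

        # Adaptive fusion: a learned sigmoid gate mixes the two views per token.
        gate = torch.sigmoid(self.fusion_gate(torch.cat([sem_out, dep_out], dim=-1)))
        return gate * dep_out + (1.0 - gate) * sem_out

# Usage sketch: a batch of 2 sentences, 5 tokens each, 768-dim states.
layer = DependencyCalibratedAttention(hidden_dim=768)
x = torch.randn(2, 5, 768)
dep = torch.eye(5).expand(2, 5, 5)  # toy dependency matrix (self-arcs only)
out = layer(x, dep)                 # (2, 5, 768)
```

The sigmoid gate is one simple way to realize "adaptive fusion": it lets the model lean on dependency structure where parses are reliable and fall back to pure semantic attention elsewhere; the paper's actual fusion module may differ.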