Paper Title

Syntax-Enhanced Pre-trained Model

Paper Authors

Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

Paper Abstract

We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the application of existing methods to broader scenarios. To address this, we present a model that utilizes the syntax of text in both pre-training and fine-tuning stages. Our model is based on Transformer with a syntax-aware attention layer that considers the dependency tree of the text. We further introduce a new pre-training task of predicting the syntactic distance among tokens in the dependency tree. We evaluate the model on three downstream tasks, including relation classification, entity typing, and question answering. Results show that our model achieves state-of-the-art performance on six public benchmark datasets. We have two major findings. First, we demonstrate that infusing automatically produced syntax of text improves pre-trained models. Second, global syntactic distances among tokens bring larger performance gains compared to local head relations between contiguous tokens.
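To make the two ingredients mentioned in the abstract more concrete, the sketch below computes pairwise syntactic distances (path lengths between tokens in a dependency tree), which is the quantity the new pre-training task asks the model to predict, and notes how such distances could bias attention. This is a minimal illustration only: the example sentence, its head indices, and the attention-bias remark are assumptions for exposition, not the authors' actual implementation.

```python
# Minimal sketch (not the paper's code): pairwise syntactic distances from a
# dependency parse, plus a note on using them as a syntax-aware attention bias.
from collections import deque

def syntactic_distances(heads):
    """Pairwise path lengths between tokens in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    Returns an n x n matrix of tree distances (hops along dependency arcs).
    """
    n = len(heads)
    # Build an undirected adjacency list over dependency arcs.
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    # BFS from every token; parse trees are small, so O(n^2) is acceptable.
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen, queue = {src}, deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist

if __name__ == "__main__":
    # "She enjoys reading books": head indices chosen for illustration only.
    tokens = ["She", "enjoys", "reading", "books"]
    heads = [1, -1, 1, 2]  # "enjoys" is the root; "She" and "reading" attach to it
    for token, row in zip(tokens, syntactic_distances(heads)):
        print(token, row)
    # A syntax-aware attention layer could, for instance, add a bias that decays
    # with this distance so that syntactically close tokens attend to each other
    # more strongly; the exact form of the bias in the paper is not shown here.
```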
