Paper Title
WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition
Paper Authors
Paper Abstract
Zero-shot cross-lingual named entity recognition (NER) aims at transferring knowledge from annotated and rich-resource data in source languages to unlabeled and lean-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information lying in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to fully interact with the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. Besides, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models to preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows great generalization and compatibility across languages and fields.
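The multi-channel idea in the abstract, aggregating several per-layer distillers into a mixture, can be sketched as a weighted sum of per-channel distillation losses between teacher and student outputs. This is a minimal illustration only: the function names, the KL-divergence objective, and the uniform channel weighting are assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the label dimension.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) summed over labels, averaged over tokens.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mixture_distillation_loss(teacher_logits_per_layer, student_logits_per_layer,
                              weights=None):
    # Each "channel" distills one intermediate layer's teacher distribution
    # into the student; the per-channel losses are aggregated as a mixture.
    # Uniform weights are an illustrative assumption.
    n = len(teacher_logits_per_layer)
    if weights is None:
        weights = [1.0 / n] * n
    loss = 0.0
    for w, t, s in zip(weights, teacher_logits_per_layer, student_logits_per_layer):
        loss += w * kl_div(softmax(t), softmax(s))
    return loss
```

If the student matches the teacher at every channel, the mixture loss is zero; any per-layer mismatch contributes a positive KL term weighted by its channel.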