Paper Title

Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language

Authors

Qianhui Wu, Zijia Lin, Börje F. Karlsson, Jian-Guang Lou, Biqing Huang

Abstract

To better tackle the named entity recognition (NER) problem on languages with little/no labeled data, cross-lingual NER must effectively leverage knowledge learned from source languages with rich labeled data. Previous works on cross-lingual NER are mostly based on label projection with pairwise texts or direct model transfer. However, such methods either are not applicable if the labeled data in the source languages is unavailable, or do not leverage information contained in unlabeled data in the target language. In this paper, we propose a teacher-student learning method to address such limitations, where NER models in the source languages are used as teachers to train a student model on unlabeled data in the target language. The proposed method works for both single-source and multi-source cross-lingual NER. For the latter, we further propose a similarity measuring method to better weight the supervision from different teacher models. Extensive experiments for 3 target languages on benchmark datasets well demonstrate that our method outperforms existing state-of-the-art methods for both single-source and multi-source cross-lingual NER.
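The core mechanism described above, training a student on unlabeled target-language text against the soft predictions of one or more source-language teachers, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the soft cross-entropy loss and the similarity-based teacher weighting are common distillation choices assumed here, and the function names (`teacher_student_loss`, `combine_teachers`) are hypothetical.

```python
import numpy as np

def teacher_student_loss(teacher_probs, student_probs, eps=1e-12):
    """Soft cross-entropy on unlabeled target-language tokens:
    the student is trained to match the teacher's per-token label
    distribution instead of hard gold labels."""
    return -np.sum(teacher_probs * np.log(student_probs + eps), axis=-1).mean()

def combine_teachers(teacher_probs_list, similarity_weights):
    """Multi-source case: aggregate several source-language teachers,
    weighting each by a (normalized) source-target similarity score."""
    w = np.asarray(similarity_weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(teacher_probs_list)   # (num_teachers, num_tokens, num_labels)
    return np.tensordot(w, stacked, axes=1)  # (num_tokens, num_labels)

# Toy example: two teachers' label distributions for one token,
# mixed with equal similarity weights, then used as the student target.
t_en = np.array([[0.7, 0.2, 0.1]])
t_de = np.array([[0.1, 0.8, 0.1]])
target = combine_teachers([t_en, t_de], [1.0, 1.0])  # [[0.4, 0.5, 0.1]]
loss = teacher_student_loss(target, np.array([[0.3, 0.6, 0.1]]))
```

In the single-source setting, `combine_teachers` reduces to using the lone teacher's distribution directly; the similarity weighting only matters when multiple source-language models supervise the same student.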
