Paper Title

Chinese Word Segmentation with Heterogeneous Graph Neural Network

Authors

Tang, Xuemei; Wang, Jun; Su, Qi

Abstract

In recent years, deep learning has achieved significant success in the Chinese word segmentation (CWS) task. Most of these methods improve CWS performance by leveraging external information, e.g., words, sub-words, and syntax. However, existing approaches fail to integrate multi-level linguistic information effectively, and they ignore the structural features of the external information. Therefore, in this paper, we propose a framework for improving CWS, named HGNSeg. It fully exploits multi-level external information using a pre-trained language model and a heterogeneous graph neural network. Experimental results on six benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008) validate that our approach can effectively improve the performance of Chinese word segmentation. Importantly, in cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.
