Title
Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information
Authors
Abstract
In linguistics, a sememe is defined as the minimum semantic unit of languages. Sememe knowledge bases (KBs), which are built by manually annotating words with sememes, have been successfully applied to various NLP tasks. However, existing sememe KBs cover only a few languages, which hinders the wide utilization of sememes. To address this issue, the task of sememe prediction for BabelNet synsets (SPBS) has been proposed, aiming to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedic dictionary. By automatically predicting sememes for a BabelNet synset, the words in that synset, across many languages, obtain sememe annotations simultaneously. However, previous SPBS methods have not taken full advantage of the abundant information in BabelNet. In this paper, we utilize the multilingual synonyms, multilingual glosses and images in BabelNet for SPBS. We design a multimodal information fusion model to encode and combine this information for sememe prediction. Experimental results show that our model substantially outperforms previous methods (by about 10 points in both MAP and F1). All the code and data of this paper can be obtained at https://github.com/thunlp/MSGI.
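The abstract rests on two mechanics: sememes predicted for one synset carry over to every word in it, and prediction quality is reported as MAP and F1. The following is a minimal Python sketch of both, under stated assumptions. The synsets, lemmas, scores and gold sememes are toy values invented for illustration (not drawn from BabelNet, HowNet or the MSGI data). AP uses the common ranking definition (precision at each gold hit, averaged over the gold set). The 0.5 threshold that turns scores into a predicted set is a hypothetical choice. None of this is the MSGI fusion model or its evaluation script.

```python
from typing import Dict, List, Set, Tuple

# Toy data, invented for illustration (not from BabelNet, HowNet, or the MSGI dataset).
# Hypothetical model scores for one synset meaning "doctor", and its gold sememes.
scores = {"human": 0.92, "medical": 0.75, "place": 0.40, "occupation": 0.33, "tool": 0.10}
gold = {"human", "occupation", "medical"}
# A second toy synset, so that MAP averages over more than one example.
scores2 = {"water": 0.90, "tool": 0.60, "drink": 0.20}
gold2 = {"water", "drink"}
# Hypothetical multilingual lemmas of the "doctor" synset, keyed by language code.
lemmas = {"EN": ["doctor", "physician"], "ZH": ["医生"], "FR": ["médecin"]}


def average_precision(scores: Dict[str, float], gold: Set[str]) -> float:
    """Rank sememes by score; average precision@k over the ranks k that hit a gold sememe."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for k, sememe in enumerate(ranked, start=1):
        if sememe in gold:
            hits += 1
            total += hits / k
    # Divide by the gold-set size, so gold sememes that are never ranked count as misses.
    return total / len(gold) if gold else 0.0


def mean_average_precision(examples: List[Tuple[Dict[str, float], Set[str]]]) -> float:
    """MAP: the mean of per-synset average precision."""
    if not examples:
        return 0.0
    return sum(average_precision(s, g) for s, g in examples) / len(examples)


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold sememes."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def propagate(lemmas: Dict[str, List[str]], sememes: Set[str]) -> Dict[str, Set[str]]:
    """Give every word of every language in the synset the synset's sememes."""
    return {f"{lang}:{word}": set(sememes) for lang, words in lemmas.items() for word in words}


THRESHOLD = 0.5  # hypothetical cut-off for turning scores into a predicted sememe set
predicted = {s for s, score in scores.items() if score >= THRESHOLD}
annotated = propagate(lemmas, predicted)

print("MAP:", round(mean_average_precision([(scores, gold), (scores2, gold2)]), 4))
print("F1:", round(f1(predicted, gold), 4))
print(annotated)
```

With these toy values, the "doctor" synset has AP 11/12 (gold hits at ranks 1, 2 and 4). The threshold keeps {human, medical}, for an F1 of 0.8, and all four lemmas (EN, ZH, FR) receive the same two sememes from a single prediction. Such metrics are typically reported multiplied by 100, which is presumably the scale of the roughly 10-point gain stated above.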