Paper Title
Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Paper Authors
Paper Abstract
Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models have been developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). Such approaches are particularly useful for under-resourced languages that lack the resources needed to build either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. The models and the system are available online.
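The pipeline the abstract describes — induce a sense inventory from word embeddings alone, then disambiguate by matching the context against the induced senses — can be illustrated with a toy sketch. The code below is only a minimal illustration under strong simplifying assumptions: the tiny hand-made 2-d embedding table, the greedy connected-components clustering of a word's nearest-neighbour ego-network, and the similarity threshold are all hypothetical stand-ins for the paper's actual method and for real fastText vectors.

```python
import numpy as np

# Toy 2-d "embeddings" (hypothetical values, not real fastText vectors).
# "bank" is deliberately ambiguous: close to finance words and river words.
EMB = {
    "bank":    np.array([0.50, 0.50]),
    "money":   np.array([1.00, 0.10]),
    "finance": np.array([0.90, 0.20]),
    "loan":    np.array([0.95, 0.15]),
    "river":   np.array([0.10, 1.00]),
    "shore":   np.array([0.20, 0.90]),
    "water":   np.array([0.15, 0.95]),
}

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def induce_senses(word, k=6, edge_threshold=0.8):
    """Induce senses by clustering the ego-network of a word's neighbours.

    Two neighbours land in the same sense cluster if they are similar to
    each other (greedy connected-components over the similarity graph).
    """
    neighbours = sorted(
        (w for w in EMB if w != word),
        key=lambda w: cos(EMB[word], EMB[w]),
        reverse=True,
    )[:k]
    clusters = []
    for n in neighbours:
        for c in clusters:
            if any(cos(EMB[n], EMB[m]) >= edge_threshold for m in c):
                c.append(n)
                break
        else:
            clusters.append([n])
    # Each sense is represented by the mean vector of its cluster.
    return {i: (c, np.mean([EMB[w] for w in c], axis=0))
            for i, c in enumerate(clusters)}

def disambiguate(word, context_words, senses):
    """Pick the sense whose centroid best matches the mean context vector."""
    ctx = np.mean([EMB[w] for w in context_words if w in EMB], axis=0)
    return max(senses, key=lambda s: cos(senses[s][1], ctx))

senses = induce_senses("bank")
# Two clusters emerge: a financial sense and a river sense.
sid = disambiguate("bank", ["money", "loan"], senses)
print(senses[sid][0])  # the financial cluster, e.g. containing "finance"
```

With a real pre-trained model one would replace `EMB` by, say, nearest-neighbour lookups in a fastText vector space; the induction and disambiguation steps keep the same shape.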