Paper Title

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Authors

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

Abstract

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.
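
The notion of stability mentioned in the abstract (overlap between a word's nearest neighbors in different embedding spaces) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the choice of k, and the toy vectors are all assumptions made for illustration, and the paper's exact definition may differ. The sketch also assumes both spaces share the same vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(space, word, k):
    """Return the set of the k words most similar to `word` in `space`.

    `space` is a dict mapping each word to its vector (hypothetical format).
    """
    target = space[word]
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(target, space[w]), reverse=True)
    return set(others[:k])


def stability(space_a, space_b, word, k=10):
    """Fraction of the k nearest neighbors of `word` shared by both spaces."""
    shared = nearest_neighbors(space_a, word, k) & nearest_neighbors(space_b, word, k)
    return len(shared) / k


# Toy two-dimensional spaces, invented purely for illustration.
space_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
space_b = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.7, 0.3], "bus": [0.0, 1.0]}

print(stability(space_a, space_b, "cat", k=2))  # 0.5: only "dog" is shared
```

A value near 1 means the word keeps roughly the same neighbors in both spaces, while a value near 0 means its neighborhood changes almost completely.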
