SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change

Paper Title

SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change

Authors

Maurício Gruppi, Sibel Adali, Pin-Yu Chen

Abstract

This paper describes SChME (Semantic Change Detection with Model Ensemble), a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change. SChME uses a model ensemble combining signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature. More specifically, we combine the cosine distance of word vectors with a neighborhood-based metric we named Mapped Neighborhood Distance (MAP) and a word frequency differential metric as input signals to our model. Additionally, we explore alignment-based methods to investigate the importance of the landmarks used in this process. Our results show evidence that the number of landmarks used for alignment has a direct impact on the predictive performance of the model. Moreover, we show that languages that suffer less semantic change tend to benefit from using a large number of landmarks, whereas languages with more semantic change benefit from a more careful choice of landmark number for alignment.
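The abstract names three ingredients: alignment on landmark words, per-word cosine distance, and an ensemble in which each signal casts a vote. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. The function names are hypothetical, the alignment is plain orthogonal Procrustes fitted on the landmark rows, and the rank-based vote is a stand-in chosen for illustration, since the abstract does not specify how votes are turned into probabilities. The MAP metric is not reproduced here.

```python
import numpy as np

def align_with_landmarks(X, Y, landmarks):
    """Rotate X onto Y using orthogonal Procrustes fitted on landmark rows only.

    X, Y: (n_words, dim) embedding matrices whose rows refer to the same words.
    landmarks: indices of words assumed to be semantically stable (assumption).
    """
    U, _, Vt = np.linalg.svd(X[landmarks].T @ Y[landmarks])
    return X @ (U @ Vt)

def cosine_distance(A, B):
    """Row-wise cosine distance between two matrices of the same shape."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return 1.0 - num / den

def rank_vote(scores):
    """Hypothetical vote: fraction of words whose score is <= this word's score."""
    scores = np.asarray(scores, dtype=float)
    return np.array([(scores <= s).mean() for s in scores])

def ensemble_change_probability(signals):
    """Average the votes of all signals (e.g. cosine distance, frequency difference)."""
    return np.mean([rank_vote(s) for s in signals], axis=0)
```

In this sketch, changing the `landmarks` list changes the fitted rotation and therefore every cosine distance, which is one way to see why the number of landmarks could affect the final predictions.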
