Paper title
SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change
Paper authors
Paper abstract
This paper describes SChME (Semantic Change Detection with Model Ensemble), a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change. SChME uses a model ensemble that combines signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature. More specifically, we use the cosine distance of word vectors, a neighborhood-based metric we named Mapped Neighborhood Distance (MAP), and a word frequency differential metric as input signals to our model. Additionally, we explore alignment-based methods to investigate the importance of the landmarks used in this process. Our results show evidence that the number of landmarks used for alignment has a direct impact on the predictive performance of the model. Moreover, we show that languages that suffer less semantic change tend to benefit from using a large number of landmarks, whereas languages with more semantic change benefit from a more careful choice of the number of landmarks used for alignment.
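The ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the log-frequency form of the frequency differential, the rank-based conversion of scores into votes, and the unweighted averaging are all assumptions made for illustration. The MAP metric is omitted because the abstract does not define it, and the word vectors are assumed to be already aligned across the two time periods.

```python
import numpy as np


def cosine_distance(u, v):
    """Cosine distance between two word vectors (assumed already aligned)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def frequency_differential(count_a, total_a, count_b, total_b):
    """Hypothetical frequency signal: absolute difference of the log
    relative frequencies of a word in the two corpora."""
    return abs(np.log(count_a / total_a) - np.log(count_b / total_b))


def to_vote(scores):
    """Assumed vote function: map each word's score to its normalized
    rank in [0, 1] among the target words (higher means more change)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()
    return ranks / (len(scores) - 1)


def ensemble(feature_scores):
    """Average the per-feature votes into one change score per word."""
    votes = np.vstack([to_vote(s) for s in feature_scores])
    return votes.mean(axis=0)
```

With one array of cosine distances and one array of frequency differentials for the same list of target words, `ensemble([cosine_scores, frequency_scores])` returns one score per word; thresholding that score would give a binary decision and sorting by it would give a ranking.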