Paper title
Capturing Evolution in Word Usage: Just Add More Clusters?
Paper authors
Paper abstract
The way words are used evolves through time, mirroring the cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even over short periods of time. In this paper we focus on a new set of methods relying on contextualised embeddings, a type of semantic modelling that has recently revolutionised the NLP field. We leverage the ability of the transformer-based BERT model to generate contextualised embeddings capable of detecting the semantic change of words across time. Several approaches are compared in a common setting in order to establish the strengths and weaknesses of each. We also propose several ideas for improvement, which drastically improve the performance of existing approaches.
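As a rough illustration of how clustered contextualised embeddings can yield a change score, the sketch below compares the cluster distribution of a word's usages in two time periods with the Jensen-Shannon divergence. It is a minimal sketch under stated assumptions, not the paper's actual pipeline: it assumes each usage embedding (for example from BERT) has already been assigned a cluster id by some clustering step, and the cluster ids, the helper names `cluster_distribution` and `jensen_shannon_divergence`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def cluster_distribution(labels, num_clusters):
    """Return the normalised frequency of each cluster id in one time period."""
    if not labels:
        raise ValueError("labels must contain at least one usage")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(num_clusters)]


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two discrete distributions (0 = identical, 1 = disjoint)."""
    # Mixture distribution used as the reference point for both KL terms.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical cluster ids for one target word; each entry is one usage
# of the word whose embedding was assigned to cluster 0 or cluster 1.
period_1 = [0, 0, 0, 0, 1]
period_2 = [0, 1, 1, 1, 1]

p = cluster_distribution(period_1, num_clusters=2)
q = cluster_distribution(period_2, num_clusters=2)
score = jensen_shannon_divergence(p, q)
print(f"semantic change score: {score:.3f}")
```

A higher score means the word's usages are spread over the clusters differently in the two periods. In this kind of setup the number of clusters is a free parameter, which is the knob the title's question refers to.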