Title
Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics
Authors
Abstract
Measuring Sentence Textual Similarity (STS) is a classic task with many downstream NLP applications, such as text generation and retrieval. In this paper, we focus on unsupervised STS, which works across various domains while requiring only minimal data and computational resources. Theoretically, we propose a lightweight Expectation-Correction (EC) formulation for STS computation. The EC formulation unifies several unsupervised STS approaches, including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm, which captures compositional phrase semantics by composing multiple recursive EC formulations. ROTS runs in linear time and is faster than its predecessors. Empirically, ROTS is more effective and scalable than previous approaches. Extensive experiments on 29 STS tasks under various settings show the clear advantage of ROTS over existing approaches, and detailed ablation studies demonstrate the effectiveness of our approaches.
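To make two of the unified baselines concrete, here is a minimal sketch, not the paper's EC formulation or ROTS. It computes (1) the cosine similarity of additively composed (summed) word vectors and (2) an entropic OT similarity (Sinkhorn iterations) with uniform word weights. It also checks one algebraic fact that hints at why the two can share a form: the dot product of summed vectors equals the sum of all word-pair dot products, so AC weights every pair equally, whereas OT reweights pairs through a transport plan. The uniform weights, the cost `1 - cosine`, `eps`, `iters`, and the toy 2-D vectors are all illustrative assumptions; the recursive phrase-level composition of ROTS is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / (norm(u) * norm(v))


def vector_sum(vs: List[Vector]) -> List[float]:
    # Additive composition: element-wise sum of the word vectors.
    return [sum(col) for col in zip(*vs)]


def ac_similarity(xs: List[Vector], ys: List[Vector]) -> float:
    """Cosine similarity of additively composed sentence embeddings."""
    return cosine(vector_sum(xs), vector_sum(ys))


def ac_as_pairwise_sum(xs: List[Vector], ys: List[Vector]) -> float:
    """Same quantity, using <sum_i x_i, sum_j y_j> = sum_{i,j} <x_i, y_j>."""
    pair_total = sum(dot(x, y) for x in xs for y in ys)
    return pair_total / (norm(vector_sum(xs)) * norm(vector_sum(ys)))


def ot_similarity(xs: List[Vector], ys: List[Vector],
                  eps: float = 0.05, iters: int = 200) -> float:
    """Entropic OT with uniform word weights (assumed); returns the
    word-pair cosine similarity averaged under the transport plan."""
    n, m = len(xs), len(ys)
    sim = [[cosine(x, y) for y in ys] for x in xs]
    # Gibbs kernel built from the assumed cost 1 - cosine similarity.
    k = [[math.exp(-(1.0 - s) / eps) for s in row] for row in sim]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Sinkhorn scaling: alternately match row and column marginals.
        u = [a[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * sim[i][j] for i in range(n) for j in range(m))


if __name__ == "__main__":
    s1 = [[1.0, 0.0], [0.0, 1.0]]  # toy "sentence": two word vectors
    s2 = [[0.0, 1.0], [1.0, 0.0]]  # same words, reversed order
    print(ac_similarity(s1, s2), ac_as_pairwise_sum(s1, s2))
    print(ot_similarity(s1, s2))
```

On the toy pair above, both scores are close to 1 because the second sentence contains the same word vectors in a different order; neither baseline sees word order, which is one motivation for modelling phrase structure as ROTS does.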