Paper Title
A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
Paper Authors
Paper Abstract
Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, we find that these existing solutions are heavily affected by superficial features such as sentence length or syntactic structure. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo-token embedding layer, independent of the BERT encoder, to map each sentence into a sequence of pseudo tokens of fixed length. Leveraging these pseudo sequences, we are able to construct same-length positive and negative pairs based on the attention mechanism to perform contrastive learning. In addition, we utilize both a gradient-updating and a momentum-updating encoder to encode instances, while dynamically maintaining an additional queue to store sentence embedding representations, enhancing the encoder's learning from negative examples. Experiments show that our model outperforms the state-of-the-art baselines on six standard semantic textual similarity (STS) tasks. Furthermore, experiments on alignment and uniformity losses, as well as on hard examples with differing sentence lengths and syntax, consistently verify the effectiveness of our method.
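The core idea of mapping a variable-length sentence to a fixed-length pseudo-token sequence can be sketched with cross-attention: each of the 128 learnable pseudo tokens attends over the sentence's real token embeddings, so every sentence ends up with the same-shaped representation regardless of its length or syntax. The following NumPy sketch is illustrative only, not the authors' implementation; the function name `pseudo_token_pool`, the hidden size, and the random initialization are assumptions for the demo (only the pseudo-token count of 128 comes from the paper's title).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_token_pool(token_embs, pseudo_tokens):
    """Map token embeddings of shape (n, d) to a fixed-length sequence
    of shape (m, d): each pseudo token (query) attends over the real
    tokens (keys/values) with scaled dot-product attention.
    Hypothetical helper, not from the paper."""
    d = token_embs.shape[1]
    scores = pseudo_tokens @ token_embs.T / np.sqrt(d)  # (m, n) attention logits
    weights = softmax(scores, axis=-1)                  # each pseudo token's distribution over tokens
    return weights @ token_embs                         # (m, d) fixed-length output

rng = np.random.default_rng(0)
d, m = 16, 128  # hidden size (assumed), 128 pseudo tokens as in the paper's title
pseudo = rng.normal(size=(m, d))  # learnable pseudo-token embeddings (random here for the demo)

short_sent = rng.normal(size=(5, d))   # a 5-token sentence
long_sent = rng.normal(size=(40, d))   # a 40-token sentence

# Both sentences map to the same fixed shape, so positive and negative
# pairs in contrastive learning are always length-matched.
assert pseudo_token_pool(short_sent, pseudo).shape == (m, d)
assert pseudo_token_pool(long_sent, pseudo).shape == (m, d)
```

Because both outputs have shape `(128, d)`, the length of the original sentence can no longer act as a superficial cue when pairs are compared during contrastive training.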