Paper Title
AugCSE: Contrastive Sentence Embedding with Diverse Augmentations
Paper Authors
Paper Abstract
Data augmentation techniques have proven useful in many NLP applications. Most augmentations are task-specific and cannot be used as a general-purpose tool. In our work, we present AugCSE, a unified framework that uses diverse sets of data augmentations to build a better, general-purpose sentence embedding model. Building upon the latest sentence embedding models, our approach uses a simple antagonistic discriminator that differentiates the augmentation types. With the finetuning objective borrowed from domain adaptation, we show that diverse augmentations, which often lead to conflicting contrastive signals, can be tamed to produce a better and more robust sentence representation. Our methods achieve state-of-the-art results on downstream transfer tasks and perform competitively on semantic textual similarity tasks, using only unsupervised data.
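The abstract gives no formulas, so the following is only a minimal sketch of how a contrastive loss and an antagonistic augmentation-type discriminator might be combined. It assumes an InfoNCE-style contrastive loss with in-batch negatives and a domain-adversarial objective in which the encoder is penalized when the discriminator succeeds. The function names, the temperature of 0.05, and the adversarial weight are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    positives[i] is the augmented view of anchors[i]; every other row of
    `positives` is treated as a negative for anchor i.
    The temperature value is an assumption, not a value from the abstract.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        total -= log_softmax(logits)[i]
    return total / len(anchors)


def discriminator_loss(logits_per_example, augmentation_ids):
    """Mean cross-entropy of a discriminator that predicts which
    augmentation type produced each example."""
    total = 0.0
    for logits, label in zip(logits_per_example, augmentation_ids):
        total -= log_softmax(logits)[label]
    return total / len(augmentation_ids)


def encoder_objective(contrastive, discriminator, adversarial_weight=1.0):
    """Hypothetical encoder objective: minimize the contrastive loss while
    maximizing the discriminator's loss (the adversarial term enters with a
    negative sign, mirroring what a gradient reversal layer would do)."""
    return contrastive - adversarial_weight * discriminator
```

In a real training loop the discriminator itself would still be trained to minimize `discriminator_loss`; only the encoder sees the sign-flipped term. Under these assumptions, the encoder is pushed toward embeddings from which the augmentation type is hard to recover, which is one plausible reading of how conflicting contrastive signals are "tamed".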