Paper Title

A Mutually Reinforced Framework for Pretrained Sentence Embeddings

Authors

Junhan Yang, Zheng Liu, Shitao Xiao, Jianxun Lian, Lijun Wu, Defu Lian, Guangzhong Sun, Xing Xie

Abstract

The lack of labeled data is a major obstacle to learning high-quality sentence embeddings. Recently, self-supervised contrastive learning (SCL) has been regarded as a promising way to address this problem. However, existing works mainly rely on hand-crafted data annotation heuristics to generate positive training samples, which not only call for domain expertise and laborious tuning, but are also prone to the following unfavorable cases: 1) trivial positives, 2) coarse-grained positives, and 3) false positives. As a result, the quality of the self-supervision can be severely limited in practice. In this work, we propose a novel framework, InfoCSE, to address the above problems. Instead of relying on annotation heuristics defined by humans, it leverages the sentence representation model itself and realizes the following iterative self-supervision process: on one hand, improved sentence representations contribute to the quality of data annotation; on the other hand, more effective data annotation helps to generate high-quality positive samples, which further improve the current sentence representation model. In other words, representation learning and data annotation become mutually reinforced, from which a strong self-supervision effect can be derived. Extensive experiments on three benchmark datasets show notable improvements over existing SCL-based methods.
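Below is a minimal sketch of the iterative "mutually reinforced" loop the abstract describes: alternating between (a) annotating positive pairs with the current sentence encoder and (b) contrastive training on those pairs. The encoder interface (`encoder.encode`), the nearest-neighbor mining rule, and all hyperparameters are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: alternate model-based data annotation and contrastive training.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn.functional as F


def info_nce_loss(anchor_emb, positive_emb, temperature=0.05):
    """In-batch InfoNCE: the i-th positive matches the i-th anchor; all other
    rows in the batch serve as negatives."""
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    logits = anchor_emb @ positive_emb.T / temperature            # [B, B] similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)


def mine_positives(corpus_embs, anchor_ids):
    """Model-based annotation (assumed rule): for each anchor, take its nearest
    neighbor under the current encoder as the positive, excluding the anchor
    itself so the positive is not trivial."""
    sims = corpus_embs[anchor_ids] @ corpus_embs.T                # rows are already L2-normalized
    sims[torch.arange(len(anchor_ids)), anchor_ids] = -1.0        # mask out the anchor itself
    return sims.argmax(dim=-1)


def mutually_reinforced_training(encoder, corpus, optimizer,
                                 rounds=3, steps_per_round=1000, batch_size=64):
    """encoder.encode(list_of_sentences) -> [N, d] tensor is a hypothetical API."""
    for _ in range(rounds):
        # (a) data annotation: embed the whole corpus with the current model
        with torch.no_grad():
            corpus_embs = F.normalize(encoder.encode(corpus), dim=-1)
        # (b) representation learning: contrastive updates on the mined positives
        for _ in range(steps_per_round):
            anchor_ids = torch.randint(0, len(corpus), (batch_size,))
            pos_ids = mine_positives(corpus_embs, anchor_ids)
            anchors = encoder.encode([corpus[int(i)] for i in anchor_ids])
            positives = encoder.encode([corpus[int(i)] for i in pos_ids])
            loss = info_nce_loss(anchors, positives)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In this reading, each round's improved encoder produces better neighbor annotations, and those annotations in turn supply less trivial, finer-grained, and less noisy positives for the next round of contrastive training.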
