Paper Title

Learning Weakly-Supervised Contrastive Representations

Paper Authors

Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

Paper Abstract

We argue that one form of valuable information provided by auxiliary information is its implied data clustering. For instance, considering hashtags as auxiliary information, we can hypothesize that Instagram images sharing the same hashtags are semantically more similar. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stage clusters data according to their auxiliary information. The second stage learns similar representations for data within the same cluster and dissimilar representations for data from different clusters. Our empirical experiments suggest three contributions. First, compared to conventional self-supervised representations, the auxiliary-information-infused representations bring the performance closer to that of supervised representations, which use direct downstream labels as supervision signals. Second, our approach performs best in most cases when compared with other baseline representation learning methods that also leverage auxiliary data information. Third, we show that our approach also works well with clusters constructed in an unsupervised manner (i.e., without auxiliary information), yielding a strong unsupervised representation learning approach.
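Since the abstract only describes the two stages at a high level, below is a minimal illustrative sketch in PyTorch of how such a pipeline could look. All names (`cluster_by_auxiliary_info`, `weakly_supervised_contrastive_loss`), the choice of k-means over auxiliary-feature embeddings for stage 1, and the hyperparameters are our assumptions for illustration, not the authors' implementation; stage 2 uses the cluster assignments as weak labels in a supervised-contrastive-style objective.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


# ---- Stage 1: cluster data according to auxiliary information ----
def cluster_by_auxiliary_info(aux_features: torch.Tensor, n_clusters: int) -> torch.Tensor:
    """Assign each sample a cluster ID by running k-means on its auxiliary
    features (e.g., hashtag embeddings). Returns an (N,) LongTensor."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        aux_features.detach().cpu().numpy()
    )
    return torch.as_tensor(labels, dtype=torch.long)


# ---- Stage 2: contrast within vs. across clusters ----
def weakly_supervised_contrastive_loss(
    z: torch.Tensor, cluster_ids: torch.Tensor, temperature: float = 0.1
) -> torch.Tensor:
    """Pull together embeddings from the same cluster and push apart
    embeddings from different clusters, with cluster IDs standing in
    for class labels (a supervised-contrastive-style objective)."""
    z = F.normalize(z, dim=1)                        # unit-norm embeddings
    sim = z @ z.t() / temperature                    # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    # positives: same cluster as the anchor, excluding the anchor itself
    pos_mask = (cluster_ids.unsqueeze(0) == cluster_ids.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # mean log-likelihood of the positives per anchor (skip anchors with none)
    pos_counts = pos_mask.sum(dim=1)
    has_pos = pos_counts > 0
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (-pos_log_prob[has_pos] / pos_counts[has_pos]).mean()
```

In this sketch, `cluster_ids` would be computed once from the auxiliary features and then looked up per mini-batch of encoder outputs; substituting cluster assignments for ground-truth labels is what makes the supervision "weak", and the same loss works unchanged when the clusters come from unsupervised features instead of auxiliary information.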
