Paper Title

Twin Contrastive Learning for Online Clustering

Authors

Yunfan Li, Mouxing Yang, Dezhong Peng, Taihao Li, Jiantao Huang, Xi Peng

Abstract

This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster level. Specifically, we find that when the data is projected into a feature space with a dimensionality of the target cluster number, the rows and columns of its feature matrix correspond to the instance and cluster representation, respectively. Based on the observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning. As a result, the clustering performance is further improved. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it could independently predict the cluster assignment for each instance, thus effortlessly fitting online scenarios. Extensive experiments on six widely-used image and text benchmarks demonstrate the effectiveness of TCL. The code will be released on GitHub.
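The row/column duality described in the abstract can be illustrated with a minimal NumPy sketch: for a soft-assignment matrix of shape N×K (N instances, K target clusters), instance-level contrastive learning contrasts matched rows of two augmented views, while cluster-level contrastive learning contrasts matched columns; a confidence threshold then selects pseudo-labels. The function names, symmetric InfoNCE formulation, temperature, and threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def info_nce(A, B, temperature=0.5):
    """Symmetric InfoNCE loss over matched rows of A and B.

    A[i] and B[i] form a positive pair; every other row in either
    view serves as a negative. (Illustrative sketch, not TCL's code.)
    """
    # L2-normalise rows so dot products are cosine similarities.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    Z = np.concatenate([A, B], axis=0)          # (2N, d)
    sim = Z @ Z.T / temperature                 # pairwise similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-comparisons
    n = A.shape[0]
    # Row i's positive is its counterpart in the other view.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)          # stability shift
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def twin_contrastive_loss(Za, Zb, temperature=0.5):
    """Za, Zb: soft cluster assignments (N x K) of two augmented views.

    Instance-level loss contrasts the rows; cluster-level loss
    contrasts the columns (i.e., the rows of the transpose).
    """
    instance_loss = info_nce(Za, Zb, temperature)
    cluster_loss = info_nce(Za.T, Zb.T, temperature)
    return instance_loss + cluster_loss

def select_pseudo_labels(Z, threshold=0.9):
    """Confidence-based selection: keep only instances whose maximum
    soft assignment exceeds the threshold (hypothetical criterion)."""
    confident = np.where(Z.max(axis=1) > threshold)[0]
    return confident, Z.argmax(axis=1)[confident]
```

Because each row of the assignment matrix is produced independently, predicting a cluster for a single new instance needs only one forward pass, which is what lets the method run in the online setting.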
