Paper Title
MIX'EM: Unsupervised Image Classification using a Mixture of Embeddings
Paper Authors
Paper Abstract
We present MIX'EM, a novel solution for unsupervised image classification. MIX'EM generates representations that by themselves are sufficient to drive a general-purpose clustering algorithm to deliver high-quality classification. This is accomplished by building a mixture-of-embeddings module into a contrastive visual representation learning framework in order to disentangle representations at the category level. It first generates a set of embeddings and mixing coefficients from a given visual representation, and then combines them into a single embedding. We introduce three techniques to successfully train MIX'EM and avoid degenerate solutions: (i) diversify mixture components by maximizing entropy, (ii) minimize instance-conditioned component entropy to enforce a clustered embedding space, and (iii) use an associative embedding loss to enforce semantic separability. By applying (i) and (ii), semantic categories emerge through the mixing coefficients, making it possible to apply (iii). Subsequently, we run K-means on the representations to obtain the semantic classification. We conduct extensive experiments and analyses on the STL10, CIFAR10, and CIFAR100-20 datasets, achieving state-of-the-art classification accuracy of 78\%, 82\%, and 44\%, respectively. To achieve robust and high accuracy, it is essential to use the mixture components to initialize K-means. Finally, we report competitive baselines (70\% on STL10) obtained by applying K-means to the "normalized" representations learned using the contrastive loss.
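To make the abstract's description more concrete, the following is a minimal PyTorch sketch of a mixture-of-embeddings head together with the two entropy terms (i) and (ii). All names (e.g., MixEmHead, num_components) are illustrative assumptions, not the authors' implementation; the associative embedding loss (iii) and the contrastive loss itself are omitted.

```python
# Hypothetical sketch of a mixture-of-embeddings head, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixEmHead(nn.Module):
    def __init__(self, feat_dim: int, embed_dim: int, num_components: int):
        super().__init__()
        self.num_components = num_components
        # One linear map producing all component embeddings at once, plus a
        # gating head that predicts mixing coefficients from the backbone features.
        self.components = nn.Linear(feat_dim, embed_dim * num_components)
        self.gate = nn.Linear(feat_dim, num_components)

    def forward(self, h: torch.Tensor):
        batch = h.size(0)
        # Component embeddings: (batch, K, D)
        z = self.components(h).view(batch, self.num_components, -1)
        # Mixing coefficients: (batch, K), summing to 1 per instance.
        pi = F.softmax(self.gate(h), dim=1)
        # Combine into a single embedding fed to the contrastive loss.
        z_mix = (pi.unsqueeze(-1) * z).sum(dim=1)
        return F.normalize(z_mix, dim=1), pi


def entropy_losses(pi: torch.Tensor, eps: float = 1e-8):
    # (i) Diversity: maximize the entropy of the batch-averaged coefficients
    # (minimize its negative) so that all mixture components stay in use.
    pi_mean = pi.mean(dim=0)
    diversity_loss = (pi_mean * (pi_mean + eps).log()).sum()
    # (ii) Clustering: minimize the per-instance (instance-conditioned) entropy
    # so each image commits to one dominant component.
    instance_entropy = -(pi * (pi + eps).log()).sum(dim=1).mean()
    return diversity_loss, instance_entropy
```

In this reading of the abstract, the argmax of the per-image coefficients would give each image a dominant component, which could then seed the K-means centroids on the learned representations, matching the abstract's note that mixture-based initialization is essential for robust accuracy.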