Paper Title
Concept Generalization in Visual Representation Learning
Paper Authors
Paper Abstract
Measuring concept generalization, i.e., the extent to which models trained on a set of (seen) visual concepts can be leveraged to recognize a new set of (unseen) concepts, is a popular way of evaluating visual representations, especially in a self-supervised learning framework. Nonetheless, the choice of unseen concepts for such an evaluation is usually made arbitrarily, and independently from the seen concepts used to train representations, thus ignoring any semantic relationships between the two. In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way. Our benchmark leverages expert knowledge that comes from WordNet in order to define a sequence of unseen IN-21K concept sets that are semantically more and more distant from the ImageNet-1K (IN-1K) subset, a ubiquitous training set. This allows us to benchmark visual representations learned on IN-1K out of the box. We conduct a large-scale study encompassing 31 convolutional and transformer-based models and show how different architectures, levels of supervision, regularization techniques and use of web data impact the concept generalization performance.
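To make the construction concrete, below is a minimal sketch of how one might rank unseen IN-21K concepts by their WordNet-based semantic distance to the seen IN-1K concepts and cut them into increasingly distant concept sets, in the spirit of ImageNet-CoG. The helper names, the use of NLTK, and the choice of path similarity as the distance measure are illustrative assumptions, not the benchmark's actual implementation (the paper may use a different WordNet similarity).

```python
# Illustrative sketch only: rank unseen concepts by semantic distance
# to the closest seen concept, then split them into concept "levels".
# Assumes NLTK with the WordNet corpus downloaded; path similarity is
# used here as a stand-in for the benchmark's actual measure.
from nltk.corpus import wordnet as wn


def wnid_to_synset(wnid: str):
    """Map an ImageNet WNID like 'n02084071' to an NLTK WordNet synset."""
    return wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))


def distance_to_seen(unseen_wnid: str, seen_wnids: list) -> float:
    """Distance of an unseen concept to its closest seen concept:
    1 - max path similarity (higher means semantically farther)."""
    u = wnid_to_synset(unseen_wnid)
    best = max(wnid_to_synset(s).path_similarity(u) or 0.0 for s in seen_wnids)
    return 1.0 - best


def split_into_levels(unseen_wnids, seen_wnids, num_levels=5):
    """Sort unseen concepts by distance to the seen set and cut them
    into `num_levels` equally sized, increasingly distant concept sets."""
    ranked = sorted(unseen_wnids, key=lambda w: distance_to_seen(w, seen_wnids))
    step = len(ranked) // num_levels
    return [ranked[i * step:(i + 1) * step] for i in range(num_levels)]
```

Under this sketch, a model pretrained on the seen (IN-1K) concepts would be evaluated separately on each returned level, so that accuracy can be tracked as the unseen concepts grow semantically farther from the training set.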