Title
A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations
Authors
Abstract
Convolutional neural networks (CNNs) are among the most successful computer vision systems for object recognition. Furthermore, CNNs have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from those of humans. Specifically, there is a major debate about whether CNNs primarily rely on surface regularities of objects or whether, like humans, they are capable of exploiting the spatial arrangement of features. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e., object parts) to classify objects. We combine this approach with a systematic manipulation of the effective receptive field sizes of CNNs and with an analysis of minimal recognizable configurations (MIRCs). In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g., texture versus sketch datasets. In fact, CNNs even use different strategies for different classes within a heterogeneous dataset (ImageNet), suggesting that CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
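The abstract does not spell out how the scrambling is performed, so the following is only a hypothetical illustration of the general idea, not the authors' implementation. It assumes that scrambling means randomly permuting the spatial positions of an intermediate C x H x W feature map while keeping each position's feature vector intact. Under that assumption, a purely position-blind readout such as global average pooling is unaffected, so a drop in accuracy after scrambling would suggest that later layers depend on where features are, not just on which features are present. The function names `scramble_feature_map` and `global_average_pool`, the `seed` parameter, and the plain-list representation are all assumptions made for this sketch.

```python
import random
from typing import List

# Hypothetical representation: a feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def scramble_feature_map(fmap: FeatureMap, seed: int = 0) -> FeatureMap:
    """Randomly permute the spatial positions of a C x H x W feature map.

    The same permutation is applied to every channel, so each spatial
    feature vector (one value per channel) is kept intact; only its
    location changes. The input map is not modified.
    """
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])

    # Destination positions in row-major order, and a shuffled copy that
    # says which source position each destination receives.
    positions = [(r, c) for r in range(height) for c in range(width)]
    sources = positions[:]
    random.Random(seed).shuffle(sources)

    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for (r, c), (src_r, src_c) in zip(positions, sources):
        for ch in range(channels):
            out[ch][r][c] = fmap[ch][src_r][src_c]
    return out


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (position-blind)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
```

Permuting positions jointly across channels is the design choice that matters here: it preserves the "bag" of local features while destroying their spatial arrangement. Shuffling each channel independently would also break up the local features themselves, which would confound the two explanations the abstract aims to separate.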