Paper Title
Transforming Neural Network Visual Representations to Predict Human Judgments of Similarity
Paper Authors
Paper Abstract
Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence, such as the selection of the image most similar to a query image. We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choices on a data set of bird images from 72% at baseline to 89%. We hypothesized that deep embeddings have redundant, high-dimensional (4096-D) representations; however, reducing the rank of these representations results in a loss of explanatory power. We also hypothesized that the dilation transformation of representations explored in past research is too restrictive, and indeed we found that model explanatory power can be significantly improved with a more expressive linear transform. Most surprising and exciting, we found that, consistent with classic psychological literature, human similarity judgments are asymmetric: the similarity of X to Y is not necessarily equal to the similarity of Y to X, and allowing models to express this asymmetry improves explanatory power.
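The abstract's two technical claims, a learned linear transformation of deep embeddings and an asymmetric similarity score, can be illustrated with a minimal sketch. The Python snippet below is illustrative only and is not the paper's method: it assumes one possible parameterization in which a different linear map (the hypothetical W_query and W_ref) is applied to the query and to the reference embedding, which is sufficient to make similarity asymmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# The paper's deep embeddings are 4096-D; a small D keeps this toy example cheap.
D = 64

# Hypothetical learned linear maps (assumption, not the paper's parameterization).
# Applying a *different* transform to the query and to the reference embedding
# makes the similarity asymmetric: in general s(x, y) != s(y, x).
W_query = rng.normal(scale=D ** -0.5, size=(D, D))
W_ref = rng.normal(scale=D ** -0.5, size=(D, D))

def similarity(x, y):
    """Similarity of query embedding x to reference embedding y."""
    return float((W_query @ x) @ (W_ref @ y))

def predict_binary_choice(query, a, b):
    """Predict which of two images a human would judge more similar to the query."""
    return "a" if similarity(query, a) > similarity(query, b) else "b"

# Toy usage with random vectors standing in for real deep features.
x, y = rng.normal(size=D), rng.normal(size=D)
print(similarity(x, y), similarity(y, x))  # generally unequal
```

In a real model the transforms would be fit to behavioral choice data rather than drawn at random; the sketch only shows why separate query/reference maps can capture the X-to-Y versus Y-to-X asymmetry the abstract describes.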