A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Paper Title

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Authors

Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim

Abstract

Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous works address this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images are rife with multiple visual cues from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for watermark, a shortcut we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released: https://github.com/facebookresearch/Whac-A-Mole.
