Paper Title

The Lie Derivative for Measuring Learned Equivariance

Paper Authors

Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

Paper Abstract

Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, which have no explicit architectural bias towards equivariance, challenges this narrative and suggests that augmentations and training data might also play a significant role in their performance. In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. Using the Lie derivative, we study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. The scale of our analysis allows us to separate the impact of architecture from other factors like model size or training method. Surprisingly, we find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities, and that as models get larger and more accurate they tend to display more equivariance, regardless of architecture. For example, transformers can be more equivariant than convolutional neural networks after training.
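
As a brief illustration of the quantity the abstract refers to (a sketch based on the standard definition of a Lie derivative along a one-parameter flow, not the paper's exact formulation): let \Phi_t denote a continuous one-parameter family of transformations, for example translation of an image by t pixels, acting on both the inputs and outputs of a model f. The Lie derivative measures the first-order rate at which f fails to commute with \Phi_t:

    (\mathcal{L}_X f)(x) = \frac{d}{dt}\Big|_{t=0} \, \Phi_t^{-1}\!\left( f(\Phi_t x) \right)

The model f is equivariant along the flow exactly when this derivative vanishes for every input x, so the norm of the Lie derivative serves as a local measure of equivariance error that requires minimal hyperparameters, as the abstract notes.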
