Paper Title

BYOL works even without batch statistics

Paper Authors

Richemond, Pierre H., Grill, Jean-Bastien, Altché, Florent, Tallec, Corentin, Strub, Florian, Brock, Andrew, Smith, Samuel, De, Soham, Pascanu, Razvan, Piot, Bilal, Valko, Michal

Paper Abstract

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
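To make the two ingredients in the abstract concrete, below is a minimal PyTorch sketch (not the authors' code): the BYOL regression loss, which contains no negative-pair (repulsion) term, and a batch-independent conv block that pairs weight standardization with group normalization in place of batch normalization. The names `byol_loss`, `WSConv2d`, and `gn_ws_block` are illustrative, and hyperparameters such as the number of groups are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def byol_loss(online_pred, target_proj):
    """BYOL objective: mean squared error between L2-normalized vectors,
    equivalent to 2 - 2 * cosine similarity. Note there is no term
    involving other (negative) examples in the batch."""
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj, dim=-1)
    return (2.0 - 2.0 * (p * z).sum(dim=-1)).mean()


class WSConv2d(nn.Conv2d):
    """Conv2d with weight standardization: each output filter's weights
    are normalized to zero mean and unit variance before the convolution,
    so no batch statistics are involved."""

    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def gn_ws_block(in_ch, out_ch, groups=32):
    """A batch-independent replacement for a Conv-BN-ReLU block:
    weight standardization inside the conv, GroupNorm on the activations."""
    return nn.Sequential(
        WSConv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups=groups, num_channels=out_ch),
        nn.ReLU(inplace=True),
    )
```

Because both `WSConv2d` and `GroupNorm` compute statistics per example (over weights and channel groups, respectively), gradients do not flow across batch elements, which is the property the paper exploits to test whether BN's implicit batch interaction is what prevents collapse.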
