Paper Title
How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
Paper Authors
Paper Abstract
Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL) and achieve state-of-the-art performance across different benchmark datasets. However, despite their impressive empirical success, there is still limited theoretical understanding of them. In this paper, we propose a theoretical understanding of how masking matters for MAE to learn meaningful features. We establish a close connection between MAE and contrastive learning, which shows that MAE implicitly aligns the mask-induced positive pairs. Built upon this connection, we develop the first downstream guarantees for MAE methods, and analyze the effect of the mask ratio. Besides, as a result of the implicit alignment, we also point out the dimensional collapse issue of MAE, and propose a Uniformity-enhanced MAE (U-MAE) loss that can effectively address this issue and bring significant improvements on real-world datasets, including CIFAR-10, ImageNet-100, and ImageNet-1K. Code is available at (https://github.com/zhangq327/U-MAE).
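The abstract describes U-MAE as the standard MAE reconstruction objective augmented with a uniformity-promoting term to counter dimensional collapse. The sketch below is a minimal illustration of that idea, not the paper's exact formulation: the function name `umae_loss_sketch`, the weight `lam`, and the specific uniformity term (the log of the mean pairwise Gaussian kernel over L2-normalized features, a common form in contrastive learning) are all assumptions for illustration; the precise loss is defined in the paper and its repository.

```python
import numpy as np

def umae_loss_sketch(reconstruction, target, features, lam=0.01):
    """Illustrative U-MAE-style objective: masked-patch reconstruction
    MSE plus a uniformity regularizer on encoder features.
    NOTE: names and the uniformity form are assumptions, not the
    paper's exact definitions."""
    # Reconstruction term: mean squared error on the masked patches.
    recon = np.mean((reconstruction - target) ** 2)

    # Uniformity term: encourage L2-normalized features to spread out
    # on the unit sphere (log of the mean pairwise Gaussian kernel).
    # More negative = features more spread out = less collapse.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    uniformity = np.log(np.mean(np.exp(-2.0 * sq_dists)))

    # The regularizer pulls the total loss toward spread-out features.
    return recon + lam * uniformity
```

Minimizing the added term pushes features apart on the unit sphere, which is one standard way to counteract the dimensional collapse the abstract attributes to MAE's implicit alignment.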