Paper Title
How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
Paper Authors
Paper Abstract
Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL) and achieve state-of-the-art performance across different benchmark datasets. However, despite their impressive empirical success, there is still limited theoretical understanding of them. In this paper, we propose a theoretical understanding of how masking matters for MAE to learn meaningful features. We establish a close connection between MAE and contrastive learning, which shows that MAE implicitly aligns the mask-induced positive pairs. Built upon this connection, we develop the first downstream guarantees for MAE methods, and analyze the effect of the mask ratio. Besides, as a result of the implicit alignment, we also point out the dimensional collapse issue of MAE, and propose a Uniformity-enhanced MAE (U-MAE) loss that can effectively address this issue and bring significant improvements on real-world datasets, including CIFAR-10, ImageNet-100, and ImageNet-1K. Code is available at (https://github.com/zhangq327/U-MAE).
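The abstract describes U-MAE as the standard MAE reconstruction objective augmented with a uniformity-promoting term to counter dimensional collapse. The sketch below is a minimal illustration of that idea, not the paper's exact formulation: the function name `umae_loss_sketch`, the weight `lam`, and the specific uniformity term (the log of the mean pairwise Gaussian kernel over L2-normalized features, a common form in contrastive learning) are all assumptions for illustration; the precise loss is defined in the paper and its repository.

```python
import numpy as np

def umae_loss_sketch(reconstruction, target, features, lam=0.01):
    """Illustrative U-MAE-style objective: masked-patch reconstruction
    MSE plus a uniformity regularizer on encoder features.
    NOTE: names and the uniformity form are assumptions, not the
    paper's exact definitions."""
    # Reconstruction term: mean squared error on the masked patches.
    recon = np.mean((reconstruction - target) ** 2)

    # Uniformity term: encourage L2-normalized features to spread out
    # on the unit sphere (log of the mean pairwise Gaussian kernel).
    # More negative = features more spread out = less collapse.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    uniformity = np.log(np.mean(np.exp(-2.0 * sq_dists)))

    # The regularizer pulls the total loss toward spread-out features.
    return recon + lam * uniformity
```

Minimizing the added term pushes features apart on the unit sphere, which is one standard way to counteract the dimensional collapse the abstract attributes to MAE's implicit alignment.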