Paper Title

Context Autoencoder for Self-Supervised Representation Learning

Paper Authors

Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

Paper Abstract

We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. The pretraining consists of two tasks: masked representation prediction (predict the representations of the masked patches) and masked patch reconstruction (reconstruct the masked patches). The network is an encoder-regressor-decoder architecture: the encoder takes the visible patches as input; the regressor predicts the representations of the masked patches from the representations of the visible patches and the positions of the visible and masked patches, and these predictions are expected to be aligned with the representations computed by the encoder; the decoder reconstructs the masked patches from the predicted encoded representations. The CAE design encourages separating the learning of the encoder (representation) from the completion of the pretraining tasks, masked representation prediction and masked patch reconstruction, and making predictions in the encoded representation space empirically benefits representation learning. We demonstrate the effectiveness of CAE through superior transfer performance on downstream tasks: semantic segmentation, object detection and instance segmentation, and classification. The code will be available at https://github.com/Atten4Vis/CAE.
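To make the encoder-regressor-decoder flow concrete, below is a minimal PyTorch sketch of one pretraining step as described in the abstract. Everything here beyond the abstract's description is an assumption for illustration: the class and parameter names (CAESketch, mask_query, etc.), the module depths, the use of a standard TransformerDecoder as the cross-attending regressor, pixel-level patch targets, and MSE for both the alignment and reconstruction losses. It is not the authors' implementation; see the GitHub link above for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def blocks(dim, heads, depth):
    # A small stack of standard Transformer encoder blocks.
    layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class CAESketch(nn.Module):
    # Hypothetical minimal CAE: encoder -> regressor -> decoder.
    def __init__(self, num_patches=196, patch_dim=768, dim=768, heads=12):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, dim)            # flattened patch -> token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.mask_query = nn.Parameter(torch.zeros(1, 1, dim))  # shared learned query for masked positions
        self.encoder = blocks(dim, heads, depth=6)
        # Regressor: mask queries cross-attend to the visible representations.
        reg_layer = nn.TransformerDecoderLayer(dim, heads, dim * 4, batch_first=True)
        self.regressor = nn.TransformerDecoder(reg_layer, num_layers=2)
        self.decoder = blocks(dim, heads, depth=2)
        self.to_pixels = nn.Linear(dim, patch_dim)              # token -> reconstructed patch

    def forward(self, patches, vis_idx, mask_idx):
        # patches: (B, N, patch_dim) flattened image patches.
        tokens = self.patch_embed(patches) + self.pos_embed
        b = patches.size(0)

        z_vis = self.encoder(tokens[:, vis_idx])                # encode visible patches only

        # Mask queries carry only positional information about the masked patches.
        queries = self.mask_query.expand(b, len(mask_idx), -1) + self.pos_embed[:, mask_idx]
        z_pred = self.regressor(queries, z_vis)                 # masked representation prediction

        # Alignment target: the encoder's own representations of the masked
        # patches, computed without gradient, so that prediction happens in
        # the encoded representation space.
        with torch.no_grad():
            z_target = self.encoder(tokens[:, mask_idx])
        align_loss = F.mse_loss(z_pred, z_target)

        # Masked patch reconstruction from the predicted representations.
        recon = self.to_pixels(self.decoder(z_pred))
        recon_loss = F.mse_loss(recon, patches[:, mask_idx])
        return align_loss + recon_loss

# Illustrative usage with a 50% random mask over 196 patches.
model = CAESketch()
patches = torch.randn(2, 196, 768)
perm = torch.randperm(196)
loss = model(patches, perm[:98], perm[98:])
loss.backward()
```

Note how the sketch mirrors the separation the abstract emphasizes: the encoder sees only visible patches, the regressor alone bridges visible and masked positions, and the decoder works purely on predicted representations, so the pretraining tasks never leak into the encoder's job of representation.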
