Paper Title
Weighted Ensemble Self-Supervised Learning
Paper Authors
Paper Abstract
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) make it possible to leverage large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs only a small training cost and requires no architectural changes or computational overhead for downstream evaluation. We demonstrate the effectiveness of our method with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both on multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of the ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior-art baselines, which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
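To make the core idea concrete, below is a minimal PyTorch sketch of the training objective described in the abstract: a single shared backbone feeds K lightweight projection heads, and the per-head cross-entropy losses are combined with per-sample (data-dependent) weights. This is an illustrative sketch under our own assumptions, not the paper's released code; all names (`SharedBackboneEnsemble`, `weighted_ensemble_loss`, the toy backbone, the random weights) are hypothetical, and a real DINO/MSN setup would additionally use an EMA teacher, centering/sharpening, and multi-crop augmentation.

```python
# Illustrative sketch only: shared (non-ensembled) backbone with K ensembled
# projection heads, trained with a data-dependent weighted sum of per-head
# cross-entropy losses. All names are hypothetical, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBackboneEnsemble(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.backbone = backbone  # single shared encoder, NOT ensembled
        self.heads = nn.ModuleList(  # K lightweight projection heads
            nn.Sequential(nn.Linear(embed_dim, 2048), nn.GELU(), nn.Linear(2048, out_dim))
            for _ in range(num_heads)
        )

    def forward(self, x):
        z = self.backbone(x)               # one backbone pass, reused by all heads
        return [h(z) for h in self.heads]  # list of K logit tensors

def weighted_ensemble_loss(student_logits, teacher_logits, weights, tau_s=0.1, tau_t=0.04):
    """Data-dependent weighted sum of per-head cross-entropy losses.

    weights: (batch, K) tensor, e.g. a softmax over per-head scores, so each
    sample can emphasize different heads (encouraging head diversity).
    """
    losses = []
    for k, (s, t) in enumerate(zip(student_logits, teacher_logits)):
        p_t = F.softmax(t.detach() / tau_t, dim=-1)    # teacher targets, no gradient
        log_p_s = F.log_softmax(s / tau_s, dim=-1)
        ce = -(p_t * log_p_s).sum(dim=-1)              # per-sample CE, shape (batch,)
        losses.append(weights[:, k] * ce)
    return torch.stack(losses, dim=1).sum(dim=1).mean()

# Toy usage; detached student outputs stand in for a separate EMA teacher.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
student = SharedBackboneEnsemble(backbone, embed_dim=256, out_dim=1024)
x = torch.randn(8, 3, 32, 32)
logits = student(x)
w = F.softmax(torch.randn(8, len(logits)), dim=-1)     # toy data-dependent weights
loss = weighted_ensemble_loss(logits, [l.detach() for l in logits], w)
loss.backward()
```

Because all heads share one backbone forward pass, the extra training cost is limited to the heads themselves, and downstream evaluation can discard the heads entirely and use the backbone features as usual, which is consistent with the abstract's claim of no architectural changes or evaluation-time overhead.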