Paper Title
Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment
Paper Authors
Paper Abstract
Recent advances have indicated the strengths of self-supervised pre-training for improving representation learning on downstream tasks. Existing works often utilize self-supervised pre-trained models by fine-tuning on downstream tasks. However, fine-tuning does not generalize to cases where one needs to build a customized model architecture different from that of the self-supervised model. In this work, we formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network via a novel approach named Embedding Graph Alignment. Specifically, inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space and distill the self-supervised teacher knowledge to a student network by aligning the teacher graph and the student graph. Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks. We demonstrate that our model outperforms multiple representative knowledge distillation methods on three benchmark datasets, including CIFAR100, STL10, and TinyImageNet. Code is available at: https://github.com/yccm/EGA.
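As a rough illustration of the idea described in the abstract (not the authors' exact formulation; see the repository above for the official code), the sketch below builds pairwise similarity graphs over a batch of teacher and student embeddings and penalizes their misalignment. The cosine-similarity graph construction, the mean-squared-error alignment term, and all names here are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def embedding_graph(z: torch.Tensor) -> torch.Tensor:
    """Build an instance-instance similarity graph from a batch of embeddings.

    z: (batch_size, dim) feature embeddings.
    Returns a (batch_size, batch_size) cosine-similarity matrix.
    """
    z = F.normalize(z, dim=1)  # unit-normalize each embedding
    return z @ z.t()           # pairwise cosine similarities


def graph_alignment_loss(z_teacher: torch.Tensor, z_student: torch.Tensor) -> torch.Tensor:
    """Align the student's embedding graph with the (frozen) teacher's graph.

    Simplified stand-in for the paper's Embedding Graph Alignment objective:
    it minimizes the mean squared difference between the two similarity
    matrices. The paper's exact node/edge terms and weighting may differ.
    """
    g_t = embedding_graph(z_teacher.detach())  # teacher graph, no gradients
    g_s = embedding_graph(z_student)           # student graph
    return F.mse_loss(g_s, g_t)


# Usage sketch with random stand-ins for teacher/student features.
if __name__ == "__main__":
    batch, d_teacher, d_student = 32, 2048, 512
    z_t = torch.randn(batch, d_teacher)                      # e.g. frozen self-supervised teacher features
    z_s = torch.randn(batch, d_student, requires_grad=True)  # student features
    # Both graphs are (batch, batch), so teacher and student embedding
    # dimensions need not match.
    loss = graph_alignment_loss(z_t, z_s)
    loss.backward()
    print(f"EGA-style alignment loss: {loss.item():.4f}")
```

Because the loss is defined on batch-by-batch similarity matrices rather than on raw features, this style of distillation places no constraint on the student's architecture or embedding dimension, which is the flexibility the abstract emphasizes.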