Paper Title
Knowledge Distillation Meets Open-Set Semi-Supervised Learning
Paper Authors
Paper Abstract
Existing knowledge distillation methods mostly focus on distilling the teacher's predictions and intermediate activations. However, the structured representation, which is arguably one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel {\em \modelname{}} ({\bf\em \shortname{}}) method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student. The key idea is that we leverage the teacher's classifier as a semantic critic to evaluate the representations of both teacher and student, and distill semantic knowledge with high-order structured information over all feature dimensions. This is accomplished by introducing the notion of a cross-network logit, computed by passing the student's representation into the teacher's classifier. Further, by treating the set of seen classes as a basis of the semantic space in a combinatorial perspective, we scale \shortname{} to unseen classes, enabling effective exploitation of largely available, arbitrary unlabeled training data. At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL). Extensive experiments show that our \shortname{} significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine-grained face recognition tasks, as well as on the less studied yet practically crucial task of binary network distillation. Under the more realistic open-set SSL settings we introduce, we reveal that knowledge distillation is generally more effective than existing Out-Of-Distribution (OOD) sample detection, and our proposed \shortname{} is superior to both previous distillation and SSL competitors. The source code is available at \url{https://github.com/jingyang2017/SRD\_ossl}.
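To make the cross-network logit idea concrete, below is a minimal PyTorch sketch of how such a loss could be computed. This is an illustrative assumption, not the paper's exact formulation: the function name srd_loss, the temperature T, and the use of a KL divergence between soft targets are hypothetical choices, and it assumes the student's feature dimension already matches the teacher classifier's input (in practice a projection layer may be needed).

```python
import torch
import torch.nn.functional as F

def srd_loss(student_feat, teacher_feat, teacher_classifier, T=4.0):
    """Hypothetical sketch of a cross-network logit distillation loss.

    student_feat:        (B, D) penultimate-layer features from the student
    teacher_feat:        (B, D) penultimate-layer features from the teacher
    teacher_classifier:  the teacher's frozen final linear layer (D -> C)
    """
    # Cross-network logits: the student's representation is scored by the
    # teacher's classifier, which acts as a semantic critic.
    cross_logits = teacher_classifier(student_feat)

    # Teacher's own logits serve as the reference semantic distribution.
    with torch.no_grad():
        teacher_logits = teacher_classifier(teacher_feat)

    # Align the two logit distributions with temperature-scaled soft targets;
    # the actual paper may use a different divergence or matching objective.
    loss = F.kl_div(
        F.log_softmax(cross_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return loss
```

In a training loop, this term would be added to the usual task loss on labeled data, and, following the open-set SSL connection described above, it can also be applied to unlabeled samples since it requires no ground-truth labels.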