Title
Attention-based Knowledge Distillation in Multi-attention Tasks: The Impact of a DCT-driven Loss
Authors
Abstract
Knowledge Distillation (KD) is a strategy that defines a set of transferability pathways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowledge. In this paper, we propose and analyse the use of a 2D frequency transform of the activation maps before transferring them. We posit that, by using global image cues rather than pixel estimates, this strategy enhances knowledge transferability in tasks such as scene recognition, which is defined by strong spatial and contextual relationships between multiple and varied concepts. To validate the proposed method, an extensive evaluation of the state of the art in scene recognition is presented. Experimental results provide strong evidence that the proposed strategy enables the student network to better focus on the relevant image areas learnt by the teacher network, hence leading to more descriptive features and higher transfer performance than all other state-of-the-art alternatives. We publicly release the training and evaluation framework used in this paper at http://www-vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
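To make the described pipeline concrete, below is a minimal, differentiable sketch in PyTorch of a DCT-driven attention-transfer loss. It assumes attention maps obtained as the channel-wise maximum of absolute activations, an orthonormal type-II 2D DCT implemented as matrix products, and an L1 comparison of the coefficients; the function names and the L1 choice are illustrative assumptions, not the paper's exact formulation.

```python
import math
import torch
import torch.nn.functional as F

def dct_matrix(n: int, device=None) -> torch.Tensor:
    """Orthonormal type-II DCT matrix of size (n, n)."""
    k = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(1)  # frequency index
    i = torch.arange(n, device=device, dtype=torch.float32).unsqueeze(0)  # spatial index
    mat = torch.cos(math.pi * (2 * i + 1) * k / (2 * n))
    mat[0] /= math.sqrt(2.0)           # orthonormal scaling for the DC row
    return mat * math.sqrt(2.0 / n)

def dct2(maps: torch.Tensor) -> torch.Tensor:
    """Differentiable 2D DCT over the last two dims of a (B, H, W) tensor."""
    h, w = maps.shape[-2:]
    dh = dct_matrix(h, maps.device)
    dw = dct_matrix(w, maps.device)
    return dh @ maps @ dw.T            # transform rows, then columns

def dct_kd_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    """Loss between teacher and student attention maps compared in the
    frequency domain rather than pixel-by-pixel. Assumes both feature
    tensors are (B, C, H, W) with matching spatial size; otherwise
    interpolate the student maps first."""
    a_s = f_student.abs().amax(dim=1)  # depth-reduce to (B, H, W) attention
    a_t = f_teacher.abs().amax(dim=1)
    return F.l1_loss(dct2(a_s), dct2(a_t.detach()))
```

The frequency-domain comparison is what encodes the "global image cues" argument: low-order DCT coefficients summarise the coarse spatial layout of the attention map, so matching them penalises disagreement in where the teacher attends globally rather than in individual pixel activations.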