Paper Title

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

Authors

Yen-Chang Hsu, James Smith, Yilin Shen, Zsolt Kira, Hongxia Jin

Abstract

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient way to facilitate knowledge transfer, less attention has been paid to comparing the effect of knowledge sources such as features, logits, and gradients. This work provides a new perspective that motivates a set of knowledge distillation strategies by approximating the classical KL-divergence criterion with different knowledge sources, making a systematic comparison possible in model compression and incremental learning. Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for model design, providing a practical guideline for effective KD-based transfer learning.
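For context, the "classical KL-divergence criterion" mentioned in the abstract is the standard logit-based distillation loss of Hinton et al., which the paper uses as the starting point for its comparison of knowledge sources. Below is a minimal PyTorch sketch of that loss; the temperature value and variable names are illustrative and not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """Classical logit-based KD loss: KL divergence between the
    temperature-softened teacher and student class distributions.
    The temperature here is an illustrative choice."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # "batchmean" averages the KL over the batch; the t**2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)

# Usage: student_logits and teacher_logits are (batch_size, num_classes)
# tensors; in practice this term is combined with the ordinary
# cross-entropy loss on the ground-truth labels.
```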
