Paper Title

Online Ensemble Model Compression using Knowledge Distillation

Paper Authors

Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Paper Abstract

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each model learns unique representations from the data distribution due to its distinct architecture. This helps the ensemble generalize better by combining every model's knowledge. The distilled students and ensemble teacher are trained simultaneously without requiring any pretrained weights. Moreover, our proposed method can deliver multiple compressed students with a single training, which is efficient and flexible for different scenarios. We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness. Notably, using our framework, a 97% compressed ResNet110 student model managed to produce a 10.64% relative accuracy gain over its individual baseline training on the CIFAR100 dataset. Similarly, a 95% compressed DenseNet-BC (k=12) model managed an 8.17% relative accuracy gain.
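To make the general idea of online ensemble distillation concrete, below is a minimal, hedged PyTorch sketch. It is not the paper's implementation: the `SmallStudent` network, the `online_ensemble_distillation_step` helper, and the temperature `T` and weighting `alpha` are illustrative assumptions. It only shows the core mechanic described in the abstract: several students are trained at once, their averaged logits act as an ensemble teacher learnt simultaneously, and each student receives a distillation loss from that ensemble in addition to the usual label loss, with no pretrained weights involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy student; the paper uses compressed ResNet/DenseNet variants.
class SmallStudent(nn.Module):
    def __init__(self, width, num_classes=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(width, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def online_ensemble_distillation_step(students, x, y, optimizer, T=3.0, alpha=0.5):
    """One joint training step: each student gets a cross-entropy loss on the
    labels plus a KL distillation loss toward the simultaneously learnt
    ensemble (here, the average of all student logits)."""
    logits = [s(x) for s in students]
    # Ensemble teacher: mean of student logits, detached so the distillation
    # term only updates the individual students, not the teacher target.
    ensemble_logits = torch.stack(logits).mean(dim=0).detach()

    loss = 0.0
    for z in logits:
        ce = F.cross_entropy(z, y)
        kd = F.kl_div(
            F.log_softmax(z / T, dim=1),
            F.softmax(ensemble_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        loss = loss + (1 - alpha) * ce + alpha * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch with random CIFAR100-shaped data: two differently sized
# students trained in a single run, matching the "multiple compressed
# students with a single training" idea.
students = [SmallStudent(16), SmallStudent(32)]
params = [p for s in students for p in s.parameters()]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
print(online_ensemble_distillation_step(students, x, y, optimizer))
```

Averaging detached student logits is only one plausible choice of ensemble teacher; the framework in the paper builds its teacher from the jointly trained models, but the exact aggregation and loss weighting should be taken from the paper itself rather than this sketch.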
