Paper Title

Towards Zero-Shot Knowledge Distillation for Natural Language Processing

Paper Authors

Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh

Paper Abstract

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data for knowledge transfer to the student network. However, privacy concerns, data regulations and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-Shot Knowledge Distillation for NLP, where the student learns from the much larger teacher without any task specific data. Our solution combines out of domain data and adversarial training to learn the teacher's output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can achieve between 75% and 92% of the teacher's classification score (accuracy or F1) while compressing the model 30 times.
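
The abstract describes the student learning to match the teacher's output distribution. As a point of reference, the sketch below shows the standard temperature-scaled KD objective (KL divergence between softened teacher and student outputs, in the style of Hinton et al., 2015), which is the generic loss such approaches build on; it is not the paper's exact zero-shot procedure, and the function name, temperature value, and random logits are illustrative assumptions. In the zero-shot setting described above, the inputs fed to both models would come from out-of-domain or adversarially generated data rather than the task's training set.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KD objective: KL divergence between temperature-softened
    teacher and student output distributions (a sketch, not the paper's
    exact zero-shot method)."""
    # Soften both distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is conventional.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Illustrative usage with random logits standing in for model outputs
# (e.g. a 2-class GLUE task such as SST-2).
if __name__ == "__main__":
    student_logits = torch.randn(8, 2)  # small student model outputs
    teacher_logits = torch.randn(8, 2)  # large fine-tuned teacher outputs
    print(kd_loss(student_logits, teacher_logits).item())
```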
