Paper Title
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
Paper Authors
Paper Abstract
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models into compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a much lower dimensional manifold than the input space. Furthermore, we can use the teacher to explore the input space more efficiently through sampling or gradient-based methods, making TGT especially attractive for limited-data or long-tail settings. We formally capture this benefit of the proposed data-domain exploration in our generalization bounds. We find that TGT improves accuracy on several image classification benchmarks as well as on a range of text classification and retrieval tasks.
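The core idea in the abstract — a generative teacher that has learned a low-dimensional data manifold can synthesize inputs and soft labels for the student, so no large real dataset is needed — can be illustrated with a minimal sketch. Everything below is a toy assumption, not the paper's actual architecture: the "teacher" is a fixed linear decoder from a 2-D latent space plus a linear scoring head, and the "student" is a logistic model trained on teacher-generated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher: a fixed linear decoder from a 2-D latent manifold
# into a 10-D input space, plus a soft-labeling head. All names, shapes,
# and the linear forms are illustrative assumptions.
LATENT_DIM, INPUT_DIM = 2, 10
decoder = rng.normal(size=(LATENT_DIM, INPUT_DIM))   # latent -> input map
teacher_w = rng.normal(size=INPUT_DIM)               # teacher scoring weights

def sample_inputs(n):
    """Explore the data domain by sampling the teacher's latent space."""
    z = rng.normal(size=(n, LATENT_DIM))
    return z @ decoder

def teacher_probs(x):
    """Teacher's soft predictions: a sigmoid over its linear score."""
    return 1.0 / (1.0 + np.exp(-(x @ teacher_w)))

# Student: a linear model distilled to match the teacher's soft labels
# on teacher-generated inputs only (no real labeled data is touched).
student_w = np.zeros(INPUT_DIM)
lr = 0.1
for _ in range(500):
    x = sample_inputs(256)
    t = teacher_probs(x)                              # soft targets
    s = 1.0 / (1.0 + np.exp(-(x @ student_w)))        # student predictions
    grad = x.T @ (s - t) / len(x)   # grad of cross-entropy w.r.t. student_w
    student_w -= lr * grad

# On fresh teacher-sampled inputs, the student should agree with the teacher.
x_test = sample_inputs(1000)
agreement = np.mean((teacher_probs(x_test) > 0.5) == ((x_test @ student_w) > 0.5))
print(f"student-teacher agreement: {agreement:.2f}")
```

The point of the sketch is that the sampled inputs live on the teacher's low-dimensional manifold (here, a 2-D subspace of the 10-D input space), so the student only has to fit the data region the teacher deems relevant — the data-efficiency argument the abstract makes.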