Paper Title

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

Paper Authors

Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

Paper Abstract

Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. However -- as we show here -- existing works are not optimized for dealing with pairs (or tuples) of texts. Consequently, they are either not scalable or demonstrate subpar performance. In this work, we propose DiPair -- a novel framework for distilling fast and accurate models on text pair tasks. Coupled with an end-to-end training strategy, DiPair is both highly scalable and offers improved quality-speed tradeoffs. Empirical studies conducted on both academic and real-world e-commerce benchmarks demonstrate the efficacy of the proposed approach with speedups of over 350x and minimal quality drop relative to the cross-attention teacher BERT model.
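
For context on the knowledge-distillation remedy the abstract refers to (Hinton et al., 2015), below is a minimal sketch of the generic soft-label distillation objective. This is not the DiPair-specific training strategy described in the paper; the function name and the hyperparameter values (`temperature`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label knowledge distillation loss (Hinton et al., 2015).

    Mixes the KL divergence between temperature-softened teacher and student
    distributions with the usual cross-entropy on the hard labels.
    """
    # Temperature-softened distributions; student side in log space for kl_div.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # KL term is scaled by T^2 so its gradient magnitude matches the CE term.
    kd_term = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1 - alpha) * ce_term
```

In the text-pair setting studied in the paper, the teacher would be a cross-attention BERT model scoring the full text pair, while the student is a much cheaper model; the speedups reported in the abstract come from the student architecture and training strategy, not from this loss alone.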
