Paper Title

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

Paper Authors

Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

Paper Abstract

Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. However -- as we show here -- existing works are not optimized for dealing with pairs (or tuples) of texts. Consequently, they are either not scalable or demonstrate subpar performance. In this work, we propose DiPair -- a novel framework for distilling fast and accurate models on text pair tasks. Coupled with an end-to-end training strategy, DiPair is both highly scalable and offers improved quality-speed tradeoffs. Empirical studies conducted on both academic and real-world e-commerce benchmarks demonstrate the efficacy of the proposed approach with speedups of over 350x and minimal quality drop relative to the cross-attention teacher BERT model.
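
For context on the knowledge-distillation remedy the abstract refers to (Hinton et al., 2015), below is a minimal sketch of the generic soft-label distillation objective. This is not the DiPair-specific training strategy described in the paper; the function name and the hyperparameter values (`temperature`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label knowledge distillation loss (Hinton et al., 2015).

    Mixes the KL divergence between temperature-softened teacher and student
    distributions with the usual cross-entropy on the hard labels.
    """
    # Temperature-softened distributions; student side in log space for kl_div.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # KL term is scaled by T^2 so its gradient magnitude matches the CE term.
    kd_term = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1 - alpha) * ce_term
```

In the text-pair setting studied in the paper, the teacher would be a cross-attention BERT model scoring the full text pair, while the student is a much cheaper model; the speedups reported in the abstract come from the student architecture and training strategy, not from this loss alone.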
