Title
BEBERT: Efficient and Robust Binary Ensemble BERT
Authors
Abstract
Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive number of parameters hinders efficient deployment on edge devices. Binarization of the BERT models can significantly alleviate this issue but incurs a severe accuracy drop compared with their full-precision counterparts. In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap. To the best of our knowledge, this is the first work employing ensemble techniques on binary BERTs; the resulting BEBERT achieves superior accuracy while retaining computational efficiency. Furthermore, we remove the knowledge distillation procedures during ensembling to speed up the training process without compromising accuracy. Experimental results on the GLUE benchmark show that the proposed BEBERT significantly outperforms existing binary BERT models in accuracy and robustness with a 2x speedup in training time. Moreover, BEBERT incurs only a negligible accuracy loss of 0.3% compared to the full-precision baseline while saving 15x in FLOPs and 13x in model size. In addition, BEBERT outperforms other compressed BERTs in accuracy by up to 6.7%.
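To make the core idea concrete, below is a minimal sketch of ensemble inference over several binary BERTs: each member produces classification logits, and the ensemble averages them before taking the argmax. This is an illustrative sketch only, not the paper's implementation; the names BinaryBertClassifier, binarize, and ensemble_logits are hypothetical placeholders, and a real member would be a fully binarized BERT encoder rather than a single binarized linear head.

```python
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    """1-bit weight quantization: sign(w) scaled by the mean |w|
    (a common binarization scheme; assumed here for illustration)."""
    return w.sign() * w.abs().mean()

class BinaryBertClassifier(torch.nn.Module):
    """Hypothetical stand-in for one ensemble member. A real member
    would be a binarized BERT encoder with a classification head."""
    def __init__(self, hidden: int = 768, num_labels: int = 2):
        super().__init__()
        self.head = torch.nn.Linear(hidden, num_labels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Binarize the head's weights on the fly for illustration.
        w = binarize(self.head.weight)
        return torch.nn.functional.linear(features, w, self.head.bias)

@torch.no_grad()
def ensemble_logits(members, features: torch.Tensor) -> torch.Tensor:
    """Average the members' logits; argmax gives the prediction."""
    return torch.stack([m(features) for m in members]).mean(dim=0)

# Usage: an ensemble of 3 members on a batch of 4 pooled [CLS] features.
members = [BinaryBertClassifier() for _ in range(3)]
features = torch.randn(4, 768)
predictions = ensemble_logits(members, features).argmax(dim=-1)
```

Since each member is binarized, the ensemble's total FLOPs and model size grow only modestly with the number of members, which is how an ensemble of binary BERTs can close the accuracy gap while remaining far cheaper than the full-precision baseline.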