Paper Title

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Paper Authors

Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Paper Abstract

Large-scale speech self-supervised learning (SSL) has emerged as a main field of speech processing; however, the computational cost arising from its vast size creates a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension throughout almost all model components and deeper in layers compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method to reduce performance degradation. Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT. Also, we achieve a 12.1% word error rate and a 13.3% phoneme error rate on the SUPERB benchmark, which is superior to prior work.
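To make the two ingredients named in the abstract concrete, below is a minimal PyTorch sketch of (a) a time-reduction layer that halves the frame rate by concatenating adjacent frames, and (b) a hint-based distillation loss that matches projected student hidden states to selected teacher layers. This is an illustrative sketch under our own assumptions, not the authors' released code; the module and function names (`TimeReduction`, `hint_distillation_loss`) and the assumption that student and teacher features are time-aligned are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeReduction(nn.Module):
    """Illustrative time-reduction layer: concatenate adjacent frames
    and project back to the model dimension, halving sequence length."""

    def __init__(self, dim: int, reduction: int = 2):
        super().__init__()
        self.reduction = reduction
        self.proj = nn.Linear(dim * reduction, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); drop trailing frames that do not fit evenly
        b, t, d = x.shape
        t = t - t % self.reduction
        x = x[:, :t].reshape(b, t // self.reduction, d * self.reduction)
        return self.proj(x)


def hint_distillation_loss(student_hiddens, teacher_hiddens, proj_heads):
    """Illustrative hint loss: L2 distance between linearly projected student
    hidden states and the corresponding teacher hidden states.

    Assumes the two sequences are already aligned in time (e.g. the teacher
    features have been downsampled to match the reduced student frame rate).
    """
    loss = 0.0
    for s, t, head in zip(student_hiddens, teacher_hiddens, proj_heads):
        loss = loss + F.mse_loss(head(s), t)
    return loss / len(proj_heads)


# Toy usage with random tensors standing in for intermediate representations.
if __name__ == "__main__":
    reduce = TimeReduction(dim=480)
    student_feat = torch.randn(2, 100, 480)          # thin student features
    reduced = reduce(student_feat)                    # (2, 50, 480)

    teacher_feat = [torch.randn(2, 50, 768)]          # aligned teacher layer
    heads = nn.ModuleList([nn.Linear(480, 768)])      # per-layer hint projection
    print(hint_distillation_loss([reduced], teacher_feat, heads))
```

The per-layer projection heads are only needed during distillation and can be discarded at inference, so the deployed student keeps the reduced size and faster frame rate described in the abstract.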
