Paper Title

Semi-Supervised Speech Recognition via Local Prior Matching

Authors

Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

Abstract

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.
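The "recovers 54% and 73% of the word error rate" figures are most naturally read as the fraction of the gap between the 100-hour baseline and the fully supervised model that the semi-supervised model closes. The abstract does not spell out the formula, so the sketch below is an assumption about how such a recovery rate is typically computed. The function name `wer_recovery` and the example numbers are hypothetical and are not taken from the paper.

```python
def wer_recovery(baseline_wer: float, semi_wer: float, topline_wer: float) -> float:
    """Fraction of the baseline-to-topline WER gap closed by a semi-supervised model.

    Assumed definition (not stated in the abstract):
        (baseline - semi_supervised) / (baseline - fully_supervised)
    """
    gap = baseline_wer - topline_wer
    if gap <= 0:
        raise ValueError("baseline WER must exceed the fully supervised WER")
    return (baseline_wer - semi_wer) / gap


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
baseline = 20.0   # trained on labeled data only
semi = 14.0       # labeled data plus unlabeled data
topline = 10.0    # all data labeled (fully supervised)

print(f"{wer_recovery(baseline, semi, topline):.0%}")  # prints "60%"
```

Under this reading, a recovery of 100% would mean the semi-supervised model matches the fully supervised one, and 0% would mean no improvement over the baseline.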
