Semi-Supervised Speech Recognition via Local Prior Matching

Paper title

Semi-Supervised Speech Recognition via Local Prior Matching

Authors

Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

Abstract

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.
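The abstract does not state the LPM objective itself, so the following is only a minimal sketch under stated assumptions: for each unlabeled utterance, the acoustic model proposes a small beam of hypotheses, a language model scores each hypothesis, both score lists are renormalized over that beam, and the loss is the cross-entropy from the beam-normalized prior to the beam-normalized model posterior. The function names `log_softmax` and `lpm_loss` and the example scores are hypothetical, not taken from the paper.

```python
import math


def log_softmax(scores):
    """Renormalize a list of log-scores so that their exponentials sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def lpm_loss(lm_log_probs, model_log_probs):
    """Cross-entropy from a beam-local prior to a beam-local model posterior.

    Assumption (not from the abstract): both inputs hold one log-score per
    hypothesis in the same beam, in the same order.
    """
    if len(lm_log_probs) != len(model_log_probs):
        raise ValueError("both score lists must cover the same beam")
    log_q = log_softmax(lm_log_probs)     # local prior, treated as a fixed target
    log_p = log_softmax(model_log_probs)  # model posterior restricted to the beam
    return -sum(math.exp(lq) * lp for lq, lp in zip(log_q, log_p))


# Hypothetical scores for a beam of three hypotheses.
lm_scores = [-12.0, -15.5, -20.0]
model_scores = [-3.0, -2.5, -4.0]
print(lpm_loss(lm_scores, model_scores))
```

In a real training loop the model scores would come from a differentiable model and the prior weights would be held constant, so that minimizing this quantity pushes the model toward hypotheses the language model considers plausible.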
