Paper Title


Multitask Training with Text Data for End-to-End Speech Recognition

Authors

Wang, Peidong, Sainath, Tara N., Weiss, Ron J.

Abstract


We propose a multitask training method for attention-based end-to-end speech recognition models. We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data. Trained on the 100-hour subset of LibriSpeech, the proposed method, without requiring an additional language model, leads to an 11% relative performance improvement over the baseline and approaches the performance of language model shallow fusion on the test-clean evaluation set. We observe a similar trend on the whole 960-hour LibriSpeech training set. Analyses of different types of errors and sample output sentences demonstrate that the proposed method can incorporate language level information, suggesting its effectiveness in real-world applications.
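The core idea in the abstract — regularizing the decoder by training it jointly on audio-text pairs (ASR loss) and text-only data (a language-model-style loss) — can be sketched as a weighted sum of two cross-entropy terms. The helper names, the interpolation weight `lam`, and all tensor shapes below are illustrative assumptions, not the paper's actual architecture or hyperparameters:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy.
    logits: (num_tokens, vocab_size); targets: (num_tokens,) of class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def multitask_loss(asr_logits, asr_targets, lm_logits, lm_targets, lam=0.1):
    """Combined objective: ASR loss on audio-text pairs plus a weighted
    LM-style loss from running the decoder on text-only data.
    The weight `lam` is an assumed hyperparameter for illustration."""
    asr_loss = cross_entropy(asr_logits, asr_targets)
    lm_loss = cross_entropy(lm_logits, lm_targets)
    return asr_loss + lam * lm_loss

# Toy example with a 5-symbol vocabulary and random decoder outputs.
rng = np.random.default_rng(0)
asr_logits = rng.normal(size=(4, 5))   # 4 tokens from an audio-text pair
lm_logits = rng.normal(size=(6, 5))    # 6 tokens from a text-only sentence
asr_targets = np.array([1, 0, 3, 2])
lm_targets = np.array([4, 1, 1, 0, 2, 3])
loss = multitask_loss(asr_logits, asr_targets, lm_logits, lm_targets)
```

Setting `lam=0` recovers the plain ASR objective, so the text-only term acts purely as a decoder regularizer, consistent with the abstract's claim that no separate language model is needed at inference time.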
