Paper Title

Cross-Utterance Language Models with Acoustic Error Sampling

Authors

Sun, G., Zhang, C., Woodland, P. C.

Abstract

The effective exploitation of richer contextual information in language models (LMs) is a long-standing research problem for automatic speech recognition (ASR). A cross-utterance LM (CULM) is proposed in this paper, which augments the input to a standard long short-term memory (LSTM) LM with a context vector derived from past and future utterances using an extraction network. The extraction network uses another LSTM to encode surrounding utterances into vectors, which are integrated into a context vector using either a projection of the LSTM final hidden states or a multi-head self-attentive layer. In addition, an acoustic error sampling technique is proposed to reduce the mismatch between training and test time. This is achieved by incorporating possible ASR errors into the model training procedure, and can therefore improve the word error rate (WER). Experiments performed on both the AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline in WER. In particular, the CULM with a self-attentive layer-based extraction network and acoustic error sampling achieves a 0.6% absolute WER reduction on AMI, a 0.3% WER reduction on the Switchboard part, and a 0.9% WER reduction on the Callhome part of the Eval2000 test set over the respective baselines.
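The sketch below is not the authors' code; it is a minimal illustration, assuming PyTorch, of how a context vector extracted from surrounding utterances could be concatenated with the word embedding at each input step of an LSTM LM. The module names, dimensions, and the mean-pooling over utterance vectors are hypothetical; only the two pooling variants (final-hidden-state projection vs. a multi-head self-attentive layer) follow the abstract. The acoustic error sampling step (perturbing the training text with likely ASR errors) is not shown.

```python
# Minimal sketch (not the paper's implementation) of a cross-utterance LM input,
# assuming PyTorch and hypothetical layer sizes.
import torch
import torch.nn as nn


class ContextExtractor(nn.Module):
    """Encodes past/future utterances into a single context vector."""

    def __init__(self, emb_dim=256, hid_dim=512, ctx_dim=256, mode="attn", n_heads=4):
        super().__init__()
        self.mode = mode
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        if mode == "attn":
            # Variant 2 in the abstract: multi-head self-attention over the
            # per-utterance vectors, followed by a projection.
            self.attn = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        # Variant 1 in the abstract: project the LSTM final hidden states.
        self.proj = nn.Linear(hid_dim, ctx_dim)

    def forward(self, surrounding):
        # surrounding: (batch, n_utts, utt_len, emb_dim) pre-embedded utterances
        b, n, t, d = surrounding.shape
        _, (h_n, _) = self.encoder(surrounding.view(b * n, t, d))
        utt_vecs = h_n[-1].view(b, n, -1)            # one vector per utterance
        if self.mode == "attn":
            utt_vecs, _ = self.attn(utt_vecs, utt_vecs, utt_vecs)
        # Mean-pool over utterances (an assumption), then project to ctx_dim.
        return self.proj(utt_vecs.mean(dim=1))       # (batch, ctx_dim)


class CrossUtteranceLM(nn.Module):
    """Standard LSTM LM whose input embedding is augmented with the context vector."""

    def __init__(self, vocab=10000, emb_dim=256, ctx_dim=256, hid_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim + ctx_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab)

    def forward(self, tokens, ctx):                  # tokens: (batch, seq_len)
        emb = self.embed(tokens)                     # (batch, seq_len, emb_dim)
        ctx = ctx.unsqueeze(1).expand(-1, emb.size(1), -1)
        h, _ = self.lstm(torch.cat([emb, ctx], dim=-1))
        return self.out(h)                           # next-word logits


# Example usage with random data (hypothetical sizes):
# extractor = ContextExtractor(mode="attn")
# lm = CrossUtteranceLM()
# ctx = extractor(torch.randn(2, 4, 20, 256))        # 4 surrounding utterances
# logits = lm(torch.randint(0, 10000, (2, 30)), ctx)
```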
