Paper Title

Joint Contextual Modeling for ASR Correction and Language Understanding

Paper Authors

Yue Weng, Sai Sumanth Miryala, Chandra Khatri, Runze Wang, Huaixiu Zheng, Piero Molino, Mahdi Namazifar, Alexandros Papangelis, Hugh Williams, Franziska Bell, Gokhan Tur

Paper Abstract

The quality of automatic speech recognition (ASR) is critical to dialogue systems, as ASR errors propagate to and directly impact downstream tasks such as language understanding (LU). In this paper, we propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with LU, improving the performance of both tasks simultaneously. To measure the effectiveness of this approach, we used a public benchmark, the second Dialogue State Tracking Challenge (DSTC2) corpus. As a baseline approach, we trained task-specific Statistical Language Models (SLM) and fine-tuned a state-of-the-art Generalized Pre-training (GPT) language model to re-rank the n-best ASR hypotheses, followed by a model to identify the dialogue act and slots. i) We further trained ranker models using GPT and Hierarchical CNN-RNN models with discriminatory losses to detect the best output given the n-best hypotheses. We extended these ranker models to first select the best ASR output and then identify the dialogue act and slots in an end-to-end fashion. ii) We also proposed a novel joint ASR error correction and LU model, a word confusion pointer network (WCN-Ptr) with multi-head self-attention on top, which consumes the word confusions populated from the n-best hypotheses. We show that the error rates of off-the-shelf ASR and the following LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
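To make the word-confusion idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how an n-best list of ASR hypotheses can be collapsed into a word confusion network: each word position of the 1-best hypothesis becomes a bin that collects the words the other hypotheses align to at that position, which is the kind of structure a WCN-Ptr style model would consume. The function names (align, build_wcn) and the "<eps>" placeholder for deletions are illustrative choices, not taken from the paper.

# Minimal sketch: build word confusion bins from n-best ASR hypotheses
# (illustrative only; alignment is plain Levenshtein alignment to the 1-best).
from typing import List

EPS = "<eps>"  # placeholder for a deleted word in a hypothesis

def align(ref: List[str], hyp: List[str]):
    """Levenshtein-align hyp to ref; return a list of (ref_word, hyp_word) pairs."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # word deleted from hyp
                           dp[i][j - 1] + 1,          # word inserted into hyp
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Backtrace to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], EPS))   # hyp is missing this reference word
            i -= 1
        else:
            pairs.append((EPS, hyp[j - 1]))   # hyp has an extra word
            j -= 1
    return list(reversed(pairs))

def build_wcn(nbest: List[List[str]]) -> List[List[str]]:
    """Group the n-best hypotheses into confusion bins anchored on the 1-best."""
    top = nbest[0]
    bins = [[w] for w in top]                 # one bin per 1-best word
    for hyp in nbest[1:]:
        pos = 0
        for ref_w, hyp_w in align(top, hyp):
            if ref_w == EPS:
                continue                      # insertions are dropped in this simple sketch
            bins[pos].append(hyp_w)
            pos += 1
    return bins

if __name__ == "__main__":
    nbest = [
        "i want a cheap restaurant".split(),
        "i want a chip restaurant".split(),
        "i want cheap restaurants".split(),
    ]
    for bin_words in build_wcn(nbest):
        print(bin_words)

In the paper's setup, bins of this kind (together with dialogue context) are what the pointer network with multi-head self-attention reasons over to recover the intended words; the sketch above stops at constructing the bins and does not model the network itself.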
