Paper Title
Context-aware RNNLM Rescoring for Conversational Speech Recognition
Paper Authors
Paper Abstract
Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. Prior work has explored the modeling of long-range context through RNNLM rescoring, with improved performance. To further exploit the persistent nature of a conversation, such as topics or speaker turns, we extend the rescoring procedure in a new context-aware manner. For RNNLM training, we capture contextual dependencies by concatenating adjacent sentences with various tag words, such as speaker or intention information. For lattice rescoring, the lattices of adjacent sentences are likewise connected with the first-pass decoded result by tag words. In addition, we adopt a selective concatenation strategy based on tf-idf, making the best use of contextual similarity to improve transcription performance. Results on four different conversational test sets show that our approach yields up to 13.1% and 6% relative character-error-rate (CER) reduction compared with first-pass decoding and common lattice rescoring, respectively.
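To make the tf-idf-based selective concatenation concrete, the following is a minimal sketch of how training text for such a context-aware RNNLM could be built: each utterance is joined to its predecessor with a tag word only when their tf-idf cosine similarity exceeds a threshold. The tag words `<turn>`/`<same>`, the threshold value, and the function names are illustrative assumptions, not the paper's actual choices.

```python
import math
from collections import Counter

def tf_idf_vectors(sentences):
    """Compute a sparse tf-idf vector (word -> weight dict) per tokenized sentence."""
    n = len(sentences)
    df = Counter()                      # document frequency of each word
    for sent in sentences:
        df.update(set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({w: (c / len(sent)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_training_lines(utterances, speakers, threshold=0.1):
    """Selective concatenation: prepend the previous utterance, joined by a
    speaker-tag word, only when the two utterances are tf-idf-similar enough.
    Tag vocabulary (<turn>/<same>) is a hypothetical example."""
    tokenized = [u.split() for u in utterances]
    vecs = tf_idf_vectors(tokenized)
    lines = []
    for i, utt in enumerate(utterances):
        if i > 0 and cosine(vecs[i - 1], vecs[i]) >= threshold:
            tag = "<turn>" if speakers[i] != speakers[i - 1] else "<same>"
            lines.append(f"{utterances[i - 1]} {tag} {utt}")
        else:
            lines.append(utt)
    return lines
```

In this sketch, dissimilar neighbors (e.g. a topic change) are left unconcatenated, so the LM only learns cross-sentence dependencies where the context is actually related; the same similarity test could gate which neighboring lattices are connected during rescoring.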