Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

Paper title

Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

Authors

Pavel Denisov, Ngoc Thang Vu

Abstract

Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps. These components are optimized independently to allow usage of available data, but the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings to process acoustic features. In particular, we extend it with an encoder of pretrained speech recognition systems in order to construct end-to-end spoken language understanding systems. Our proposed method is based on the teacher-student framework across speech and text modalities that aligns the acoustic and the semantic latent spaces. Experimental results in three benchmarks show that our system reaches the performance comparable to the pipeline architecture without using any training data and outperforms it after fine-tuning with ten examples per class on two out of three benchmarks.
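The cross-modal teacher-student idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mean pooling, the single linear projection, the mean-squared-error objective, and all names and dimensions below are assumptions chosen for illustration. A fixed text embedding plays the teacher, and a speech-side projection is trained so that its output moves toward that embedding.

```python
import numpy as np

def student_embedding(frames, W):
    """Mean-pool frame-level acoustic encoder outputs, then project linearly."""
    return frames.mean(axis=0) @ W

def alignment_loss(student_vec, teacher_vec):
    """Mean squared error between student (speech) and teacher (text) embeddings."""
    diff = student_vec - teacher_vec
    return float(np.mean(diff ** 2))

def sgd_step(frames, teacher_vec, W, lr=0.5):
    """One gradient step on W only; the teacher embedding stays fixed."""
    pooled = frames.mean(axis=0)
    diff = pooled @ W - teacher_vec
    grad = 2.0 * np.outer(pooled, diff) / diff.size  # d(MSE)/dW
    return W - lr * grad

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))       # hypothetical: 50 frames of 16-dim encoder output
teacher_vec = rng.normal(size=8)         # hypothetical: 8-dim text sentence embedding
W = rng.normal(scale=0.1, size=(16, 8))  # student projection, the only trained parameter

loss_before = alignment_loss(student_embedding(frames, W), teacher_vec)
for _ in range(200):
    W = sgd_step(frames, teacher_vec, W)
loss_after = alignment_loss(student_embedding(frames, W), teacher_vec)
```

In a real system the student would be a pretrained speech recognition encoder with trainable layers on top, and the teacher a pretrained contextual text model; the sketch only shows how a distance loss pulls the speech-side embedding toward the text-side one.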
