Paper Title


Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Authors

Arora, Siddhant, Dalmia, Siddharth, Yan, Brian, Metze, Florian, Black, Alan W, Watanabe, Shinji

Abstract


End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation. However, these systems model sequence labeling as a sequence prediction task causing a divergence from its well-established token-level tagging formulation. We build compositional end-to-end SLU systems that explicitly separate the added complexity of recognizing spoken mentions in SLU from the NLU task of sequence labeling. By relying on intermediate decoders trained for ASR, our end-to-end systems transform the input modality from speech to token-level representations that can be used in the traditional sequence labeling framework. This composition of ASR and NLU formulations in our end-to-end SLU system offers direct compatibility with pre-trained ASR and NLU systems, allows performance monitoring of individual components and enables the use of globally normalized losses like CRF, making them attractive in practical scenarios. Our models outperform both cascaded and direct end-to-end models on a labeling task of named entity recognition across SLU benchmarks.
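The abstract contrasts sequence prediction with token-level tagging under a globally normalized loss such as a CRF. As a minimal illustration of what token-level CRF-style decoding looks like, here is a pure-Python Viterbi decoder over per-token emission scores and tag-transition scores. All numbers, tag sets, and function names are illustrative assumptions, not the authors' implementation; in the paper's system the emissions would come from NLU representations built on the intermediate ASR decoder's token-level outputs.

```python
def viterbi(emissions, transition):
    """Return the highest-scoring tag sequence under a linear-chain score.

    emissions[t][y]  -- score of assigning tag y to token t
    transition[p][y] -- score of moving from tag p to tag y

    This is the decoding step of a globally normalized (CRF-style)
    tagger: the best path is chosen over whole sequences, not per token.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score ending in each tag at token 0
    back = []                   # backpointers for tokens 1..T-1

    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n_tags):
            # best previous tag to transition from into tag y
            best_prev = max(range(n_tags),
                            key=lambda p: score[p] + transition[p][y])
            ptr.append(best_prev)
            new_score.append(score[best_prev]
                             + transition[best_prev][y]
                             + emissions[t][y])
        score, back = new_score, back + [ptr]

    # backtrack from the best final tag
    y = max(range(n_tags), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))


# Toy example with two tags (0 = O, 1 = B-ENT) over three tokens.
emissions = [[2.0, 0.0],   # token 0 strongly prefers O
             [0.0, 1.0],   # token 1 prefers B-ENT
             [1.5, 1.0]]   # token 2 slightly prefers O
transition = [[0.5, 0.0],  # O -> O favored over O -> B-ENT
              [0.0, 1.0]]  # B-ENT -> B-ENT favored (entity continuation)
print(viterbi(emissions, transition))  # -> [0, 1, 1]
```

Note how the transition bonus for staying inside an entity pulls token 2 to the entity tag even though its emission slightly prefers `O`; per-token (locally normalized) prediction would miss this, which is the practical appeal of the globally normalized losses the abstract mentions.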
