Paper Title

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

Authors

Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

Abstract

Although Transformers have gained success in several speech processing tasks like spoken language understanding (SLU) and speech translation (ST), achieving online processing while keeping competitive performance is still essential for real-world interaction. In this paper, we take the first step on streaming SLU and simultaneous ST using a blockwise streaming Transformer, which is based on contextual block processing and blockwise synchronous beam search. Furthermore, we design an automatic speech recognition (ASR)-based intermediate loss regularization for the streaming SLU task to improve the classification performance further. As for the simultaneous ST task, we propose a cross-lingual encoding method, which employs a CTC branch optimized with target language translations. In addition, the CTC translation output is also used to refine the search space with CTC prefix score, achieving joint CTC/attention simultaneous translation for the first time. Experiments for SLU are conducted on FSC and SLURP corpora, while the ST task is evaluated on Fisher-CallHome Spanish and MuST-C En-De corpora. Experimental results show that the blockwise streaming Transformer achieves competitive results compared to offline models, especially with our proposed methods that further yield a 2.4% accuracy gain on the SLU task and a 4.3 BLEU gain on the ST task over streaming baselines.
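
The abstract does not spell out how the input is divided for blockwise processing. As a rough illustration only, the sketch below cuts a frame sequence into fixed-size, overlapping blocks that an encoder could consume one at a time. The function name `split_into_blocks` and the parameters `block_size` and `hop_size` are assumptions made for this example, not names taken from the paper, and the context carried between blocks in contextual block processing is deliberately left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_blocks(
    frames: Sequence[T], block_size: int, hop_size: int
) -> List[Sequence[T]]:
    """Cut a frame sequence into overlapping blocks (hypothetical helper).

    Each block holds at most `block_size` frames and starts `hop_size`
    frames after the previous one. The final block may be shorter.
    """
    if block_size <= 0 or hop_size <= 0:
        raise ValueError("block_size and hop_size must be positive")
    if hop_size > block_size:
        # A hop larger than the block would silently skip frames.
        raise ValueError("hop_size must not exceed block_size")
    if len(frames) == 0:
        return []

    blocks: List[Sequence[T]] = []
    start = 0
    while True:
        blocks.append(frames[start:start + block_size])
        # Stop once the current block reaches the end of the input.
        if start + block_size >= len(frames):
            break
        start += hop_size
    return blocks


if __name__ == "__main__":
    # Ten dummy "frames", blocks of four frames, advancing two at a time.
    for block in split_into_blocks(list(range(10)), block_size=4, hop_size=2):
        print(block)
```

In a streaming setting, each block would be encoded as soon as its frames arrive, so latency is bounded by the block length rather than by the utterance length.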
