Far-Field Automatic Speech Recognition

Paper title

Far-Field Automatic Speech Recognition

Authors

Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani

Abstract

The machine recognition of speech spoken at a distance from the microphones, known as far-field automatic speech recognition (ASR), has received a significant increase of attention in science and industry, which caused or was caused by an equally significant improvement in recognition accuracy. Meanwhile it has entered the consumer market with digital home assistants with a spoken language interface being its most prominent application. Speech recorded at a distance is affected by various acoustic distortions and, consequently, quite different processing pipelines have emerged compared to ASR for close-talk speech. A signal enhancement front-end for dereverberation, source separation and acoustic beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multi-condition training and adaptation. We will also describe the so-called end-to-end approach to ASR, which is a new promising architecture that has recently been extended to the far-field scenario. This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and it will be seen that, although deep learning has a significant share in the technological breakthroughs, a clever combination with traditional signal processing can lead to surprisingly effective solutions.
