Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Paper Title

Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Authors

Qiu, Xinchi; Parcollet, Titouan; Ravanelli, Mirco; Lane, Nicholas; Morchid, Mohamed

Abstract

Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue is to equip the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between the signals. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton product, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a quaternion long short-term memory neural network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different tasks of multi-channel distant speech recognition.
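As a rough illustration of the Hamilton product mentioned in the abstract, the Python sketch below multiplies quaternions stored as `(r, i, j, k)` tuples and builds a toy quaternion dense layer. It is not the authors' implementation. It assumes, hypothetically, that the four components of each input quaternion hold the same feature taken from four microphone channels. The names `hamilton`, `hamilton_matrix` and `quaternion_dense` are illustrative only and do not come from any library.

```python
from typing import List, Tuple

# A quaternion stored as its (real, i, j, k) components.
Quaternion = Tuple[float, float, float, float]


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p (x) q; note that it is not commutative."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def hamilton_matrix(w: Quaternion) -> List[List[float]]:
    """Real 4x4 matrix M(w) with M(w) @ q == hamilton(w, q).

    Only four distinct weights fill all 16 entries, so every output
    component depends on every input component through shared weights.
    """
    r, x, y, z = w
    return [
        [r, -x, -y, -z],
        [x, r, -z, y],
        [y, z, r, -x],
        [z, -y, x, r],
    ]


def quaternion_dense(
    weights: List[List[Quaternion]], inputs: List[Quaternion]
) -> List[Quaternion]:
    """Hypothetical quaternion dense layer (no bias, no activation):
    out[n] = sum over m of hamilton(weights[n][m], inputs[m])."""
    outputs = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, x)))
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    w = (1.0, 0.2, -0.1, 0.05)  # illustrative quaternion weight
    x = (0.5, 0.4, 0.45, 0.3)   # hypothetical values from 4 microphones
    m = hamilton_matrix(w)
    via_matrix = tuple(sum(m[r][c] * x[c] for c in range(4)) for r in range(4))
    print(hamilton(w, x))
    print(via_matrix)  # same values, up to floating-point rounding
```

Because `hamilton_matrix` fills a 4×4 real matrix from only four weights, each quaternion weight mixes all four components while using a quarter of the parameters of an unconstrained real 4×4 block. This weight sharing is one way to read the claim that the Hamilton product models dependencies between elements. A real-valued layer, by contrast, would learn the 16 entries independently.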