Paper Title
McNet: Fuse Multiple Cues for Multichannel Speech Enhancement
Authors
Abstract
In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network makes a unique contribution and that the network as a whole notably outperforms other state-of-the-art methods.
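The abstract names the four modules but not their internals. The toy NumPy sketch below illustrates one common reading of "full-band", "narrow-band", and "sub-band": a full-band stage sees one sequence per frame running across all frequency bins, a narrow-band stage sees one sequence per bin running across time with bins kept separate, and a sub-band stage does the same but attaches each bin's neighbors. Everything here is an assumption for illustration and is not the McNet implementation. The function names (`stft_features`, `sub_band_view`, `make_block`, `apply_stage`, `mcnet_like`) are hypothetical. The per-stage "model" is a random projection mixed with a sequence mean, standing in for a learned sequence model such as an LSTM. The toy does not separate spatial from spectral inputs; it only shows the axis layout. Reading the last stage's two outputs as a complex mask on microphone 0 is also our choice, not necessarily the paper's.

```python
import numpy as np


def stft_features(X):
    """Stack real and imaginary parts of all M channels: (M, F, T) complex -> (F, T, 2M) real."""
    return np.concatenate([X.real, X.imag], axis=0).transpose(1, 2, 0)


def sub_band_view(h, n_neighbors):
    """Give each bin its own features plus those of n_neighbors bins on each side.

    (F, T, C) -> (F, T, C * (2 * n_neighbors + 1)); edges are reflect-padded.
    """
    F = h.shape[0]
    padded = np.pad(h, ((n_neighbors, n_neighbors), (0, 0), (0, 0)), mode="reflect")
    return np.concatenate([padded[k:k + F] for k in range(2 * n_neighbors + 1)], axis=-1)


def make_block(c_in, c_out, rng):
    """Toy stand-in for a learned sequence model (e.g. an LSTM).

    Each element is mixed with the mean of its own sequence, so the block only
    sees context along whatever axis the caller lays out as the sequence axis.
    """
    W = rng.standard_normal((c_in, c_out)) / np.sqrt(c_in)

    def block(seqs):  # seqs: (batch, length, c_in)
        context = seqs.mean(axis=1, keepdims=True)
        return np.tanh((seqs + context) @ W)  # (batch, length, c_out)

    return block


def apply_stage(h, kind, block, n_neighbors):
    """Lay h (F, T, C) out as the sequences this kind of stage sees, run block, restore (F, T, C')."""
    if kind == "full_band":    # one sequence per frame, running across all F bins
        return block(h.transpose(1, 0, 2)).transpose(1, 0, 2)
    if kind == "narrow_band":  # one sequence per bin, running across time; bins never mix
        return block(h)
    if kind == "sub_band":     # one sequence per bin across time, with neighboring bins attached
        return block(sub_band_view(h, n_neighbors))
    raise ValueError(f"unknown stage kind: {kind}")


def mcnet_like(X, hidden=8, n_neighbors=2, seed=0):
    """Cascade four toy stages in the order the abstract lists them; return an (F, T) estimate."""
    rng = np.random.default_rng(seed)
    h = stft_features(X)
    # Order mirrors the abstract: full-band spatial, narrow-band spatial,
    # sub-band spectral, full-band spectral (the toy does not model spatial vs spectral).
    stages = ["full_band", "narrow_band", "sub_band", "full_band"]
    for i, kind in enumerate(stages):
        c_in = h.shape[-1] * (2 * n_neighbors + 1 if kind == "sub_band" else 1)
        c_out = 2 if i == len(stages) - 1 else hidden
        h = apply_stage(h, kind, make_block(c_in, c_out, rng), n_neighbors)
    mask = h[..., 0] + 1j * h[..., 1]  # assumption: last two outputs read as a complex mask
    return mask * X[0]                 # applied to the reference microphone (channel 0)
```

Replacing `make_block` with a trained recurrent layer would leave the reshaping logic unchanged. The sketch is only meant to show which axis each kind of stage treats as its sequence.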