Paper title
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation
Paper authors
Paper abstract
This paper aims to remove interfering speakers' speech, additive noise, and reverberation from noisy multi-talker speech mixtures, to the benefit of the automatic speech recognition (ASR) backend. While the recently proposed weighted power minimization distortionless response (WPD) beamformer can perform separation and dereverberation simultaneously, its noise cancellation component still has room for improvement. We propose an improved neural WPD beamformer, called "WPD++", that combines an enhanced beamforming module within the conventional WPD with a multi-objective loss function for joint training. The beamforming module is improved by exploiting spatio-temporal correlation. The multi-objective loss, which includes the complex spectra domain scale-invariant signal-to-noise ratio (C-Si-SNR) and the magnitude domain mean square error (Mag-MSE), is designed to place multiple constraints on the enhanced speech and on the desired power of the dry clean signal. Joint training optimizes the complex-valued mask estimator and the WPD++ beamformer end to end. The results show that the proposed WPD++ outperforms several state-of-the-art beamformers in enhanced speech quality and in the word error rate (WER) of ASR.
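The abstract names the two loss terms but does not give their formulas, so the following is only a minimal NumPy sketch of how such a multi-objective loss might be assembled. It assumes that C-Si-SNR is the usual scale-invariant SNR computed on the stacked real and imaginary parts of the complex spectra, and that the two terms are combined with a weight `mse_weight`. The function names, the stacking choice, and the weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def c_si_snr(est_spec, ref_spec, eps=1e-8):
    """Scale-invariant SNR in dB on complex spectra.

    Assumption: real and imaginary parts are stacked into one real vector
    before the standard SI-SNR projection is applied.
    """
    est = np.concatenate([est_spec.real.ravel(), est_spec.imag.ravel()])
    ref = np.concatenate([ref_spec.real.ravel(), ref_spec.imag.ravel()])
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = scale * ref
    noise = est - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def mag_mse(est_spec, ref_spec):
    """Mean square error between magnitude spectra."""
    return np.mean((np.abs(est_spec) - np.abs(ref_spec)) ** 2)


def multi_objective_loss(est_spec, ref_spec, mse_weight=1.0):
    """Negative C-Si-SNR plus weighted Mag-MSE (the weighting is hypothetical)."""
    return -c_si_snr(est_spec, ref_spec) + mse_weight * mag_mse(est_spec, ref_spec)


# Toy spectra standing in for a dry clean reference and two enhanced outputs.
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), chosen arbitrarily
ref = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
good = ref + 0.1 * noise  # lightly corrupted estimate
bad = ref + 1.0 * noise   # heavily corrupted estimate

loss_good = multi_objective_loss(good, ref)
loss_bad = multi_objective_loss(bad, ref)
```

In this sketch the C-Si-SNR term ignores the overall gain of the estimate, while the Mag-MSE term penalizes deviations from the reference magnitude, which is one plausible reading of the abstract's "desired power of the dry clean signal" constraint.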