Paper title
Injecting Spatial Information for Monaural Speech Enhancement via Knowledge Distillation
Authors
Abstract
Monaural speech enhancement (SE) provides a versatile and cost-effective approach to SE tasks by using recordings from a single microphone. However, monaural SE lags behind multi-channel SE in performance because monaural SE methods cannot extract spatial information from single-channel recordings, which greatly limits their application scenarios. To address this issue, we inject spatial information into the monaural SE model and propose a knowledge distillation strategy that enables the monaural SE model to learn binaural speech features from a binaural SE model. This allows the monaural SE model to reconstruct enhanced speech with higher intelligibility and quality under low signal-to-noise ratio (SNR) conditions. Extensive experiments show that the proposed monaural SE model, with spatial information injected via knowledge distillation, achieves favorable performance against other monaural SE models with fewer parameters.
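The abstract does not state the form of the distillation objective, so the following is only a minimal sketch of a generic feature-matching distillation loss, not the authors' method. It assumes the monaural student and the binaural teacher each expose an intermediate feature vector of the same length, that mean squared error is used for both terms, and that a weight `alpha` balances them. The names `mse`, `distillation_loss`, and `alpha` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_out: Vector,
    clean_target: Vector,
    student_feat: Vector,
    teacher_feat: Vector,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of a task loss and a feature-matching loss.

    Assumption: the task loss compares the student's enhanced output with
    the clean speech, and the feature loss pulls the student's intermediate
    features toward those of a (frozen) binaural teacher.
    """
    task = mse(student_out, clean_target)
    feat = mse(student_feat, teacher_feat)
    return (1.0 - alpha) * task + alpha * feat


# Toy usage with made-up numbers: task loss 1.0, feature loss 4.0.
loss = distillation_loss([1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [0.0, 0.0], alpha=0.5)
```

In a sketch like this the teacher sees the binaural recording during training only, while the student sees a single channel, so spatial cues can reach the student solely through the feature-matching term.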