Paper Title

Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement

Authors

Guochen Yu, Andong Li, Wenzhe Liu, Chengshi Zheng, Yutian Wang, Hui Wang

Abstract

Because modeling more frequency bands is computationally expensive, real-time full-band speech enhancement with deep neural networks remains intractable. Recent studies typically use compressed, perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum with one-stage networks, which limits the improvement in speech quality. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement that recovers the low- (0-8 kHz), middle- (8-16 kHz), and high-band (16-24 kHz) spectra in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and two further sub-networks are designed as middle- and high-band noise suppressors that operate in the magnitude-only domain. To fully exploit the information exchanged between bands, we employ a sub-band interaction module that provides external knowledge guidance across different frequency bands. Extensive experiments show that the proposed method yields consistent performance advantages over state-of-the-art full-band baselines.
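
The three-band split named in the abstract can be illustrated with a minimal sketch that maps the band edges (0, 8, 16, 24 kHz) onto STFT bin ranges. This is not the authors' implementation: the 48 kHz sample rate is inferred from the 24 kHz upper edge, and the FFT size of 960 as well as the helper names `band_bin_ranges` and `split_frame` are assumptions made for illustration only.

```python
def band_bin_ranges(sample_rate, n_fft, edges_hz):
    """Map band edges in Hz to half-open STFT bin ranges [start, stop)."""
    n_bins = n_fft // 2 + 1            # one-sided spectrum: DC through Nyquist
    hz_per_bin = sample_rate / n_fft   # frequency spacing between adjacent bins
    ranges = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        start = int(round(lo / hz_per_bin))
        stop = min(int(round(hi / hz_per_bin)), n_bins)
        ranges.append((start, stop))
    # Assign the Nyquist bin to the last band so every bin is covered exactly once.
    ranges[-1] = (ranges[-1][0], n_bins)
    return ranges


def split_frame(frame, ranges):
    """Slice one spectral frame into per-band lists of bins."""
    return [frame[start:stop] for start, stop in ranges]


# Assumed settings (hypothetical, not taken from the paper).
SAMPLE_RATE = 48000
N_FFT = 960
EDGES_HZ = [0, 8000, 16000, 24000]

ranges = band_bin_ranges(SAMPLE_RATE, N_FFT, EDGES_HZ)

# A dummy complex frame standing in for one STFT column.
frame = [complex(k, 0) for k in range(N_FFT // 2 + 1)]
low, mid, high = split_frame(frame, ranges)
```

Under these assumed settings each bin spans 50 Hz, so the low and middle bands hold 160 bins each and the high band holds 161, including the Nyquist bin. In the scheme the abstract describes, the low band would keep its complex values while the middle and high bands would be reduced to magnitudes before processing.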
