Paper Title
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System
Authors
Abstract
Speech enhancement and separation have been long-standing problems, despite recent advances using a single microphone. Although microphones perform well in constrained settings, their speech separation performance degrades in noisy conditions. In this work, we propose RadioSES, an audioradio speech enhancement and separation system that overcomes problems inherent in audio-only systems. By fusing a complementary radio modality, RadioSES can estimate the number of speakers, solve the source association problem, separate and enhance noisy speech mixtures, and improve both intelligibility and perceptual quality. We perform millimeter-wave sensing to detect and localize speakers, and introduce an audioradio deep learning framework to fuse the separate radio features with the mixed audio features. Extensive experiments using commercial off-the-shelf devices show that RadioSES outperforms a variety of state-of-the-art baselines, with consistent performance gains across different environmental settings. Compared with audiovisual methods, RadioSES provides similar improvements (e.g., ~3 dB gains in SiSDR), along with the benefits of lower computational complexity and fewer privacy concerns.
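The abstract reports gains in SiSDR (scale-invariant signal-to-distortion ratio). As a reading aid, the following is a minimal sketch of how that metric is commonly computed. It is not taken from the paper: the function name `si_sdr`, the `eps` stabilizer, the zero-mean step, and the synthetic sine-plus-noise signals are all assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (hypothetical helper, not the paper's code)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the DC offset from both signals (a common convention, assumed here).
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything that is not explained by the reference counts as distortion.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic illustration only: a 220 Hz tone stands in for clean speech.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(clean.shape)

noisy = clean + 0.5 * noise      # input mixture
enhanced = clean + 0.25 * noise  # pretend output with half the noise amplitude

gain_db = si_sdr(enhanced, clean) - si_sdr(noisy, clean)
print(f"SiSDR gain: {gain_db:.2f} dB")
```

In this toy setup, halving the noise amplitude yields a gain of roughly 6 dB. Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits comparisons across systems with different output levels.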