Paper Title

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

Authors

Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Abstract

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model improves the SS performance, it also requires a higher computational cost. Second, an SS model that is more optimized for handling overlapped speech is likely to introduce more processing artifacts in non-overlapped-speech regions. In this paper, we address these trade-offs with a sparsely-gated mixture-of-experts (MoE) architecture. Comprehensive evaluation results obtained using both simulated and real meeting recordings show that our proposed sparsely-gated MoE SS model achieves superior separation capabilities with less speech distortion, while involving only a marginal run-time cost increase.
