Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

Paper Title

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

Authors

Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Abstract

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model improves the SS performance, it also requires a higher computational cost. Second, an SS model that is more optimized for handling overlapped speech is likely to introduce more processing artifacts in non-overlapped-speech regions. In this paper, we address these trade-offs with a sparsely-gated mixture-of-experts (MoE) architecture. Comprehensive evaluation results obtained using both simulated and real meeting recordings show that our proposed sparsely-gated MoE SS model achieves superior separation capabilities with less speech distortion, while involving only a marginal run-time cost increase.
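
To illustrate why a sparsely-gated MoE adds capacity at only a small run-time cost, the following is a minimal NumPy sketch of a generic top-k gated MoE feed-forward layer. It is not the authors' implementation: the class name `SparseMoE`, the layer sizes, the ReLU experts, and the random weights are all assumptions chosen for illustration. Each frame is routed to only `top_k` experts, so per-frame compute stays roughly constant as the number of experts grows.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SparseMoE:
    """Hypothetical top-k gated mixture of feed-forward experts."""

    def __init__(self, dim, hidden, num_experts, top_k=1, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts = num_experts
        self.top_k = top_k
        # Gating network: one logit per expert for every frame.
        self.w_gate = rng.normal(0.0, 0.1, (dim, num_experts))
        # Each expert is a small two-layer feed-forward network.
        self.w_in = rng.normal(0.0, 0.1, (num_experts, dim, hidden))
        self.w_out = rng.normal(0.0, 0.1, (num_experts, hidden, dim))

    def __call__(self, x):
        """x has shape (frames, dim). Returns (output, selected expert ids)."""
        logits = x @ self.w_gate                              # (frames, experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]    # (frames, top_k)
        top_logits = np.take_along_axis(logits, top, axis=-1)
        weights = softmax(top_logits, axis=-1)                # renormalize over the selected experts
        y = np.zeros_like(x)
        for e in range(self.num_experts):
            # Frames that selected expert e, and the slot in which they did so.
            frame_idx, slot = np.nonzero(top == e)
            if frame_idx.size == 0:
                continue  # unused experts cost nothing for this input
            h = np.maximum(x[frame_idx] @ self.w_in[e], 0.0)  # ReLU
            y[frame_idx] += weights[frame_idx, slot][:, None] * (h @ self.w_out[e])
        return y, top


layer = SparseMoE(dim=8, hidden=16, num_experts=4, top_k=1)
frames = np.random.default_rng(1).normal(size=(10, 8))
out, chosen = layer(frames)
print(out.shape, chosen.ravel())
```

With `top_k=1`, every frame passes through exactly one expert, so adding experts increases the parameter count while the only extra per-frame work is the gating projection.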
