Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Paper title

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Authors

Christoph Boeddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach

Abstract

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a conventional permutation invariant training based system and alternative objectives like Scale Invariant Signal-to-Distortion Ratio by a large margin.
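To make the difference between the two objectives named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. SI-SDR allows the reference to be rescaled by a single scalar before measuring distortion, whereas a convolutive transfer function invariant SDR allows the reference to be passed through a short FIR filter, as BSS Eval does. The function names, the least-squares solver, and the small `filter_length` are assumptions chosen for illustration. An actual training loss would need a differentiable framework rather than NumPy.

```python
import numpy as np


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB: the reference may be rescaled by one scalar."""
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))


def ci_sdr(reference, estimate, filter_length=16):
    """Convolutive transfer function invariant SDR in dB (illustrative sketch).

    filter_length is a hypothetical small value picked for this example.
    """
    n = len(reference)
    # Column k holds the reference delayed by k samples, zero-padded at the start.
    shifted = np.stack(
        [np.concatenate([np.zeros(k), reference[: n - k]]) for k in range(filter_length)],
        axis=1,
    )
    # Least-squares FIR filter that best maps the reference onto the estimate.
    fir, *_ = np.linalg.lstsq(shifted, estimate, rcond=None)
    target = shifted @ fir
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))
```

With `filter_length=1` the filter reduces to a single gain, so `ci_sdr` coincides with `si_sdr`. For an estimate that is a filtered copy of the reference, `si_sdr` counts the filtering as distortion, while `ci_sdr` does not.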
