Paper Title

Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Paper Authors

Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

Paper Abstract

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical source separation. Each source is modelled with a differentiable parametric source-filter model. A neural network is trained to reconstruct the observed mixture as a sum of the sources by estimating the source models' parameters given their fundamental frequencies. At test time, soft masks are obtained from the synthesized source signals. The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods based on nonnegative matrix factorization and a supervised deep learning baseline. Integrating domain knowledge in the form of source models into a data-driven method leads to high data efficiency: the proposed approach achieves good separation quality even when trained on less than three minutes of audio. This work makes powerful deep learning based separation usable in scenarios where training data with ground truth is expensive or nonexistent.
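The abstract describes the training setup only at a high level. Below is a minimal, hypothetical PyTorch sketch of that idea: a network estimates per-source synthesis parameters from the mixture and the given fundamental frequencies, a differentiable synthesizer (here a deliberately crude harmonic oscillator, not the paper's source-filter model) renders each source, the training loss compares the sum of the rendered sources to the observed mixture, and soft masks are derived from the synthesized signals at test time. All module names, shapes, and constants are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

N_SOURCES = 4          # e.g. a four-voice vocal ensemble (assumption)
N_FFT, HOP = 1024, 256
SR = 16000

class ParamEstimator(nn.Module):
    """Toy network: maps frame-wise mixture loudness and per-source f0 tracks
    to per-source synthesis parameters (here a single gain per frame)."""
    def __init__(self, n_sources=N_SOURCES, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + n_sources, hidden), nn.ReLU(),
            nn.Linear(hidden, n_sources),
        )

    def forward(self, mix_rms, f0):
        # mix_rms: (frames, 1), f0: (frames, n_sources) in Hz
        return torch.sigmoid(self.net(torch.cat([mix_rms, f0], dim=-1)))

def harmonic_synth(f0, gain, sr=SR, hop=HOP, n_harmonics=10):
    """Crude differentiable harmonic synthesizer standing in for the paper's
    parametric source-filter model: harmonics of f0 scaled by a learned gain."""
    f0_up = f0.repeat_interleave(hop)                    # (samples,)
    gain_up = gain.repeat_interleave(hop)                # (samples,)
    phase = 2 * torch.pi * torch.cumsum(f0_up / sr, dim=0)
    k = torch.arange(1, n_harmonics + 1).unsqueeze(1)    # (n_harmonics, 1)
    return gain_up * torch.sin(k * phase).mean(dim=0)    # (samples,)

def train_step(model, opt, mixture, f0_tracks):
    """One unsupervised step: reconstruct the observed mixture as the sum of
    the synthesized sources; no isolated source signals are required."""
    frames = f0_tracks.shape[0]
    mix = mixture[: frames * HOP]
    rms = mix.view(frames, HOP).pow(2).mean(dim=1, keepdim=True).sqrt()
    gains = model(rms, f0_tracks)                        # (frames, n_sources)
    sources = [harmonic_synth(f0_tracks[:, i], gains[:, i])
               for i in range(f0_tracks.shape[1])]
    loss = torch.mean((torch.stack(sources).sum(dim=0) - mix) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item(), sources

def soft_masks(sources, eps=1e-8):
    """Test time: Wiener-like soft masks from the magnitude spectrograms of
    the synthesized source signals."""
    window = torch.hann_window(N_FFT)
    mags = torch.stack([torch.stft(s.detach(), N_FFT, HOP, window=window,
                                   return_complex=True).abs()
                        for s in sources])               # (n_sources, F, T)
    return mags / (mags.sum(dim=0, keepdim=True) + eps)

if __name__ == "__main__":
    torch.manual_seed(0)
    frames = 200
    f0_tracks = 200.0 + 50.0 * torch.rand(frames, N_SOURCES)   # fake f0 input
    mixture = torch.randn(frames * HOP) * 0.1                  # fake mixture
    model = ParamEstimator()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss, sources = train_step(model, opt, mixture, f0_tracks)
    masks = soft_masks(sources)
    print(f"reconstruction loss: {loss:.4f}, mask shape: {tuple(masks.shape)}")
```

In the actual method the synthesizer is a full parametric source-filter model rather than this toy harmonic oscillator, and the resulting soft masks would typically be applied to the mixture's STFT before inverting back to time-domain source estimates.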
