CycleGAN-Based Unpaired Speech Dereverberation

Paper Title

CycleGAN-Based Unpaired Speech Dereverberation

Authors

Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey

Abstract

Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained on large amounts of data and a variety of room impulse responses when the data is synthetically reverberated, since acquiring real paired data is costly. In this paper, we propose a CycleGAN-based approach that enables dereverberation models to be trained on unpaired data. We quantify the impact of using unpaired data by comparing the proposed unpaired model to a paired model with the same architecture and trained on the paired version of the same dataset. We show that the performance of the unpaired model is comparable to the performance of the paired model on two different datasets, according to objective evaluation metrics. Furthermore, we run two subjective evaluations and show that both models achieve comparable subjective quality on the AMI dataset, which was not seen during training.
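
To illustrate why a CycleGAN-style setup does not need paired utterances, the following is a minimal sketch of a cycle-consistency loss, the term that CycleGAN training in general combines with adversarial losses. It is not the authors' implementation: the function names `cycle_consistency_loss`, `toy_reverb`, and `toy_dereverb` are hypothetical, the two "generators" are trivial gain maps standing in for neural networks, the L1 distance on raw waveforms is an assumption, and the adversarial losses are omitted entirely.

```python
import numpy as np


def cycle_consistency_loss(dry, reverberant, dereverb, reverb):
    """L1 cycle loss over two unpaired batches of shape (batch, samples).

    `dereverb` maps reverberant -> dry and `reverb` maps dry -> reverberant.
    Neither batch needs a counterpart in the other one.
    """
    # dry -> reverberant -> dry should reproduce the dry input
    dry_cycle = dereverb(reverb(dry))
    # reverberant -> dry -> reverberant should reproduce the reverberant input
    rev_cycle = reverb(dereverb(reverberant))
    dry_term = np.mean(np.abs(dry_cycle - dry))
    rev_term = np.mean(np.abs(rev_cycle - reverberant))
    return float(dry_term + rev_term)


# Hypothetical stand-ins for the two generators: gain maps that invert each other.
def toy_reverb(x):
    return 0.5 * x


def toy_dereverb(x):
    return 2.0 * x


rng = np.random.default_rng(0)
dry_batch = rng.standard_normal((4, 16000))  # 4 "dry" utterances
rev_batch = rng.standard_normal((3, 16000))  # 3 unrelated "reverberant" utterances

# Generators that invert each other give a zero cycle loss.
loss_inverse = cycle_consistency_loss(dry_batch, rev_batch, toy_dereverb, toy_reverb)
# A dereverberator that does nothing cannot undo toy_reverb, so the loss is positive.
loss_identity = cycle_consistency_loss(dry_batch, rev_batch, lambda x: x, toy_reverb)
```

Note that the two batches contain different numbers of utterances and share no correspondence; the loss only asks each signal to survive a round trip through both mappings, which is what makes training on unpaired data possible in this family of methods.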
