Cold Diffusion for Speech Enhancement

Paper title

Cold Diffusion for Speech Enhancement

Authors

Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux

Abstract

Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a recently proposed advanced iterative diffusion model, namely cold diffusion, to recover clean speech signals from noisy signals. The unique mathematical properties of the cold diffusion sampling process can be used to restore high-quality samples from arbitrary degradations. Based on these properties, we propose an improved training algorithm and objective to help the model generalize better during the sampling process. We verify our proposed framework by investigating two model architectures. Experimental results on the benchmark speech enhancement dataset VoiceBank-DEMAND demonstrate the strong performance of the proposed approach compared to representative discriminative models and diffusion-based enhancement models.
