Full-band General Audio Synthesis with Score-based Diffusion

Title

Full-band General Audio Synthesis with Score-based Diffusion

Authors

Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

Abstract

Recent works have shown that deep generative models can tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds. Such models operate on band-limited signals and, as a result of their autoregressive approach, typically consist of pre-trained latent encoders and/or several cascaded modules. In this work, we propose DAG, a diffusion-based generative model for general audio synthesis that handles full-band signals end-to-end in the waveform domain. Results show that DAG outperforms existing label-conditioned generators in terms of both quality and diversity. More specifically, compared to the state of the art, the band-limited and full-band versions of DAG achieve relative improvements of up to 40% and 65%, respectively. We believe DAG is flexible enough to accommodate different conditioning schemes while providing good-quality synthesis.
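The abstract describes DAG only at a high level. To illustrate what label-conditioned score-based diffusion sampling directly in the waveform domain involves, here is a minimal sketch. It assumes a variance-exploding SDE with a geometric noise schedule, integrated with an Euler-Maruyama (reverse-diffusion) predictor. In place of DAG's trained network, it uses a hypothetical analytic score for a toy Gaussian data distribution centered on a per-label template waveform. The 48 kHz rate, noise range, labels (`"tone"`, `"hum"`), and helper names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 48_000  # a common full-band audio rate
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # range of the variance-exploding noise schedule
SIGMA_DATA = 0.05  # std of the toy per-label data distribution


def sigma(t):
    """Geometric VE schedule: sigma(0) = SIGMA_MIN, sigma(1) = SIGMA_MAX."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def label_mean(label, n):
    """Hypothetical template waveform standing in for one sound class."""
    t = np.arange(n) / SAMPLE_RATE
    freqs = {"tone": 440.0, "hum": 60.0}  # made-up labels for the toy
    return 0.5 * np.sin(2 * np.pi * freqs[label] * t)


def score(x, t, label):
    """Exact score of p_t(x | label) for toy data x_0 ~ N(mu_label, SIGMA_DATA^2 I)
    perturbed as x_t = x_0 + sigma(t) * z. In a real model, a trained,
    label-conditioned network would take the place of this function."""
    mu = label_mean(label, x.shape[-1])
    return -(x - mu) / (SIGMA_DATA**2 + sigma(t) ** 2)


def sample(label, n, steps=500, seed=0):
    """Reverse-time VE SDE integrated with the Euler-Maruyama predictor."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, steps + 1)
    # Start from an approximate prior at t = 1, directly on raw waveform samples.
    x = rng.standard_normal(n) * np.sqrt(SIGMA_DATA**2 + SIGMA_MAX**2)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        g2 = sigma(t_cur) ** 2 - sigma(t_next) ** 2  # g(t)^2 * |dt| for the VE SDE
        x = x + g2 * score(x, t_cur, label)  # drift toward higher density
        if t_next > 0:  # no fresh noise on the final step
            x = x + np.sqrt(g2) * rng.standard_normal(n)
    return x


audio = sample("tone", SAMPLE_RATE)  # one second of toy "full-band" audio
```

In this toy, the output stays strongly correlated with the 440 Hz template chosen by the label, and its deviation from that template has roughly the assumed data spread. The label enters only through the score function. In a real system that function is a neural network, and that network is where the modeling effort lies.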