Complex Recurrent Variational Autoencoder with Application to Speech Enhancement

Paper title

Complex Recurrent Variational Autoencoder with Application to Speech Enhancement

Authors

Xie, Yuying; Arildsen, Thomas; Tan, Zheng-Hua

Abstract

As an extension of the variational autoencoder (VAE), the complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used. First, to account for the temporal structure of speech signals, this work introduces a complex-valued recurrent neural network into the complex VAE framework. In addition, an L1 loss is used as the reconstruction loss in this framework. To illustrate the use of the complex generative model in speech processing, we choose speech enhancement as the specific application in this paper. Experiments are based on the TIMIT dataset. The results show that the proposed method improves objective metrics of speech intelligibility and signal quality.
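The abstract names three ingredients: complex Gaussian latent variables, a complex-valued recurrent network, and an L1 reconstruction loss. The Python sketch below shows how such pieces might fit together under simplifying assumptions; it is not the authors' implementation, and all helper names (`complex_rnn`, `sample_complex_latent`, `complex_kl`, `complex_l1`) are hypothetical. It assumes a plain Elman-style recurrent cell with a split tanh activation, circularly symmetric complex Gaussians (zero pseudo-covariance) with a standard CN(0, 1) prior, which gives the closed-form KL term |mu|^2 + var - 1 - ln(var) per latent dimension, and an L1 loss taken over real and imaginary parts. The paper may use a different recurrent unit, a more general complex Gaussian, or another L1 definition.

```python
import math
import random

# Hypothetical helpers for illustration only; not the authors' code.


def complex_matvec(W, x):
    """Multiply a complex matrix (list of rows) by a complex vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def split_tanh(z):
    """Split-type activation: tanh on the real and imaginary parts separately
    (an assumed choice; other complex activations exist)."""
    return complex(math.tanh(z.real), math.tanh(z.imag))


def complex_rnn(frames, W_x, W_h, b):
    """Simple complex-valued Elman RNN (assumed cell type):
    h_t = split_tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0."""
    h = [0j] * len(b)
    states = []
    for x in frames:
        pre = [u + v + c for u, v, c in
               zip(complex_matvec(W_x, x), complex_matvec(W_h, h), b)]
        h = [split_tanh(p) for p in pre]
        states.append(h)
    return states


def sample_complex_latent(mu, var, rng):
    """Reparameterized sample z = mu + sqrt(var) * eps with eps ~ CN(0, 1),
    built as eps = (a + ib) / sqrt(2), a, b ~ N(0, 1), so that E|eps|^2 = 1."""
    return [m + math.sqrt(v) * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
            for m, v in zip(mu, var)]


def complex_kl(mu, var):
    """KL(CN(mu, var) || CN(0, 1)) summed over independent latent dimensions,
    assuming circularly symmetric Gaussians (zero pseudo-covariance):
    KL = |mu|^2 + var - 1 - ln(var)."""
    return sum(abs(m) ** 2 + v - 1 - math.log(v) for m, v in zip(mu, var))


def complex_l1(x, x_hat):
    """L1 loss on complex spectra, taken here as the mean of
    |Re(x - x_hat)| + |Im(x - x_hat)| (one possible definition)."""
    return sum(abs((a - c).real) + abs((a - c).imag) for a, c in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input: two time frames, each with two complex frequency bins.
    frames = [[1 + 1j, 0.5 - 0.2j], [0.3 + 0.1j, -0.4 + 0.9j]]
    W_x = [[0.1 + 0.2j, -0.1j], [0.05, 0.2 - 0.1j]]
    W_h = [[0.3, 0.1j], [-0.1j, 0.3]]
    b = [0j, 0j]
    states = complex_rnn(frames, W_x, W_h, b)

    # Treat the last hidden state as a (hypothetical) posterior mean
    # with a fixed variance, just to exercise the sampling and KL terms.
    mu, var = states[-1], [0.5, 0.5]
    z = sample_complex_latent(mu, var, rng)

    clean = [1 + 1j, 0.5 - 0.2j]          # toy "clean" frame
    estimate = [0.9 + 1.1j, 0.4 - 0.2j]   # toy decoder output
    loss = complex_l1(clean, estimate) + complex_kl(mu, var)
    print("z =", z)
    print("loss =", loss)
    print(complex_kl([0j], [1.0]))  # → 0.0
```

In a full model, an encoder would map noisy complex spectra to posterior parameters, a decoder would map sampled latents back to an enhanced complex spectrum, and training would combine the L1 and KL terms; how the two terms are weighted is left open here.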
