Paper Title

Improving Stability of LS-GANs for Audio and Speech Signals

Authors

Esmaeilpour, Mohammad; Sallo, Raymel Alfonso; St-Georges, Olivier; Cardinal, Patrick; Koerich, Alessandro Lameiras

Abstract

In this paper, we address the instability of generative adversarial networks (GANs) by proposing a new similarity metric in the unitary space of the Schur decomposition for 2D representations of audio and speech signals. We show that encoding the departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms. We demonstrate that incorporating this metric stabilizes training and yields less mode collapse than baseline GANs. Experimental results on subsets of the UrbanSound8k and Mozilla Common Voice datasets show considerable improvements in the quality of the generated samples, as measured by the Fréchet inception distance. Moreover, signals reconstructed from these samples achieve a higher signal-to-noise ratio than those of regular LS-GANs.
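
For readers unfamiliar with the term, the standard (Henrici) definition of departure from normality can be written as below. This is only a sketch: the abstract does not state the exact form of the proposed metric or how it enters the generator loss, so the symbols A, Q, T, Λ, and N are assumptions introduced here, with A taken to be a square 2D representation of a signal.

```latex
% Schur decomposition of a square matrix A (assumed: a square 2D signal representation)
% Q is unitary, \Lambda is diagonal with the eigenvalues of A, N is strictly upper triangular
\begin{align}
  A &= Q\,T\,Q^{H}, \qquad T = \Lambda + N, \\
  \operatorname{dep}_F(A) &= \lVert N \rVert_F
    = \sqrt{\lVert A \rVert_F^{2} - \sum_{i=1}^{n} \lvert \lambda_i \rvert^{2}}.
\end{align}
```

The second equality follows because the Frobenius norm is invariant under unitary transformations, so the squared norm of A equals that of T, which splits into the diagonal and strictly upper triangular parts. The quantity is zero exactly when A is normal.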