Paper Title

Conditioning Trick for Training Stable GANs

Authors

Esmaeilpour, Mohammad, Sallo, Raymel Alfonso, St-Georges, Olivier, Cardinal, Patrick, Koerich, Alessandro Lameiras

Abstract

In this paper, we propose a conditioning trick, called difference departure from normality, applied to the generator network in response to instability issues during GAN training. We force the generator to get closer to the departure-from-normality function of real samples computed in the spectral domain of the Schur decomposition. This binding makes the generator amenable to truncation and does not limit exploring all the possible modes. We slightly modify the BigGAN architecture, incorporating a residual network for synthesizing 2D representations of audio signals, which enables reconstructing high-quality sounds with some preserved phase information. Additionally, the proposed conditional training scenario makes a trade-off between fidelity and variety for the generated spectrograms. The experimental results on the UrbanSound8K and ESC-50 environmental sound datasets and the Mozilla Common Voice dataset show that the proposed GAN configuration with the conditioning trick remarkably outperforms baseline architectures, according to three objective metrics: inception score, Fréchet inception distance, and signal-to-noise ratio.
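For intuition, the sketch below shows one way the departure-from-normality statistic mentioned in the abstract could be computed and compared between real and generated samples. This is a minimal illustration, not the authors' implementation: it assumes square spectrogram matrices, and the function names (`departure_from_normality`, `normality_penalty`) and the exact form of the penalty are hypothetical.

```python
# Minimal sketch (not the paper's code): Henrici's departure from normality of a
# square spectrogram, computed from its Schur decomposition, plus a simple
# penalty that compares the statistic of generated and real batches.
import numpy as np
from scipy.linalg import schur


def departure_from_normality(a: np.ndarray) -> float:
    """Departure from normality of a square matrix `a`.

    With the Schur decomposition a = Q T Q^H, write T = D + N where D holds the
    eigenvalues and N is strictly upper triangular; the departure is ||N||_F,
    i.e. sqrt(||a||_F^2 - sum_i |lambda_i|^2).
    """
    t, _ = schur(a.astype(complex), output="complex")  # upper-triangular T
    eigvals = np.diag(t)
    gap = np.linalg.norm(a, "fro") ** 2 - np.sum(np.abs(eigvals) ** 2)
    return float(np.sqrt(max(gap, 0.0)))  # guard against small negative round-off


def normality_penalty(fake_batch: np.ndarray, real_batch: np.ndarray) -> float:
    """Hypothetical penalty term: absolute difference between the mean departure
    from normality of generated and real (square) spectrograms."""
    dep_fake = np.mean([departure_from_normality(x) for x in fake_batch])
    dep_real = np.mean([departure_from_normality(x) for x in real_batch])
    return abs(dep_fake - dep_real)
```

In a training loop, such a penalty would be added to the generator objective so that generated spectrograms track the departure-from-normality profile of real ones; how it is weighted and scheduled is a design choice of the paper not reproduced here.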
