MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

Paper title

MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

Authors

George Close, Thomas Hain, Stefan Goetze

Abstract

Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results. Incorporating psychoacoustically motivated speech perception metrics as part of model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system) which introduces an additional network - a "de-generator" which attempts to improve the robustness of the prediction network (and by extension of the generator) by ensuring observation of a wider range of metric scores in training. Experimental results on the VoiceBank-DEMAND dataset show relative improvement in PESQ score of 3.8% (3.05 vs 3.22 PESQ score), as well as better generalisation to unseen noise and speech.
