PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech

Paper Title

PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech

Authors

Srikanth Korse, Nicola Pia, Kishan Gupta, Guillaume Fuchs

Abstract

The quality of speech coded by transform coding is affected by various artefacts, especially when the bitrates available to quantize the frequency components become too low. To mitigate these coding artefacts and enhance the quality of coded speech, a post-processor that relies on a-priori information transmitted from the encoder is traditionally employed at the decoder side. In recent years, several data-driven post-processors have been proposed and shown to outperform traditional approaches. In this paper, we propose PostGAN, a GAN-based neural post-processor that operates in the sub-band domain and relies on the U-Net architecture and a learned affine transform. It has been tested on the recently standardized low-complexity, low-delay Bluetooth codec (LC3) for wideband speech at the lowest bitrate (16 kbit/s). Subjective evaluations and objective scores show that the newly introduced post-processor surpasses previously published methods and can improve the quality of coded speech by around 20 MUSHRA points.
