High Quality Audio Coding

Paper Title

High Quality Audio Coding with MDCTNet

Authors

Grant Davidson, Mark Vinton, Per Ekstrand, Cong Zhou, Lars Villemoes, Lie Lu

Abstract

We propose a neural audio generative model, MDCTNet, operating in the perceptually weighted domain of an adaptive modified discrete cosine transform (MDCT). The architecture of the model captures correlations in both time and frequency directions with recurrent layers (RNNs). An audio coding system is obtained by training MDCTNet on a diverse set of fullband monophonic audio signals at 48 kHz sampling, conditioned by a perceptual audio encoder. In a subjective listening test with ten excerpts chosen to be balanced across content types, yet stressful for both codecs, the mean performance of the proposed system for 24 kb/s variable bitrate (VBR) is similar to that of Opus at twice the bitrate.
