Real-time Speech Frequency Bandwidth Extension

Paper title

Real-time Speech Frequency Bandwidth Extension

Authors

Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Dominik Roblek

Abstract

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
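The figures quoted in the abstract can be turned into a quick sanity check. The sketch below is not part of the paper: it only does the arithmetic implied by the stated sampling rates, the 16 ms frame, and the 1.5 ms processing time. The helper names are hypothetical, and it assumes a frame covers the same 16 ms at both the input and output rate.

```python
# Figures taken from the abstract; everything else here is illustrative.
INPUT_RATE_HZ = 8_000     # narrowband input sampling rate
OUTPUT_RATE_HZ = 16_000   # wideband output sampling rate
FRAME_MS = 16.0           # architectural latency / frame duration
PROCESSING_MS = 1.5       # reported time to process one frame on one mobile CPU core


def samples_per_frame(rate_hz: int, frame_ms: float) -> int:
    """Number of samples covered by one frame at the given sampling rate."""
    return round(rate_hz * frame_ms / 1000)


def nyquist_hz(rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate."""
    return rate_hz / 2


def real_time_factor(processing_ms: float, frame_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 keeps up with real time."""
    return processing_ms / frame_ms


in_samples = samples_per_frame(INPUT_RATE_HZ, FRAME_MS)
out_samples = samples_per_frame(OUTPUT_RATE_HZ, FRAME_MS)
# Band the model has to reconstruct: above the input Nyquist, up to the output Nyquist.
missing_band_hz = (nyquist_hz(INPUT_RATE_HZ), nyquist_hz(OUTPUT_RATE_HZ))
rtf = real_time_factor(PROCESSING_MS, FRAME_MS)

if __name__ == "__main__":
    print(f"input samples per frame:  {in_samples}")
    print(f"output samples per frame: {out_samples}")
    print(f"band to reconstruct (Hz): {missing_band_hz}")
    print(f"real-time factor:         {rtf:.4f}")
```

Under these assumptions each frame maps 128 input samples to 256 output samples, the reconstructed band runs from 4 kHz to 8 kHz, and processing takes under a tenth of the frame duration, which is what leaves headroom for two-way voice calls.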
