Paper Title

Real-time Speech Frequency Bandwidth Extension

Authors

Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Dominik Roblek

Abstract

In this paper, we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
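
The figures quoted in the abstract can be turned into frame sizes and a real-time factor with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that one frame covers 16ms of audio at both the input and the output sampling rate, which the abstract implies but does not state explicitly.

```python
# Figures taken from the abstract.
INPUT_RATE_HZ = 8_000    # narrowband input sampling rate
OUTPUT_RATE_HZ = 16_000  # wideband output sampling rate
FRAME_MS = 16.0          # frame duration, equal to the architectural latency
COMPUTE_MS = 1.5         # reported processing time per frame on one mobile CPU core


def samples_per_frame(rate_hz: int, frame_ms: float) -> int:
    """Number of audio samples in one frame at the given sampling rate."""
    return round(rate_hz * frame_ms / 1000)


def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    return compute_ms / frame_ms


# Assumption: each frame spans the same 16 ms at input and output.
input_samples = samples_per_frame(INPUT_RATE_HZ, FRAME_MS)
output_samples = samples_per_frame(OUTPUT_RATE_HZ, FRAME_MS)
rtf = real_time_factor(COMPUTE_MS, FRAME_MS)

# Representable bandwidth (Nyquist limit) before and after extension.
input_bandwidth_hz = INPUT_RATE_HZ // 2
output_bandwidth_hz = OUTPUT_RATE_HZ // 2

print(f"{input_samples} input samples -> {output_samples} output samples per frame")
print(f"bandwidth {input_bandwidth_hz} Hz -> {output_bandwidth_hz} Hz")
print(f"real-time factor: {rtf:.3f}")
```

Under these assumptions, the model would consume 128 samples and emit 256 samples per frame while using under a tenth of the available time budget on a single core, which is consistent with the abstract's claim that the approach suits two-way voice communication.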
