FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

Paper Title

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

Authors

Liu, Songxiang; Cao, Yuewen; Hu, Na; Su, Dan; Meng, Helen

Abstract

This paper presents FastSVC, a lightweight cross-domain singing voice conversion (SVC) system that achieves high conversion performance with inference 4x faster than real time on CPUs. FastSVC uses a Conformer-based phoneme recognizer to extract singer-agnostic linguistic features from singing signals. A generator based on feature-wise linear modulation (FiLM) synthesizes the waveform directly from these linguistic features, leveraging information from sine-excitation signals and loudness features. The waveform generator can be trained conveniently with a multi-resolution spectral loss and an adversarial loss. Experimental results show that, compared with a computationally heavy baseline system, the proposed FastSVC system achieves comparable conversion performance in some scenarios and significantly better conversion performance in others. Moreover, FastSVC achieves desirable cross-lingual singing voice conversion performance. Its inference is 3x and 70x faster than the baseline system on GPUs and CPUs, respectively.
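
The abstract names several of the generator's ingredients without spelling them out. As a rough illustration only, the NumPy sketch below shows one common way to build three of them: a sine-excitation signal derived from a frame-level F0 contour, a FiLM operation that scales and shifts hidden features with parameters predicted from a conditioning input, and a multi-resolution STFT loss in the style of Parallel WaveGAN (spectral convergence plus log-magnitude L1, averaged over several FFT sizes). This is not the FastSVC implementation. All function names, the 24 kHz sample rate, the 240-sample hop, the excitation amplitudes, the linear projections inside the FiLM layer, and the STFT resolutions are hypothetical choices. Loudness features and the adversarial loss are omitted.

```python
import numpy as np

# Hypothetical settings for illustration only (not taken from the paper).
SR = 24000   # sample rate in Hz
HOP = 240    # samples per F0 frame (10 ms at 24 kHz)


def sine_excitation(f0, sr=SR, hop=HOP, amp=0.1, noise_std=0.003, seed=0):
    """Build a sample-level excitation from a frame-level F0 contour (Hz, 0 = unvoiced).

    Voiced samples: a sinusoid following F0 plus light Gaussian noise.
    Unvoiced samples: Gaussian noise only (std amp / 3, an assumed choice).
    """
    rng = np.random.default_rng(seed)
    f0_up = np.repeat(np.asarray(f0, dtype=np.float64), hop)  # nearest-neighbour upsampling
    phase = 2.0 * np.pi * np.cumsum(f0_up / sr)               # integrate instantaneous frequency
    voiced = amp * np.sin(phase) + rng.normal(0.0, noise_std, f0_up.shape)
    unvoiced = rng.normal(0.0, amp / 3.0, f0_up.shape)
    return np.where(f0_up > 0, voiced, unvoiced)


def film(x, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma * x + beta, per time step and channel.

    x: (T, C) hidden features; cond: (T, D) conditioning features.
    gamma and beta are predicted from cond, here by plain linear maps (hypothetical).
    """
    gamma = cond @ w_gamma + b_gamma  # (T, C) per-channel scale
    beta = cond @ w_beta + b_beta     # (T, C) per-channel shift
    return gamma * x + beta


def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window and no padding (requires len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_stft_loss(y_hat, y,
                               resolutions=((512, 128), (1024, 256), (2048, 512)),
                               eps=1e-7):
    """Average of spectral convergence + log-magnitude L1 over several STFT resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        s_hat, s = stft_mag(y_hat, n_fft, hop), stft_mag(y, n_fft, hop)
        sc = np.linalg.norm(s - s_hat) / (np.linalg.norm(s) + eps)  # Frobenius norms
        log_mag = np.mean(np.abs(np.log(s + eps) - np.log(s_hat + eps)))
        total += sc + log_mag
    return total / len(resolutions)


if __name__ == "__main__":
    # 1 s contour: 220 Hz, a short unvoiced gap, then 330 Hz (100 frames of 10 ms).
    f0 = np.concatenate([np.full(50, 220.0), np.zeros(20), np.full(30, 330.0)])
    target = sine_excitation(f0)
    shifted = sine_excitation(f0 * 1.5, seed=1)
    print(target.shape)                                     # (24000,)
    print(multi_resolution_stft_loss(target, target))       # 0.0
    print(multi_resolution_stft_loss(shifted, target) > 0)  # True
```

In a trained generator, gamma and beta would come from learned networks rather than fixed linear maps; the abstract does not say exactly which signals FastSVC feeds into those networks, so the sketch leaves that choice open and only shows the scale-and-shift arithmetic.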
