Deep Generative Factorization for Speech Signal

Title

Deep generative factorization for speech signal

Authors

Sun, Haoran; Li, Lantian; Cai, Yunqi; Zhang, Yang; Zheng, Thomas Fang; Wang, Dong

Abstract

Various information factors are blended in speech signals, which is the primary difficulty for most speech information processing tasks. An intuitive idea is to factorize speech signals into individual information factors (e.g., phonetic content and speaker trait), though this turns out to be highly challenging. This paper presents a speech factorization approach based on a novel factorial discriminative normalization flow model (factorial DNF). Experiments conducted on a two-factor case involving phonetic content and speaker trait demonstrate that the proposed factorial DNF has a strong capability to factorize speech signals and outperforms several comparative models in terms of information representation and manipulation.
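The abstract does not spell out the model, so what follows is only a minimal NumPy sketch of the general idea as we read it, under stated assumptions. An invertible flow maps a feature vector x to a latent z, and the latent is split into a "content" subspace and a "speaker" subspace. Each subspace gets its own class-conditional Gaussian prior N(mu_y, I) indexed by that factor's label, so log p(x) = log N(z_c; mu_c[phone], I) + log N(z_s; mu_s[speaker], I) + log|det dz/dx|. The flow here is one fixed orthogonal mix plus one RealNVP-style affine coupling layer with random, untrained weights. The dimensions, class counts, and names such as `convert_speaker` are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4          # toy feature dimension (assumption)
D_CONTENT = 2  # first latent dims -> phonetic-content factor (assumption)
# the remaining D - D_CONTENT latent dims -> speaker factor (assumption)

# Hypothetical class means of each factor's class-conditional Gaussian prior.
content_means = rng.normal(size=(3, D_CONTENT))      # 3 toy phone classes
speaker_means = rng.normal(size=(2, D - D_CONTENT))  # 2 toy speakers

# Flow parameters: random and untrained, for illustration only.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))  # orthogonal mixing, |det Q| = 1
W_s = rng.normal(scale=0.1, size=(D // 2, D // 2))
W_t = rng.normal(scale=0.1, size=(D // 2, D // 2))


def forward(x):
    """Map features x to latent z; also return log|det dz/dx|."""
    h = Q @ x
    h1, h2 = h[: D // 2], h[D // 2:]
    s = np.tanh(h1 @ W_s)  # bounded log-scale of the coupling layer
    t = h1 @ W_t
    z = np.concatenate([h1, h2 * np.exp(s) + t])
    return z, s.sum()      # orthogonal Q contributes log|det| = 0


def inverse(z):
    """Exact inverse of forward: latent z back to features x."""
    z1, z2 = z[: D // 2], z[D // 2:]
    s = np.tanh(z1 @ W_s)
    t = z1 @ W_t
    h = np.concatenate([z1, (z2 - t) * np.exp(-s)])
    return Q.T @ h


def log_normal(z, mean):
    """Log density of N(mean, I) evaluated at z."""
    return -0.5 * np.sum((z - mean) ** 2) - 0.5 * z.size * np.log(2 * np.pi)


def log_likelihood(x, phone, speaker):
    """Factorial class-conditional log-likelihood of one frame x."""
    z, logdet = forward(x)
    z_c, z_s = z[:D_CONTENT], z[D_CONTENT:]
    return (log_normal(z_c, content_means[phone])
            + log_normal(z_s, speaker_means[speaker])
            + logdet)


def convert_speaker(x, src_speaker, tgt_speaker):
    """Shift only the speaker subspace from src to tgt mean, then invert."""
    z, _ = forward(x)
    z[D_CONTENT:] += speaker_means[tgt_speaker] - speaker_means[src_speaker]
    return inverse(z)


x = rng.normal(size=D)
print(log_likelihood(x, phone=1, speaker=0))
x_converted = convert_speaker(x, src_speaker=0, tgt_speaker=1)
```

In this toy setup, training would maximize `log_likelihood` over frames labeled with both a phone and a speaker. Because the flow is invertible, editing only the speaker subspace and mapping back yields a feature vector whose content coordinates in latent space are unchanged, which illustrates the kind of factor manipulation the abstract refers to.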
