Paper Title

Can We Trust Deep Speech Prior?

Authors

Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Abstract

Recently, speech enhancement (SE) based on a deep speech prior has attracted much attention, for example the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture. Compared to conventional approaches that represent clean speech with shallow models, such as Gaussians with a low-rank covariance, the new approach employs deep generative models to represent clean speech, which often yields a better prior. Despite this clear theoretical advantage, we argue that deep priors must be used with great caution, since the likelihood produced by a deep generative model does not always coincide with speech quality. We designed a comprehensive study of this issue and demonstrated that reasonable SE performance can be achieved with deep speech priors, but the results might be suboptimal. A careful analysis showed that the problem is deeply rooted in the disharmony between the flexibility of deep generative models and the nature of maximum-likelihood (ML) training.
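For context, here is a minimal sketch of the VAE-NMF generative model as it is commonly formulated in the speech-enhancement literature; the abstract does not spell out these equations, so the notation below is an assumption. The complex STFT coefficient $x_{fn}$ of the noisy speech at frequency bin $f$ and frame $n$ is modeled as clean speech $s_{fn}$ plus noise $b_{fn}$, with the speech variance $\sigma^2_f(\mathbf{z}_n)$ produced by a VAE decoder from a latent code $\mathbf{z}_n$ and the noise variance given by an NMF factorization $\mathbf{W}\mathbf{H}$:

$$
s_{fn} \sim \mathcal{N}_c\big(0,\ \sigma^2_f(\mathbf{z}_n)\big), \qquad
b_{fn} \sim \mathcal{N}_c\big(0,\ (\mathbf{W}\mathbf{H})_{fn}\big), \qquad
x_{fn} = s_{fn} + b_{fn},
$$

$$
x_{fn} \sim \mathcal{N}_c\big(0,\ \sigma^2_f(\mathbf{z}_n) + (\mathbf{W}\mathbf{H})_{fn}\big), \qquad
\hat{s}_{fn} = \frac{\sigma^2_f(\mathbf{z}_n)}{\sigma^2_f(\mathbf{z}_n) + (\mathbf{W}\mathbf{H})_{fn}}\, x_{fn}.
$$

Enhancement estimates $\mathbf{z}_n$, $\mathbf{W}$, and $\mathbf{H}$ by maximizing the likelihood of the noisy observations and then recovers the speech with the Wiener filter above. This is exactly the step where a decoder likelihood that does not track perceptual speech quality can steer the optimization toward a suboptimal solution, which is the failure mode the paper examines.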
