Paper Title

Can We Trust Deep Speech Prior?

Authors

Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Abstract

Recently, speech enhancement (SE) based on a deep speech prior has attracted much attention, for example the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture. Compared to conventional approaches that represent clean speech with shallow models, such as Gaussians with a low-rank covariance, the new approach employs deep generative models to represent clean speech, which often yields a better prior. Despite this clear theoretical advantage, we argue that deep priors must be used with great caution, since the likelihood produced by a deep generative model does not always coincide with speech quality. We designed a comprehensive study of this issue and demonstrated that reasonable SE performance can be achieved with deep speech priors, but the results might be suboptimal. A careful analysis showed that the problem is deeply rooted in the disharmony between the flexibility of deep generative models and the nature of maximum-likelihood (ML) training.
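For context, here is a minimal sketch of the VAE-NMF generative model as it is commonly formulated in the speech-enhancement literature; the abstract does not spell out these equations, so the notation below is an assumption. The complex STFT coefficient $x_{fn}$ of the noisy speech at frequency bin $f$ and frame $n$ is modeled as clean speech $s_{fn}$ plus noise $b_{fn}$, with the speech variance $\sigma^2_f(\mathbf{z}_n)$ produced by a VAE decoder from a latent code $\mathbf{z}_n$ and the noise variance given by an NMF factorization $\mathbf{W}\mathbf{H}$:

$$
s_{fn} \sim \mathcal{N}_c\big(0,\ \sigma^2_f(\mathbf{z}_n)\big), \qquad
b_{fn} \sim \mathcal{N}_c\big(0,\ (\mathbf{W}\mathbf{H})_{fn}\big), \qquad
x_{fn} = s_{fn} + b_{fn},
$$

$$
x_{fn} \sim \mathcal{N}_c\big(0,\ \sigma^2_f(\mathbf{z}_n) + (\mathbf{W}\mathbf{H})_{fn}\big), \qquad
\hat{s}_{fn} = \frac{\sigma^2_f(\mathbf{z}_n)}{\sigma^2_f(\mathbf{z}_n) + (\mathbf{W}\mathbf{H})_{fn}}\, x_{fn}.
$$

Enhancement estimates $\mathbf{z}_n$, $\mathbf{W}$, and $\mathbf{H}$ by maximizing the likelihood of the noisy observations and then recovers the speech with the Wiener filter above. This is exactly the step where a decoder likelihood that does not track perceptual speech quality can steer the optimization toward a suboptimal solution, which is the failure mode the paper examines.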
