Paper title
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations
Paper authors
Abstract
In this work, we address the disentanglement of style and content in speech signals. We propose a fully convolutional variational autoencoder employing two encoders: a content encoder and a style encoder. To foster disentanglement, we propose adversarial contrastive predictive coding. This new disentanglement method needs neither parallel data nor any supervision. We show that the proposed technique is capable of separating speaker and content traits into the two different representations, and that it achieves competitive speaker-content disentanglement performance compared to other unsupervised approaches. We further demonstrate that, when used for phone recognition, the content representation is more robust against a train-test mismatch than spectral features.