Paper title
Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning
Paper authors
Abstract
In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty about nuisance factors latent in the embeddings by employing a uniformity loss. To preserve speaker discriminability, a contrastive similarity loss function is used alongside it. Experimental results show that the proposed CEL significantly outperforms state-of-the-art unsupervised speaker verification systems, with the best-performing model achieving 8.01% and 4.01% EER on the VoxCeleb1 and VOiCES evaluation sets, respectively. Moreover, supervised speaker embedding networks initialized with parameters pre-trained via CEL performed better than those trained from randomly initialized parameters.
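The sketch below shows how two such terms could be combined into one objective. It is illustrative only and makes several assumptions that do not come from the abstract. The uniformity term follows the Gaussian-potential form of Wang and Isola (2020), log E[exp(-t·||x - y||²)], which is minimized when embeddings spread evenly over the unit hypersphere. A fixed-scale cosine-similarity InfoNCE loss stands in for the paper's contrastive similarity loss. The positive pairs are taken to be two segments cropped from the same unlabeled utterance, on the assumption that they share a speaker. All function names, the temperature `t`, the scale `w`, and the weight `lam` are hypothetical. CEL's actual formulation and hyperparameters may differ.

```python
import math
import random


def l2_normalize(v):
    # Project a (nonzero) vector onto the unit hypersphere.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    # Inputs are assumed L2-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def uniformity_loss(embs, t=2.0):
    """Hypothetical uniformity term: log of the mean Gaussian potential
    exp(-t * ||x - y||^2) over all distinct pairs (Wang & Isola style).
    Lower values mean the embeddings are spread more uniformly."""
    n = len(embs)
    pots = [math.exp(-t * sq_dist(embs[i], embs[j]))
            for i in range(n) for j in range(i + 1, n)]
    return math.log(sum(pots) / len(pots))


def contrastive_similarity_loss(anchors, positives, w=10.0):
    """Hypothetical InfoNCE-style stand-in: anchor i should be more similar to
    positive i (another segment of the same utterance) than to any positive j != i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [w * cosine(anchors[i], positives[j]) for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # cross-entropy with target class i
    return total / n


def cel_loss(anchors, positives, t=2.0, w=10.0, lam=1.0):
    """Assumed combination: uniformity over all embeddings in the batch plus
    a weighted contrastive similarity term (lam = 1.0 is an arbitrary choice)."""
    return (uniformity_loss(anchors + positives, t)
            + lam * contrastive_similarity_loss(anchors, positives, w))


if __name__ == "__main__":
    random.seed(0)
    dim, batch = 8, 4
    anchors = [l2_normalize([random.gauss(0, 1) for _ in range(dim)])
               for _ in range(batch)]
    # Toy "second segment" of each utterance: the anchor plus small noise.
    positives = [l2_normalize([x + 0.1 * random.gauss(0, 1) for x in a])
                 for a in anchors]
    print(f"toy CEL-style loss: {cel_loss(anchors, positives):.4f}")
```

In this toy form, the two terms pull in opposite directions. The uniformity term pushes every embedding away from every other one. The contrastive term pulls each anchor toward its own positive. Training would settle at a balance between the two, which is the "equilibrium" the method's name suggests.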