Title

Contrastive-mixup learning for improved speaker verification

Authors

Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke

Abstract

This paper proposes a novel formulation of prototypical loss with mixup for speaker verification. Mixup is a simple yet efficient data augmentation technique that fabricates a weighted combination of random data point and label pairs for deep neural network training. Mixup has attracted increasing attention due to its ability to improve the robustness and generalization of deep neural networks. Although mixup has shown success in diverse domains, most applications have centered around closed-set classification tasks. In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric. During training, mixup operations generate convex interpolations of both inputs and virtual labels. Moreover, we have reformulated the prototypical loss function such that mixup is enabled on metric learning objectives. To demonstrate its generalization given limited training data, we conduct experiments by varying the number of available utterances from each speaker in the VoxCeleb database. Experimental results show that applying contrastive-mixup outperforms the existing baseline, reducing the error rate by 16% relative, especially when the number of training utterances per speaker is limited.
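
For context, the "convex interpolations of both inputs and virtual labels" mentioned in the abstract follow the standard mixup recipe; the sketch below shows only that interpolation and is not the paper's exact prototypical-loss reformulation. With a mixing coefficient drawn from a Beta distribution, a mixed training example is formed as

\[
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j ,
\]

where $(x_i, y_i)$ and $(x_j, y_j)$ are two randomly paired training examples and $\alpha$ is a hyperparameter. The paper's contribution is a reformulated prototypical loss that can consume such soft targets $\tilde{y}$ within a distance-metric learning objective.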
