Paper Title
Backend Ensemble for Speaker Verification and Spoofing Countermeasure
Authors
Abstract
This paper describes the NPU system submitted to the Spoofing Aware Speaker Verification (SASV) Challenge 2022. We focus on the \textit{backend ensemble} of speaker verification and spoofing countermeasure systems from three aspects. First, beyond simple concatenation, we propose a circulant matrix transformation and stacking of speaker embeddings and countermeasure embeddings. By stacking the newly defined circulant embeddings, we explore almost all possible interactions between speaker embeddings and countermeasure embeddings. Second, we try different convolutional neural networks that selectively fuse the salient regions of the embeddings into channels through their convolution kernels. Finally, we design parallel attention in 1D convolutional neural networks to learn the global correlation along the channel dimension and the important parts along the feature dimension. In addition, we embed squeeze-and-excitation attention in 2D convolutional neural networks to learn the global dependence between speaker embeddings and countermeasure embeddings. Experimental results demonstrate that all of the above methods are effective. After fusing four well-trained models enhanced by these methods, the best SASV-EER, SPF-EER and SV-EER we achieve on the evaluation set are 0.559\%, 0.354\% and 0.857\%, respectively. With these contributions, our submitted system ranked fifth in the challenge.
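The abstract does not spell out how the circulant embeddings are built or stacked, so the following is only a minimal sketch of one way such a construction could expose pairwise interactions. It assumes that both embeddings have the same dimension `d`, that the circulant matrix is formed by cyclically shifting the embedding one position per row, and that the stack pairs a row-tiled speaker embedding with the circulant countermeasure matrix as two channels. The function names `circulant` and `pair_with_circulant`, the toy values, and the two-channel layout are all hypothetical and are not taken from the paper.

```python
import numpy as np


def circulant(embedding: np.ndarray) -> np.ndarray:
    """Return a (d, d) matrix whose i-th row is the embedding cyclically shifted right by i."""
    d = embedding.shape[0]
    return np.stack([np.roll(embedding, shift) for shift in range(d)])


def pair_with_circulant(speaker: np.ndarray, countermeasure: np.ndarray) -> np.ndarray:
    """Stack a tiled speaker embedding and a circulant countermeasure matrix.

    Assumed layout (not from the paper): output shape (2, d, d), where
    position (i, j) holds speaker[j] in channel 0 and
    countermeasure[(j - i) % d] in channel 1.
    """
    if speaker.shape != countermeasure.shape or speaker.ndim != 1:
        raise ValueError("this sketch assumes two 1-D embeddings of equal length")
    d = speaker.shape[0]
    tiled_speaker = np.tile(speaker, (d, 1))   # every row is the speaker embedding
    shifted_cm = circulant(countermeasure)     # row i is the embedding shifted by i
    return np.stack([tiled_speaker, shifted_cm])


# Toy embeddings; real speaker and countermeasure embeddings are much longer.
speaker_emb = np.array([1.0, 2.0, 3.0, 4.0])
cm_emb = np.array([0.1, 0.2, 0.3, 0.4])

stacked = pair_with_circulant(speaker_emb, cm_emb)
print(stacked.shape)
print(stacked[1])
```

Under these assumptions every (speaker element, countermeasure element) pair is aligned at exactly one spatial position, so a 2D convolution over the stacked channels can see each cross-embedding combination. Whether the submitted system uses this exact layout is not stated in the abstract.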