The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022

Paper Title

The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022

Authors

Zheng, Yu; Peng, Jinghan; Chen, Yihao; Zhang, Yajun; Wang, Jialong; Liu, Min; Xu, Minqiang

Abstract

This paper describes the speaker verification (SV) systems submitted by the SpeakIn team to Task 1 and Task 2 of the Far-Field Speaker Verification Challenge 2022 (FFSVC2022). The challenge's SV tasks address fully supervised far-field speaker verification (Task 1) and semi-supervised far-field speaker verification (Task 2). For Task 1, we used the VoxCeleb and FFSVC2020 datasets as training data; for Task 2, we used only the VoxCeleb dataset. ResNet-based and RepVGG-based architectures were developed for this challenge. Global statistic pooling and MQMHA pooling were used to aggregate frame-level features across time into utterance-level representations. We adopted AM-Softmax and AAM-Softmax to classify the resulting embeddings. We propose a staged transfer learning method: in the pre-training stage we reserve the speaker weights, which receive no positive samples during this stage; in the second stage we fine-tune these weights with both positive and negative samples. Compared with the traditional transfer learning strategy, this strategy can better improve model performance. The Sub-Mean and AS-Norm backend methods were used to address domain mismatch. In the fusion stage, three models were fused for Task 1 and two models for Task 2. On the FFSVC2022 leaderboard, our submission achieves an EER of 3.0049% and a corresponding minDCF of 0.2938 in Task 1, and an EER of 6.2060% and a minDCF of 0.5232 in Task 2. Our approach ranks 1st in both challenge tasks.
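
The backend step named in the abstract (Sub-Mean followed by AS-Norm) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cohort set, the `top_n` value, and the reading of "Sub-Mean" as subtracting a domain-level mean embedding are all assumptions made for illustration. The AS-Norm part follows the commonly used adaptive symmetric formulation, in which a raw cosine score is normalized by the mean and standard deviation of the top-scoring cohort comparisons on both the enrollment and the test side.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sub_mean(embeddings, domain_mean):
    """Subtract a mean embedding (assumed here to be estimated on target-domain data)."""
    return embeddings - domain_mean


def as_norm(enroll, test, cohort, top_n=2, eps=1e-8):
    """Adaptive symmetric score normalization for a single trial.

    enroll, test: 1-D embeddings.
    cohort: 2-D array with one impostor embedding per row (hypothetical cohort set).
    top_n: number of highest cohort scores used for the statistics (assumed value).
    """
    enroll = l2_normalize(enroll)
    test = l2_normalize(test)
    cohort = l2_normalize(cohort)

    raw = float(enroll @ test)  # raw cosine score of the trial

    def top_stats(vec):
        # Keep only the top_n cohort scores for this side of the trial.
        scores = np.sort(cohort @ vec)[-top_n:]
        return scores.mean(), scores.std() + eps

    mu_e, sd_e = top_stats(enroll)
    mu_t, sd_t = top_stats(test)

    # Average of the enrollment-side and test-side normalized scores.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```

In a typical pipeline the embeddings would first be passed through `sub_mean` and then scored trial by trial with `as_norm`; the normalized scores of several models could afterwards be combined in a fusion stage.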
