Spoofing-Aware Speaker Verification by Multi-Level Fusion

Paper title

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Authors

Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng

Abstract

Recently, many novel techniques have been introduced to deal with spoofing attacks and have achieved promising countermeasure (CM) performance. However, these works consider only stand-alone CM models. The spoofing-aware speaker verification (SASV) challenge, now taking place, aims to facilitate research on integrated CM and automatic speaker verification (ASV) models, arguing that jointly optimizing the two will lead to better performance. In this paper, we propose a novel multi-model, multi-level fusion strategy to tackle the SASV task. In contrast to pure score-fusion and embedding-fusion methods, the framework first takes embeddings from CM models and propagates them into a CM block to obtain a CM score. In the second-level fusion, the CM score and the ASV scores taken directly from the ASV systems are concatenated and fed into a prediction block for the final decision. The best single fusion system achieves an SASV-EER of 0.97% on the evaluation set, and ensembling the top-5 fusion systems lowers the final SASV-EER to 0.89%.
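The two fusion levels described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the block architectures, so the small feed-forward blocks, the cosine-similarity ASV scores, the embedding sizes, and the random (untrained) weights are all assumptions made for illustration, and the names `mlp`, `init_mlp`, `cosine_score`, and `fuse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a small feed-forward block. Assumed architecture."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply the block: ReLU after hidden layers, linear output layer."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def cosine_score(a, b):
    """Cosine similarity per row; assumed here as the ASV scoring function."""
    return np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

def fuse(cm_embs, asv_enroll, asv_test, cm_block, pred_block):
    # Level 1: embeddings from several CM models go through a CM block,
    # which outputs a single CM score per trial.
    cm_in = np.concatenate(cm_embs, axis=-1)
    cm_score = mlp(cm_in, cm_block)                      # shape (batch, 1)
    # ASV scores come directly from each ASV system (one score per system).
    asv_scores = np.stack(
        [cosine_score(e, t) for e, t in zip(asv_enroll, asv_test)],
        axis=-1)                                         # shape (batch, n_asv)
    # Level 2: concatenate CM score and ASV scores, feed the prediction block.
    fused = np.concatenate([cm_score, asv_scores], axis=-1)
    return mlp(fused, pred_block)[:, 0]                  # shape (batch,)

# Toy usage with made-up dimensions: two CM models and two ASV systems.
batch = 4
cm_embs = [rng.normal(size=(batch, 160)), rng.normal(size=(batch, 64))]
asv_enroll = [rng.normal(size=(batch, 192)) for _ in range(2)]
asv_test = [rng.normal(size=(batch, 192)) for _ in range(2)]
cm_block = init_mlp([160 + 64, 64, 1])
pred_block = init_mlp([1 + 2, 16, 1])
scores = fuse(cm_embs, asv_enroll, asv_test, cm_block, pred_block)
print(scores.shape)  # (4,)
```

The point of the sketch is the data flow: the CM side is fused at the embedding level, while the ASV side enters only as scores, so the prediction block sees a short vector of one CM score plus one score per ASV system.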
