Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture

Paper Title

Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture

Authors

Wang, Lei; Yeoh, Benedict; Ng, Jun Wah

Abstract

Synthetic voices and spliced audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research treats spoofing countermeasures as a binary classification problem: bonafide vs. spoof. This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit local patterns in acoustic features. Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance in the logical access scenario. In addition, this paper proposes to reformulate the existing audio splicing detection problem: instead of identifying the complete spliced segments, it is more useful to detect their boundaries. Moreover, this problem can be solved with a deep learning approach, in contrast to previous signal processing techniques.
