Paper Title
Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems
Paper Authors
Paper Abstract
Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important, especially due to the proliferation of smart speakers and personal agents that interact with an individual's voice commands to perform diverse and even sensitive tasks. Adversarial attacks are a recently revived research area shown to be effective in breaking deep neural network-based classifiers, specifically by forcing them to change their posterior distribution through only very small perturbations of the input samples. Although significant progress in this realm has been made in the computer vision domain, advances within speaker recognition are still limited. The present expository paper considers several state-of-the-art adversarial attacks on a deep speaker recognition system, employs strong defense methods as countermeasures, and reports on several ablation studies to obtain a comprehensive understanding of the problem. The experiments show that speaker recognition systems are vulnerable to adversarial attacks, and that the strongest attacks can reduce the accuracy of the system from 94% to 0%. The study also compares the performance of the employed defense methods in detail, and finds adversarial training based on Projected Gradient Descent (PGD) to be the best defense method in our setting. We hope that the experiments presented in this paper provide baselines that are useful for the research community interested in further studying the adversarial robustness of speaker recognition systems.
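To make the two mechanisms named in the abstract concrete, the sketch below illustrates an L-infinity PGD attack (small, bounded input perturbations that push the classifier's posterior away from the true speaker) and a single step of PGD-based adversarial training. This is a minimal illustration in PyTorch, not the paper's implementation: the names `pgd_attack`, `adversarial_training_step`, `epsilon`, `alpha`, and `steps`, as well as the assumption that `model` maps waveform batches to speaker logits, are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.002, alpha=0.0004, steps=10):
    """Craft L-infinity bounded adversarial examples with Projected Gradient
    Descent (PGD). `x` is a batch of input waveforms, `y` the true speaker
    labels; the perturbation magnitude never exceeds `epsilon`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -epsilon, epsilon)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One PGD adversarial-training step: generate adversarial examples on
    the fly and train on them (the defense found most effective here)."""
    model.eval()                      # freeze BN/dropout while crafting
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The values of `epsilon` and `alpha` above are placeholders; in practice they are tuned to the input scale (e.g., normalized waveforms or spectral features) and control the trade-off between attack strength and imperceptibility.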