Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

Paper Title

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

Authors

Yang, Chao-Han Huck; Ahmed, Zeeshan; Gu, Yile; Szurley, Joseph; Ren, Roger; Liu, Linda; Stolcke, Andreas; Bulyko, Ivan

Abstract

In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without accessing gradient information of a targeted ASR model directly. We propose an advanced Bayesian neural network (BNN) based adversarial detector, which could model latent distributions against adaptive adversarial perturbation with divergence measurement. We further simulate deployment scenarios of RNN Transducer, Conformer, and wav2vec-2.0 based ASR systems with the proposed adversarial detection system. Leveraging the proposed BNN based detection system, we improve detection rate by +2.77 to +5.42% (relative +3.03 to +6.26%) and reduce the word error rate by 5.02 to 7.47% on LibriSpeech datasets compared to the current model enhancement methods against the adversarial speech examples.

Paper Title