Paper title
Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees
Paper authors
Abstract
Deep neural network-based classifiers have been shown to be vulnerable to imperceptible perturbations to their input, such as $\ell_p$-bounded norm adversarial attacks. This has motivated the development of many defense methods, which are then broken by new attacks, and so on. This paper focuses on a different but related problem of reverse engineering adversarial attacks. Specifically, given an attacked signal, we study conditions under which one can determine the type of attack ($\ell_1$, $\ell_2$ or $\ell_\infty$) and recover the clean signal. We pose this problem as a block-sparse recovery problem, where both the signal and the attack are assumed to lie in a union of subspaces that includes one subspace per class and one subspace per attack type. We derive geometric conditions on the subspaces under which any attacked signal can be decomposed as the sum of a clean signal plus an attack. In addition, by determining the subspaces that contain the signal and the attack, we can also classify the signal and determine the attack type. Experiments on digit and face classification demonstrate the effectiveness of the proposed approach.
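The decomposition described in the abstract can be illustrated with a toy sketch. This is not the paper's optimization method: it swaps in an exhaustive search over (class block, attack block) pairs with a least-squares fit, and it uses synthetic random subspaces rather than real image or attack dictionaries. The names `decompose`, `signal_blocks`, and `attack_blocks` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def decompose(x, signal_blocks, attack_blocks):
    """Toy block selection: try every (class, attack) pair of blocks,
    fit x by least squares on the two concatenated blocks, and keep the
    pair with the smallest residual. Illustrative only; assumes exactly
    one active signal block and one active attack block."""
    best = None
    for i, Ds in enumerate(signal_blocks):
        for j, Da in enumerate(attack_blocks):
            D = np.hstack([Ds, Da])                       # candidate dictionary
            coef, *_ = np.linalg.lstsq(D, x, rcond=None)  # least-squares fit
            resid = np.linalg.norm(x - D @ coef)
            if best is None or resid < best[0]:
                k = Ds.shape[1]
                # split the fit into its signal part and its attack part
                best = (resid, i, j, Ds @ coef[:k], Da @ coef[k:])
    _, cls, atk, clean, delta = best
    return cls, atk, clean, delta

# Synthetic data (an assumption for this sketch): random 3-dimensional
# subspaces of R^40, three standing in for classes and three for attack types.
rng = np.random.default_rng(0)
n, k = 40, 3
signal_blocks = [rng.standard_normal((n, k)) for _ in range(3)]
attack_blocks = [rng.standard_normal((n, k)) for _ in range(3)]

clean_true = signal_blocks[1] @ rng.standard_normal(k)          # class 1
delta_true = 0.1 * attack_blocks[2] @ rng.standard_normal(k)    # attack type 2
x_attacked = clean_true + delta_true

cls, atk, clean, delta = decompose(x_attacked, signal_blocks, attack_blocks)
```

In this toy setting the selected block indices play the role of the predicted class and attack type, and `clean` is the recovered signal. Exhaustive search scales with the product of the number of classes and attack types, which is one reason a block-sparse optimization formulation is attractive in practice.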