Paper Title
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
Paper Authors
Paper Abstract
Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss, and it is widely regarded as the most effective defense. Due to the high computation time required to generate strong adversarial examples in the AT process, single-step approaches have been proposed to reduce training time. However, these methods suffer from catastrophic overfitting, in which adversarial accuracy drops during training; although improvements have been proposed, they increase training time, and their robustness remains far from that of multi-step AT. We develop a theoretical framework for adversarial training with Frank-Wolfe (FW) optimization (FW-AT) that reveals a geometric connection between the loss landscape and the $\ell_2$ distortion of $\ell_\infty$ FW attacks. We show analytically that high distortion of FW attacks is equivalent to small gradient variation along the attack path. We then demonstrate experimentally, on various deep neural network architectures, that $\ell_\infty$ attacks against robust models achieve near-maximal distortion, while standard networks exhibit lower distortion. We further show experimentally that catastrophic overfitting is strongly correlated with low distortion of FW attacks. This mathematical transparency differentiates FW from Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework, we develop FW-AT-Adapt, a novel adversarial training algorithm that uses a simple distortion measure to adapt the number of attack steps during training, increasing efficiency without compromising robustness. FW-AT-Adapt provides training time on par with single-step fast AT methods and closes the gap between fast AT methods and multi-step PGD-AT, with minimal loss of adversarial accuracy in both white-box and black-box settings.
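To make the distortion measure concrete, the following is a minimal PyTorch sketch of an $\ell_\infty$ Frank-Wolfe attack together with the relative $\ell_2$ distortion $\|\delta\|_2 / (\epsilon\sqrt{d})$ the abstract refers to. This is our illustration, not the authors' released code: the function name, the $2/(t+2)$ step schedule, and the omission of pixel-range clamping are assumptions made for brevity.

```python
# Illustrative sketch (not the authors' code) of an l_inf Frank-Wolfe attack
# and the relative l2 distortion of the resulting perturbation.
import torch
import torch.nn.functional as F

def fw_linf_attack(model, x, y, eps=8 / 255, steps=10):
    """Frank-Wolfe attack over the l_inf ball of radius eps around x.

    Returns the adversarial example and its relative l2 distortion,
    ||delta||_2 / (eps * sqrt(d)), which equals 1 when the perturbation
    saturates every coordinate of the l_inf ball.
    (Pixel-range clamping is omitted for brevity.)
    """
    x_adv = x.clone().detach()
    for t in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Linear maximization oracle over the l_inf ball: the vertex
        # x + eps * sign(grad) maximizes the linearized loss.
        v = x + eps * grad.sign()
        gamma = 2.0 / (t + 2)  # classic Frank-Wolfe step size
        # Convex combination keeps x_adv inside the l_inf ball.
        x_adv = (x_adv + gamma * (v - x_adv)).detach()
    delta = (x_adv - x).flatten(1)
    distortion = delta.norm(dim=1) / (eps * delta.shape[1] ** 0.5)
    return x_adv, distortion
```

One plausible reading of the adaptation idea is to monitor this distortion on each training batch and increase the number of attack steps when it falls below a threshold, since low distortion is the signal the abstract associates with catastrophic overfitting; the exact schedule used by FW-AT-Adapt is specified in the paper, not reproduced here.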