Paper Title

An Analysis of Robustness of Non-Lipschitz Networks

Paper Authors

Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, Hongyang Zhang

Paper Abstract

Despite significant advances, deep networks remain highly susceptible to adversarial attack. One fundamental challenge is that small input perturbations can often produce large movements in the network's final-layer feature space. In this paper, we define an attack model that abstracts this challenge, to help understand its intrinsic properties. In our model, the adversary may move data an arbitrary distance in feature space but only in random low-dimensional subspaces. We prove such adversaries can be quite powerful: defeating any algorithm that must classify any input it is given. However, by allowing the algorithm to abstain on unusual inputs, we show such adversaries can be overcome when classes are reasonably well-separated in feature space. We further provide strong theoretical guarantees for setting algorithm parameters to optimize over accuracy-abstention trade-offs using data-driven methods. Our results provide new robustness guarantees for nearest-neighbor style algorithms, and also have application to contrastive learning, where we empirically demonstrate the ability of such algorithms to obtain high robust accuracy with low abstention rates. Our model is also motivated by strategic classification, where entities being classified aim to manipulate their observable features to produce a preferred classification, and we provide new insights into that area as well.
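To make the abstract's setup concrete, here is a minimal Python sketch of the attack model and the abstention-based defense it describes. This is an illustration only, not the authors' implementation: the feature dimension d, subspace dimension k, cluster construction, attack distance delta, and abstention radius tau below are all arbitrary assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subspace(d, k):
    """Orthonormal basis of a uniformly random k-dimensional subspace of R^d (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q  # shape (d, k)

def subspace_attack(x, basis, delta):
    """Adversary from the model: move x an arbitrary distance delta, but only within the given subspace."""
    direction = basis @ rng.standard_normal(basis.shape[1])
    return x + delta * direction / np.linalg.norm(direction)

def nn_with_abstain(x, X_ref, y_ref, tau):
    """1-nearest-neighbor in feature space; abstain (return None) if no reference point lies within radius tau."""
    dists = np.linalg.norm(X_ref - x, axis=1)
    i = int(np.argmin(dists))
    return int(y_ref[i]) if dists[i] <= tau else None

# Two well-separated classes in a d-dimensional feature space
# (standing in for, e.g., a contrastive encoder's output).
d, k, n = 64, 4, 200
mu1 = np.zeros(d); mu1[0] = 10.0  # class means 10 units apart
X_ref = np.vstack([0.2 * rng.standard_normal((n, d)),
                   mu1 + 0.2 * rng.standard_normal((n, d))])
y_ref = np.array([0] * n + [1] * n)

x = 0.2 * rng.standard_normal(d)                  # clean test point from class 0
x_adv = subspace_attack(x, random_subspace(d, k), delta=50.0)

tau = 3.0  # abstention radius; the paper tunes this trade-off with data-driven methods
print(nn_with_abstain(x, X_ref, y_ref, tau))      # 0: classified correctly
print(nn_with_abstain(x_adv, X_ref, y_ref, tau))  # None: far from all reference data, so abstain
```

The sketch shows why abstention helps against this adversary: a perturbation of arbitrary length lands far from every reference point and is caught by the abstain rule, while a random low-dimensional subspace is unlikely to contain a short path into the other class's well-separated region.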
