Paper Title

Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

Paper Authors

Jirong Yi, Raghu Mudumbai, Weiyu Xu

Paper Abstract

We consider the theoretical problem of designing an optimal adversarial attack on a decision system, one that maximally degrades the achievable performance of the system as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information-theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable, i.e., even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., we find the optimal perturbation distributions that minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that it is much harder for an adversarial attack to reduce the mutual information when multiple redundant copies of the input signal are available. This provides additional support for the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report results from computational experiments that illustrate our theoretical results.
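To make the problem statement concrete, here is a minimal formalization of the optimization the abstract describes, in our own notation (the paper may define the constraint set differently): let $Y$ be the label of interest, $X$ the clean signal, and $N$ an adversarial perturbation drawn from a distribution $p_N$ restricted to a budget set $\mathcal{P}_{\epsilon}$ (e.g., perturbations of bounded magnitude or power). The attacker seeks

$$ p_N^{\star} \in \arg\min_{p_N \in \mathcal{P}_{\epsilon}} \; I(Y;\, X + N). $$

By the data-processing inequality, any classifier operating on the degraded signal $X + N$ is limited by $I(Y;\, X + N)$, so driving this mutual information down bounds the performance of even an optimally designed classifier.

The claim that redundant copies of the input make the attack harder can be illustrated with a toy Monte Carlo sketch. This is our own construction, not the paper's experiment: we assume a binary label Y in {-1, +1} observed through k i.i.d. Gaussian channels, and a fixed per-copy Gaussian attack of power eps2 (a simple attack, not the optimal perturbation distribution derived in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_binary_awgn(noise_var, k, n=200_000):
    """Monte Carlo estimate of I(Y; X_1, ..., X_k) in bits, where
    Y is uniform on {-1, +1} and X_i = Y + Z_i with Z_i ~ N(0, noise_var).
    The sample mean of the k copies is a sufficient statistic for Y,
    with effective noise variance noise_var / k."""
    v = noise_var / k
    y = rng.choice([-1.0, 1.0], size=n)
    xbar = y + rng.normal(scale=np.sqrt(v), size=n)
    # Posterior P(Y = +1 | xbar) for equiprobable +-1 inputs.
    p = 1.0 / (1.0 + np.exp(-2.0 * xbar / v))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    # I(Y; Xbar) = H(Y) - H(Y | Xbar), with H(Y) = 1 bit.
    h_cond = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)).mean()
    return 1.0 - h_cond

sigma2 = 0.25  # channel noise power (assumed for illustration)
eps2 = 1.0     # adversarial perturbation power per copy (assumed)
for k in (1, 2, 4, 8):
    print(f"k = {k}: I(Y; X_1..X_k) ~ {mi_binary_awgn(sigma2 + eps2, k):.3f} bits")
```

With the attack budget held fixed per copy, the estimated mutual information climbs back toward the full 1 bit as k grows, consistent with the abstract's claim that redundancy makes mutual-information-minimizing attacks harder.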
