Paper Title


On Data Augmentation and Adversarial Risk: An Empirical Analysis

Authors

Eghbal-zadeh, Hamid, Koutini, Khaled, Primus, Paul, Haunschmid, Verena, Lewandowski, Michal, Zellinger, Werner, Moser, Bernhard A., Widmer, Gerhard

Abstract


Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models. These techniques rely on different ideas such as invariance-preserving transformations (e.g., expert-defined augmentation), statistical heuristics (e.g., Mixup), and learning the data distribution (e.g., GANs). However, in adversarial settings, it remains unclear under what conditions such data augmentation methods reduce or even worsen the misclassification risk. In this paper, we therefore analyse the effect of different data augmentation techniques on the adversarial risk via three measures: (a) the well-known risk under adversarial attacks, (b) a new measure of prediction-change stress based on the Laplacian operator, and (c) the influence of training examples on prediction. The results of our empirical analysis disprove the hypothesis that an improvement in classification performance induced by a data augmentation is always accompanied by an improvement in the risk under adversarial attack. Further, our results reveal that the augmented data has more influence on the resulting models than the non-augmented data. Taken together, our results suggest that general-purpose data augmentations that do not take into account the characteristics of the data and the task must be applied with care.
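Mixup, cited above as an example of a statistical-heuristic augmentation, forms each training example as a convex combination of two samples and their labels. A minimal sketch (not the paper's code; the function name and parameters are illustrative) looks like this:

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup: convex combination of two examples and their one-hot labels.

    The mixing coefficient lam is drawn from a Beta(alpha, alpha)
    distribution, as in the original Mixup formulation; small alpha
    keeps most mixed samples close to one of the two originals.
    """
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Mix an example of class 0 with one of class 1:
x_mixed, y_mixed = mixup([1.0, 0.0], [1.0, 0.0],
                         [0.0, 1.0], [0.0, 1.0])
```

Because the mixed label `y_mixed` is soft (its entries still sum to 1), the loss penalises over-confident predictions between classes, which is exactly the kind of decision-boundary effect the paper's adversarial-risk measures probe.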
