Paper Title
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Paper Authors
Paper Abstract
NLP models have been shown to suffer from robustness issues, i.e., a model's prediction can be easily changed by small perturbations to the input. In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes known to be invariant to the task label. For example, to attack a sentiment classifier for product reviews, we can use the product category as the controllable attribute, since changing it does not change the sentiment of a review. Experiments on real-world NLP datasets demonstrate that our method generates more diverse and fluent adversarial texts than many existing adversarial text generation approaches. We further use the generated adversarial examples to improve models through adversarial training, and we demonstrate that our generated attacks are more robust to model re-training and across different model architectures.
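To make the attack pattern concrete, below is a minimal, runnable sketch of the idea the abstract describes: vary a label-invariant attribute (here, product category) and keep any rewrite that flips the target model's prediction. Every component in this sketch, including the attribute list, the string-swapping rewriter, the brittle toy classifier, and the example review, is a hypothetical stand-in for illustration; the paper's actual CAT-Gen model is a learned attribute-conditioned text generator, not string replacement.

```python
# Illustrative sketch only: all names and logic here are hypothetical
# stand-ins, not the authors' implementation of CAT-Gen.

CATEGORIES = ["electronics", "kitchen", "books"]

# Hypothetical category-specific nouns used by the toy rewriter.
NOUNS = {"electronics": "phone", "kitchen": "blender", "books": "novel"}


def rewrite_with_attribute(review: str, category: str) -> str:
    """Toy stand-in for the generator: swap in the target category's noun.

    The real model would decode a fluent paraphrase of `review`
    conditioned on the target attribute; by construction the sentiment
    label is unchanged, because product category is label-invariant.
    """
    out = review
    for noun in NOUNS.values():
        out = out.replace(noun, NOUNS[category])
    return out


def brittle_sentiment_model(review: str) -> int:
    """Toy target classifier (1 = positive, 0 = negative) with a
    spurious correlation: it has wrongly associated the books domain
    with negative sentiment, which the attack below exposes."""
    if "novel" in review:
        return 0
    return 1 if "great" in review else 0


def generate_adversarial(review: str, label: int) -> str | None:
    """Search over attribute values; return the first rewrite that flips
    the model's prediction. The true label is preserved by construction,
    so any flip is a genuine robustness failure."""
    for category in CATEGORIES:
        candidate = rewrite_with_attribute(review, category)
        if brittle_sentiment_model(candidate) != label:
            return candidate
    return None


if __name__ == "__main__":
    review = "This phone is great and I use it every day."
    assert brittle_sentiment_model(review) == 1  # correct before attack
    adv = generate_adversarial(review, label=1)
    print(adv)  # -> "This novel is great and I use it every day."
```

In the paper's pipeline, such label-preserving adversarial examples are then added back to the training data (adversarial training) to improve the target model's robustness; the abstract additionally reports that attacks generated this way transfer better across re-trained models and different architectures than prior generation methods.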