Paper Title
Bias Challenges in Counterfactual Data Augmentation
Paper Authors
Abstract
Deep learning models tend not to be out-of-distribution (OOD) robust, primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentation provides a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for OOD robustness. In this work, we show that counterfactual data augmentation may not achieve the desired counterfactual invariance if the augmentation is performed by a context-guessing machine, an abstract machine that guesses the most likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.
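To make the setting concrete, here is a minimal toy sketch (not the paper's method) of counterfactual data augmentation driven by a simple context-guessing step: the guesser picks the most likely spurious "context" token in an input, and the augmenter emits a copy with that context swapped while the label is held fixed. The token list, dataset, and function names are all illustrative assumptions.

```python
# Toy counterfactual data augmentation via a context-guessing step.
# Assumption: the spurious feature is a single genre token ("horror" /
# "comedy") that correlates with the sentiment label in training data.

SPURIOUS_SWAP = {"horror": "comedy", "comedy": "horror"}

def guess_context(text):
    """A crude 'context-guessing machine': return the first spurious
    context token found in the input, or None if none is present."""
    for token in text.split():
        if token in SPURIOUS_SWAP:
            return token
    return None

def augment(dataset):
    """For each (text, label) example, append a counterfactual copy
    with the guessed spurious context swapped and the label unchanged,
    pushing the classifier toward invariance to that context."""
    augmented = list(dataset)
    for text, label in dataset:
        ctx = guess_context(text)
        if ctx is not None:
            augmented.append((text.replace(ctx, SPURIOUS_SWAP[ctx]), label))
    return augmented

data = [("this horror movie was great", 1),
        ("this comedy movie was dull", 0)]
print(augment(data))
```

The paper's point is that when the guessed context is wrong (the guesser can only pick the *most likely* context, not the true one), the resulting counterfactuals impose a weaker invariance than intended, so a classifier trained on the augmented data can still fail OOD.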