Paper Title
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
Paper Authors
Paper Abstract
Data Augmentation (DA) is known to improve the generalizability of deep neural networks. Most existing DA techniques naively add a certain number of augmented samples without considering the quality of these samples or the added computational cost. To tackle this problem, a common strategy adopted by several state-of-the-art DA methods is to adaptively generate or re-weight augmented samples with respect to the task objective during training. However, these adaptive DA methods (1) are computationally expensive and not sample-efficient, and (2) are designed merely for a specific setting. In this work, we present a universal DA technique, called Glitter, to overcome both issues. Glitter can be plugged into any DA method, making training sample-efficient without sacrificing performance. From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA. Without altering the training strategy, the task objective can be optimized on the selected subset. Our thorough experiments on the GLUE benchmark, SQuAD, and HellaSwag across three widely used training setups, namely consistency training, self-distillation, and knowledge distillation, reveal that Glitter is substantially faster to train and achieves competitive performance compared to strong baselines.
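As a rough illustration of the selection idea described in the abstract, the PyTorch sketch below scores a pre-generated pool of augmented views by their task loss and backpropagates only through the top-k worst-case views. This is not the authors' implementation: the function name `glitter_step`, the per-batch (rather than per-example) selection, and the simple averaged objective are all assumptions made for the sake of a short, self-contained example.

```python
# Minimal, hypothetical sketch of worst-case subset selection over a
# pre-generated pool of augmented samples (assumed names and simplifications;
# not the paper's reference implementation).
import torch
import torch.nn.functional as F


def glitter_step(model, optimizer, x_orig, y, x_aug_pool, k_worst):
    """One training step: score each augmented view, keep the k views with
    maximal task loss, then optimize the objective on the original batch
    plus that worst-case subset only.

    x_aug_pool: tensor of shape (pool_size, batch, ...) holding the
                pre-generated augmented versions of `x_orig`.
    """
    model.eval()
    with torch.no_grad():
        # Score every augmented view by its task loss (no gradients here).
        losses = torch.stack(
            [F.cross_entropy(model(x_aug), y) for x_aug in x_aug_pool]
        )  # shape: (pool_size,)
        worst_idx = losses.topk(k_worst).indices

    model.train()
    optimizer.zero_grad()
    # Standard task objective on the original batch ...
    loss = F.cross_entropy(model(x_orig), y)
    # ... plus the selected worst-case augmented views only.
    for i in worst_idx:
        loss = loss + F.cross_entropy(model(x_aug_pool[i]), y)
    loss = loss / (1 + k_worst)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this simplified form, the scoring pass is gradient-free, so the extra cost per step is a set of forward passes over the pool, while backpropagation touches only the original batch and the k selected views; the same selection step could in principle wrap other objectives (e.g., a distillation loss) without changing the training loop.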