Paper Title
On the Benefits of Invariance in Neural Networks
Paper Authors
Paper Abstract
Many real world data analysis problems exhibit invariant structure, and models that take advantage of this structure have shown impressive empirical performance, particularly in deep learning. While the literature contains a variety of methods to incorporate invariance into models, theoretical understanding is poor and there is no way to assess when one method should be preferred over another. In this work, we analyze the benefits and limitations of two widely used approaches in deep learning in the presence of invariance: data augmentation and feature averaging. We prove that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds. We provide empirical support for these theoretical results, including a demonstration of why generalization may not improve by training with data augmentation: the 'learned invariance' fails outside of the training distribution.
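For concreteness, the following is a minimal Python sketch (not from the paper) contrasting the two approaches the abstract compares, assuming a finite group of transformations and a generic callable model f; the names augmented_loss and averaged_predict are illustrative only.

import numpy as np

# Illustrative finite group: rotations of an image by multiples of 90 degrees.
GROUP = [lambda x, k=k: np.rot90(x, k) for k in range(4)]

def augmented_loss(f, loss, x, y, rng):
    # Data augmentation: evaluate the loss on one randomly transformed copy of
    # the input, a single-sample Monte Carlo estimate of the orbit-averaged loss.
    t = GROUP[rng.integers(len(GROUP))]
    return loss(f(t(x)), y)

def averaged_predict(f, x):
    # Feature averaging: average the model's output over the whole orbit,
    # which makes the resulting predictor exactly invariant to the group.
    return np.mean([f(t(x)) for t in GROUP], axis=0)

For a loss that is convex in the prediction, Jensen's inequality gives loss(averaged_predict(f, x), y) <= mean over t in GROUP of loss(f(t(x)), y), which is the intuition behind the abstract's claim that averaging reduces generalization error relative to data augmentation in the convex-loss setting.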