Paper Title
Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint
Paper Authors
Paper Abstract
Neural networks have achieved remarkable success in many cognitive tasks. However, when they are trained sequentially on multiple tasks without access to old data, their performance on early tasks tends to drop significantly. This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks. The regularization-based approach is one of the primary classes of methods to alleviate catastrophic forgetting. In this paper, we provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. This viewpoint leads to a unified framework that can be instantiated to derive many existing algorithms such as Elastic Weight Consolidation and Kronecker-factored Laplace approximation. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as the generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning. Our theoretical results indicate the importance of accurately approximating the Hessian matrix. Experimental results on several benchmarks provide empirical validation of our theoretical findings.
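As a rough sketch of the loss-approximation viewpoint described above (the notation here is illustrative and not taken from the paper): if $\theta_k^*$ denotes the parameters learned for a previous task $k$ and is approximately a local minimizer, so that the first-order term vanishes, the loss of task $k$ can be approximated by a second-order Taylor expansion around $\theta_k^*$,

$$\ell_k(\theta) \;\approx\; \ell_k(\theta_k^*) \;+\; \tfrac{1}{2}\,(\theta - \theta_k^*)^\top H_k\,(\theta - \theta_k^*),$$

where $H_k$ is the Hessian of $\ell_k$ at $\theta_k^*$. Adding this quadratic term to the current task's loss acts as a regularizer that penalizes movement away from $\theta_k^*$ along directions to which the old task is sensitive; substituting a diagonal Fisher-information approximation for $H_k$ recovers Elastic Weight Consolidation, while a Kronecker-factored approximation of $H_k$ corresponds to the Kronecker-factored Laplace approximation.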