Paper Title

Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer

Paper Authors

Sen Lin, Li Yang, Deliang Fan, Junshan Zhang

Paper Abstract

By learning a sequence of tasks continually, an agent in continual learning (CL) can improve the learning performance of both a new task and `old' tasks by leveraging the forward knowledge transfer and the backward knowledge transfer, respectively. However, most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. This inevitably limits the backward knowledge transfer from the new task to the old tasks, because judicious model updates could possibly improve the learning performance of the old tasks as well. To tackle this problem, we first theoretically analyze the conditions under which updating the learnt model of old tasks could be beneficial for CL and also lead to backward knowledge transfer, based on the gradient projection onto the input subspaces of old tasks. Building on the theoretical analysis, we next develop a ContinUal learning method with Backward knowlEdge tRansfer (CUBER), for a fixed capacity neural network without data replay. In particular, CUBER first characterizes the task correlation to identify the positively correlated old tasks in a layer-wise manner, and then selectively modifies the learnt model of the old tasks when learning the new task. Experimental studies show that CUBER can even achieve positive backward knowledge transfer on several existing CL benchmarks for the first time without data replay, where the related baselines still suffer from catastrophic forgetting (negative backward knowledge transfer). The superior performance of CUBER on the backward knowledge transfer also leads to higher accuracy accordingly.
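To make the gradient-projection idea in the abstract concrete, below is a minimal Python sketch, not the authors' implementation: it projects a layer's gradient onto the input subspace of an old task and either keeps or removes that component depending on whether the old task is treated as positively correlated with the new one. The basis matrix, the `positively_correlated` flag, and the learning rate are illustrative assumptions; CUBER's actual layer-wise correlation test and update rule are defined in the paper.

```python
# Sketch of gradient projection onto an old task's input subspace (assumed setup,
# not CUBER's exact procedure).
import numpy as np

def project_onto_subspace(grad, basis):
    """Project a layer gradient onto the subspace spanned by the orthonormal
    columns of `basis` (directions extracted from an old task's layer inputs)."""
    return basis @ (basis.T @ grad)

def layer_update(grad, old_task_basis, positively_correlated, lr=0.1):
    """If the old task is judged positively correlated, keep the in-subspace
    component so the update can also benefit the old task (backward transfer);
    otherwise remove it, which is the usual forgetting-avoidance projection."""
    in_space = project_onto_subspace(grad, old_task_basis)
    if positively_correlated:
        update = grad              # allow the update to touch the old task's subspace
    else:
        update = grad - in_space   # orthogonal update, leaves the old task's outputs intact
    return -lr * update

# Toy usage: a 4-dimensional layer gradient and a 2-dimensional old-task subspace.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # orthonormal basis of the old subspace
grad = rng.standard_normal(4)
print(layer_update(grad, basis, positively_correlated=False))
print(layer_update(grad, basis, positively_correlated=True))
```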
