论文标题

通过多尺度知识蒸馏和数据增强的元学习更新在线学习

Online Continual Learning via the Meta-learning Update with Multi-scale Knowledge Distillation and Data Augmentation

论文作者

Han, Ya-nan, Liu, Jian-wei

论文摘要

持续学习旨在快速,不断地从一系列任务中学习当前的任务。 与其他类型的方法相比,基于经验重播的方法表现出了极大的优势来克服灾难性的遗忘。该方法的一个常见局限性是上一个任务和当前任务之间的数据不平衡,这将进一步加剧遗忘。此外,如何在这种情况下有效解决稳定性困境也是一个紧迫的问题。在本文中,我们通过提出一个通过多尺度知识蒸馏和数据增强(MMKDDA)提出一个称为元学习更新的新型框架来克服这些挑战。具体而言,我们应用多尺度知识蒸馏来掌握不同特征级别的远程和短期空间关系的演变,以减轻数据不平衡问题。此外,我们的方法在在线持续训练程序中混合了情节内存和当前任务的样本,从而减轻了由于概率分布的变化而减轻了侧面影响。此外,我们通过元学习更新来优化我们的模型,该更新诉诸于前面所看到的任务数量,这有助于保持稳定性和可塑性之间的更好平衡。最后,我们在四个基准数据集上进行的实验评估显示了提出的MMKDDA框架对其他流行基线的有效性,并且还进行了消融研究,以进一步分析每个组件在我们的框架中的作用。

Continual learning aims to rapidly and continually learn the current task from a sequence of tasks. Compared to other kinds of methods, the methods based on experience replay have shown great advantages to overcome catastrophic forgetting. One common limitation of this method is the data imbalance between the previous and current tasks, which would further aggravate forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem to be solved. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multiscale knowledge distillation to grasp the evolution of long-range and short-range spatial relationships at different feature levels to alleviate the problem of data imbalance. Besides, our method mixes the samples from the episodic memory and current task in the online continual training procedure, thus alleviating the side influence due to the change of probability distribution. Moreover, we optimize our model via the meta-learning update resorting to the number of tasks seen previously, which is helpful to keep a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies are also conducted to further analyze the role of each component in our framework.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源