Paper Title

Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding

Paper Authors

Ting Hua, Yilin Shen, Changsheng Zhao, Yen-Chang Hsu, Hongxia Jin

Paper Abstract

Domain classification is the fundamental task in natural language understanding (NLU), which often requires fast accommodation to newly emerging domains. This constraint makes it impractical to retrain on all previous domains, even if they remain accessible to the new model. Most existing continual learning approaches suffer from low accuracy and performance fluctuation, especially when the distributions of old and new data are significantly different. In fact, the key real-world problem is not the absence of old data, but the inefficiency of retraining the model on the whole old dataset. Is it possible to utilize some old data to yield high accuracy and maintain stable performance, while at the same time introducing no extra hyperparameters? In this paper, we propose a hyperparameter-free continual learning model for text data that stably produces high performance under various environments. Specifically, we utilize Fisher information to select exemplars that can "record" key information of the original model. In addition, a novel scheme called dynamical weight consolidation is proposed to enable hyperparameter-free learning during retraining. Extensive experiments demonstrate that baselines suffer from fluctuating performance and are therefore unusable in practice. In contrast, our proposed model, CCFI, significantly and consistently outperforms the best state-of-the-art method by up to 20% in average accuracy, and each component of CCFI contributes effectively to the overall performance.
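To make the abstract's first component concrete, below is a minimal sketch of Fisher-information-based exemplar selection, assuming a PyTorch classifier that maps a batch of token-id tensors to logits. The scoring rule (ranking examples by the squared gradient of their log-likelihood, i.e., their diagonal Fisher contribution) and the helper names fisher_scores / select_exemplars are illustrative assumptions, not the exact CCFI procedure described in the paper.

```python
import torch
import torch.nn.functional as F

def fisher_scores(model, dataset):
    """Score candidate exemplars by the diagonal Fisher information
    their log-likelihood gradients carry about the trained parameters."""
    scores = []
    model.eval()
    for x, y in dataset:  # x: token-id tensor, y: integer label tensor
        model.zero_grad()
        logits = model(x.unsqueeze(0))
        # Negative log-likelihood of the observed label.
        loss = F.cross_entropy(logits, y.view(1))
        loss.backward()
        # Diagonal Fisher contribution: squared gradients, summed over parameters.
        score = sum((p.grad.detach() ** 2).sum().item()
                    for p in model.parameters() if p.grad is not None)
        scores.append(score)
    return scores

def select_exemplars(model, dataset, budget):
    """Keep the `budget` examples with the largest Fisher scores."""
    scores = torch.tensor(fisher_scores(model, dataset))
    top = scores.topk(budget).indices.tolist()
    return [dataset[i] for i in top]
```

In a standard elastic-weight-consolidation retraining step, these Fisher estimates would weight a quadratic penalty on parameter drift, scaled by a hand-tuned coefficient; per the abstract, the paper's dynamical weight consolidation scheme is designed to remove exactly that extra hyperparameter.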
