Paper Title

Dataless Knowledge Fusion by Merging Weights of Language Models

Authors

Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng

Abstract

Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, thus making it applicable to a wider set of scenarios.
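As an illustration of the kind of parameter-space merging the abstract describes, below is a minimal sketch in Python/NumPy. It merges two linear layers by minimizing the squared difference between the merged layer's predictions and each individual layer's predictions on that layer's own inputs. The variable names, the use of per-model input Gram matrices, and the random stand-in data are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch: merge two linear layers in parameter space so that the
# merged layer's predictions stay close to each individual layer's
# predictions. Hypothetical names/data (X1, X2, W1, W2), not the paper's
# exact method.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4

# Stand-ins for each model's domain inputs and fine-tuned linear weights.
X1 = rng.normal(size=(100, d_in))
X2 = rng.normal(size=(100, d_in))
W1 = rng.normal(size=(d_in, d_out))
W2 = rng.normal(size=(d_in, d_out))

# Baseline: simple parameter averaging.
W_avg = 0.5 * (W1 + W2)

# Prediction-difference-minimizing merge: solve
#   min_W ||X1 W - X1 W1||^2 + ||X2 W - X2 W2||^2,
# whose closed-form solution uses the input Gram matrices.
G1, G2 = X1.T @ X1, X2.T @ X2
W_merged = np.linalg.solve(G1 + G2, G1 @ W1 + G2 @ W2)

def pred_gap(X, W_ref, W):
    # How far the merged layer's outputs drift from a reference layer's
    # outputs on that layer's own inputs.
    return np.linalg.norm(X @ W - X @ W_ref)

print("avg   :", pred_gap(X1, W1, W_avg) + pred_gap(X2, W2, W_avg))
print("merged:", pred_gap(X1, W1, W_merged) + pred_gap(X2, W2, W_merged))
```

Note that when the two Gram matrices are equal and proportional to the identity, this merge reduces to simple parameter averaging, so weighting by prediction differences can be read as a data-statistics-aware refinement of plain weight averaging.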
