Paper Title
When does Parameter-Efficient Transfer Learning Work for Machine Translation?
Paper Authors
Paper Abstract
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while only tuning a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language-pairs, and (3) different pre-trained models. We find that 'adapters', in which small feed-forward networks are added after every layer, are indeed on par with full model fine-tuning when the parameter budget corresponds to 10% of total model parameters. Nevertheless, as the number of tuned parameters decreases, the performance of PEFTs decreases. The magnitude of this decrease depends on the language pair, with PEFTs particularly struggling for distantly related language-pairs. We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and for smaller training data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.
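To make the 'adapters' mentioned in the abstract concrete, here is a minimal sketch of a bottleneck adapter module, assuming PyTorch; the class and parameter names (Adapter, bottleneck_dim) are illustrative and not taken from the paper's code. Only the adapter parameters are trained, while the pre-trained model is kept frozen, and the bottleneck dimension is what sets the parameter budget the paper varies.

```python
# Minimal bottleneck adapter sketch (assumes PyTorch; names are illustrative).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small feed-forward network inserted after a Transformer sub-layer.

    Only these parameters are tuned; the pre-trained weights stay frozen.
    The bottleneck dimension controls the tuned-parameter budget.
    """
    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down to the bottleneck
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up to the model width
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter adds a small learned correction
        # on top of the frozen model's representation.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage sketch: one adapter per Transformer layer, everything else frozen.
if __name__ == "__main__":
    adapter = Adapter(hidden_dim=512, bottleneck_dim=64)
    x = torch.randn(2, 10, 512)   # (batch, sequence, hidden)
    print(adapter(x).shape)       # torch.Size([2, 10, 512])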