Paper Title
Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning
Paper Authors
Paper Abstract
Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapting to a new language is \textit{language adaptive fine-tuning} (LAFT) -- fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes up a large amount of disk space and limits the cross-lingual transfer abilities of the resulting models, because they become specialized to a single language. In this paper, we perform \textit{multilingual adaptive fine-tuning} (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we remove vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thereby reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT to individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.
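MAFT, as described in the abstract, is essentially continued masked-language-model pre-training of an existing multilingual PLM on a mixed monolingual corpus covering the target languages. The sketch below illustrates such a step with the Hugging Face Transformers Trainer API; the corpus path, checkpoint choice, and hyperparameters are illustrative assumptions rather than the authors' released configuration, and the vocabulary-reduction step is not shown.

```python
# Minimal sketch of multilingual adaptive fine-tuning (MAFT): continue
# masked-language-model pre-training of a multilingual PLM on a mixed corpus
# of monolingual text from the target African languages.
# The corpus path and hyperparameters are illustrative assumptions only.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"  # one of the PLMs evaluated in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical text files, one sentence per line, pooled across the
# 20 languages covered by MAFT.
raw = load_dataset("text", data_files={"train": "maft_corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective (15% masking), i.e. the same pre-training
# objective used for language adaptive fine-tuning.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="maft-adapted-plm",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

Because a single MAFT run adapts the model to all target languages at once, only one adapted checkpoint needs to be stored, in contrast to LAFT, which produces one specialized model per language.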