Paper Title
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Paper Authors
Paper Abstract
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19, and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves results comparable to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and pre-trained models are available at https://github.com/alirezamshi/small100
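
The abstract states that SMaLL-100 is distilled with uniform sampling across all language pairs, so that low-resource pairs are seen as often as high-resource ones. The following is a minimal, illustrative sketch of that sampling idea, not the authors' training code; the corpora, language pairs, and helper names are invented for the example (see the repository above for the actual implementation).

```python
import random

# Toy parallel corpora keyed by (source, target) language pair.
# Sizes are deliberately skewed to mimic high- vs. low-resource pairs;
# all sentence pairs here are placeholders, not real training data.
corpora = {
    ("en", "fr"): [("hello", "bonjour")] * 100_000,  # high-resource
    ("en", "sw"): [("hello", "habari")] * 1_000,     # low-resource
    ("fr", "sw"): [("bonjour", "habari")] * 100,     # very low-resource
}

def sample_batch(corpora, batch_size):
    """Pick a language pair uniformly at random for every example, then draw
    a sentence pair from that corpus. Each pair is selected equally often,
    regardless of its corpus size, which is the point of uniform sampling."""
    pairs = list(corpora.keys())
    batch = []
    for _ in range(batch_size):
        pair = random.choice(pairs)              # uniform over language pairs
        src, tgt = random.choice(corpora[pair])  # then sample a sentence pair
        batch.append((pair, src, tgt))
    return batch

if __name__ == "__main__":
    for pair, src, tgt in sample_batch(corpora, batch_size=8):
        print(pair, src, "->", tgt)
```

Under proportional sampling, the (en, fr) pair above would dominate roughly 99% of the batches; uniform sampling over pairs instead gives each direction an equal share, which is how the paper describes preserving low-resource performance during distillation.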