Paper Title
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Paper Authors
Paper Abstract
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19, and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves results comparable to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and pre-trained models are available at https://github.com/alirezamshi/small100
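
The abstract states that SMaLL-100 is distilled with uniform sampling across all language pairs, so that low-resource pairs are seen as often as high-resource ones. The following is a minimal, illustrative sketch of that sampling idea, not the authors' training code; the corpora, language pairs, and helper names are invented for the example (see the repository above for the actual implementation).

```python
import random

# Toy parallel corpora keyed by (source, target) language pair.
# Sizes are deliberately skewed to mimic high- vs. low-resource pairs;
# all sentence pairs here are placeholders, not real training data.
corpora = {
    ("en", "fr"): [("hello", "bonjour")] * 100_000,  # high-resource
    ("en", "sw"): [("hello", "habari")] * 1_000,     # low-resource
    ("fr", "sw"): [("bonjour", "habari")] * 100,     # very low-resource
}

def sample_batch(corpora, batch_size):
    """Pick a language pair uniformly at random for every example, then draw
    a sentence pair from that corpus. Each pair is selected equally often,
    regardless of its corpus size, which is the point of uniform sampling."""
    pairs = list(corpora.keys())
    batch = []
    for _ in range(batch_size):
        pair = random.choice(pairs)              # uniform over language pairs
        src, tgt = random.choice(corpora[pair])  # then sample a sentence pair
        batch.append((pair, src, tgt))
    return batch

if __name__ == "__main__":
    for pair, src, tgt in sample_batch(corpora, batch_size=8):
        print(pair, src, "->", tgt)
```

Under proportional sampling, the (en, fr) pair above would dominate roughly 99% of the batches; uniform sampling over pairs instead gives each direction an equal share, which is how the paper describes preserving low-resource performance during distillation.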