Paper Title

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Paper Authors

Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia, Jonathan Shen, Orhan Firat

Paper Abstract

Natural language understanding and generation models follow one of two dominant architectural paradigms: language models (LMs), which process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec), which use separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual, and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases; (ii) several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality; and (iii) when paired with full-visible masking for source sequences, LMs can perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.
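The contrast between causal masking and full-visible masking over the source sequence is the paper's central design axis. Below is a minimal NumPy sketch, not code from the paper, illustrating the two masking schemes for a decoder-only LM that consumes the concatenated [source; target] sequence; the function name `attention_mask` and the boolean-matrix layout are illustrative assumptions.

```python
import numpy as np

def attention_mask(src_len: int, tgt_len: int, prefix_lm: bool) -> np.ndarray:
    """Boolean attention mask for a decoder-only LM over [source; target].

    True means "position i (row) may attend to position j (column)".
    prefix_lm=False -> strict causal masking over the whole sequence.
    prefix_lm=True  -> full-visible (bidirectional) masking over the
                       source prefix, causal masking over the target.
    NOTE: illustrative sketch, not the paper's implementation.
    """
    total = src_len + tgt_len
    # Lower-triangular causal mask: token i attends to tokens 0..i.
    mask = np.tril(np.ones((total, total), dtype=bool))
    if prefix_lm:
        # Every position may attend to the entire source prefix;
        # target columns stay causal via the lower triangle above.
        mask[:, :src_len] = True
    return mask

# Example: 3 source tokens, 2 target tokens.
print(attention_mask(3, 2, prefix_lm=False).astype(int))
print(attention_mask(3, 2, prefix_lm=True).astype(int))
```

With `prefix_lm=True`, source positions attend to one another bidirectionally while target positions remain causally masked; this corresponds to the full-visible source masking that the abstract reports as performing on par with EncDec.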
