Paper Title
Emergent Communication Pretraining for Few-Shot Machine Translation
Paper Authors
Paper Abstract
While state-of-the-art models that rely upon massively multilingual pretrained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. Nevertheless, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, for the first time we pretrain neural networks via emergent communication from referential games. Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages. On the one hand, we show that this substantially benefits machine translation in few-shot settings. On the other hand, this also provides an extrinsic evaluation protocol to probe the properties of emergent languages ex vitro. Intuitively, the closer they are to natural languages, the higher the gains from pretraining on them should be. For instance, in this work we measure the influence of communication success and maximum sequence length on downstream performance. Finally, we introduce a customised adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. These turn out to be crucial to facilitate knowledge transfer and prevent catastrophic forgetting. Compared to a recurrent baseline, our method yields gains of $59.0\%$$\sim$$147.6\%$ in BLEU score with only $500$ NMT training instances and $65.1\%$$\sim$$196.7\%$ with $1,000$ NMT training instances across four language pairs. These proof-of-concept results reveal the potential of emergent communication pretraining for both natural language processing tasks in resource-poor settings and extrinsic evaluation of artificial languages.
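The abstract mentions annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. The sketch below illustrates one common way such a regulariser can be realised, assuming an isotropic Gaussian prior centred at the emergent-communication pretrained weights (which reduces to an L2 penalty towards those weights) and a hypothetical linear annealing schedule; the function names and schedule are illustrative, not the paper's actual implementation.

```python
import torch


def map_penalty(model, pretrained_params):
    """Squared L2 distance between current and pretrained weights.

    Assuming MAP inference with an isotropic Gaussian prior centred at the
    weights obtained from emergent-communication pretraining, this term
    (scaled by an annealed coefficient) is added to the NMT fine-tuning loss.
    `pretrained_params` is a dict mapping parameter names to saved tensors,
    e.g. a copy of the pretrained model's state_dict.
    """
    return sum(
        ((p - pretrained_params[name].detach()) ** 2).sum()
        for name, p in model.named_parameters()
        if name in pretrained_params
    )


def annealed_weight(step, total_steps, lam_max=1.0):
    """Hypothetical linear schedule that anneals the regulariser weight to zero."""
    return lam_max * max(0.0, 1.0 - step / total_steps)


# Sketch of use inside a fine-tuning step:
#   loss = nmt_loss + annealed_weight(step, total_steps) * map_penalty(model, pretrained_params)
#   loss.backward(); optimiser.step()
```

Annealing the penalty lets early fine-tuning stay close to the pretrained solution (helping knowledge transfer and limiting catastrophic forgetting) while gradually freeing the model to fit the few available NMT instances.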