Paper Title
Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios
Paper Authors
Paper Abstract
Unsupervised neural machine translation (UNMT), which relies solely on massive monolingual corpora, has achieved remarkable results on several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages, such as Estonian, and UNMT systems usually perform poorly when an adequate training corpus is unavailable for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in such cases. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.
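To make the general idea concrete, below is a minimal, hypothetical sketch of a self-training loop for UNMT with unbalanced monolingual data: the current model translates sentences from the resource-rich side into the low-resource language, and the synthetic output enlarges the scarce side before retraining. All names here (UNMTStub, translate, train, self_train, mono_rich, mono_scarce, rounds) are illustrative assumptions, not the paper's actual implementation, which may differ in its details.

# Hypothetical sketch of self-training for UNMT in an unbalanced
# training data scenario. UNMTStub and its methods are placeholders,
# not the authors' system.

class UNMTStub:
    """Stand-in for a trained UNMT system (placeholder logic)."""

    def translate(self, sentence, src, tgt):
        # A real system would decode with the learned model;
        # the stub just tags the sentence for demonstration.
        return f"[{src}->{tgt}] {sentence}"

    def train(self, mono_rich, mono_scarce):
        # A real system would run denoising autoencoding and
        # back-translation over both monolingual corpora.
        print(f"training on {len(mono_rich)} rich / "
              f"{len(mono_scarce)} scarce sentences")


def self_train(model, mono_rich, mono_scarce, rounds=3):
    """Grow the scarce side with synthetic data and retrain each round."""
    for _ in range(rounds):
        # 1. Translate resource-rich sentences into the low-resource
        #    language to create synthetic data for the scarce side.
        synthetic = [model.translate(s, src="en", tgt="et")
                     for s in mono_rich]
        # 2. Retrain the UNMT system on the enlarged low-resource side.
        mono_scarce = mono_scarce + synthetic
        model.train(mono_rich, mono_scarce)
    return model


if __name__ == "__main__":
    model = UNMTStub()
    self_train(model,
               mono_rich=["a sentence from a large English corpus"],
               mono_scarce=["lause väikesest eesti korpusest"],
               rounds=2)

The design choice illustrated here is that synthetic data compensates for the missing monolingual text on the low-resource side, which is what allows the back-translation signal in UNMT to remain useful when the two corpora are badly unbalanced.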