Paper Title
Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios
Paper Authors
Paper Abstract
Unsupervised neural machine translation (UNMT), which relies solely on massive monolingual corpora, has achieved remarkable results on several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages, such as Estonian, and UNMT systems usually perform poorly when an adequate training corpus is unavailable for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in such cases. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.
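To make the general idea concrete, below is a minimal, hypothetical sketch of a self-training loop for UNMT with unbalanced monolingual data: the current model translates sentences from the resource-rich side into the low-resource language, and the synthetic output enlarges the scarce side before retraining. All names here (UNMTStub, translate, train, self_train, mono_rich, mono_scarce, rounds) are illustrative assumptions, not the paper's actual implementation, which may differ in its details.

# Hypothetical sketch of self-training for UNMT in an unbalanced
# training data scenario. UNMTStub and its methods are placeholders,
# not the authors' system.

class UNMTStub:
    """Stand-in for a trained UNMT system (placeholder logic)."""

    def translate(self, sentence, src, tgt):
        # A real system would decode with the learned model;
        # the stub just tags the sentence for demonstration.
        return f"[{src}->{tgt}] {sentence}"

    def train(self, mono_rich, mono_scarce):
        # A real system would run denoising autoencoding and
        # back-translation over both monolingual corpora.
        print(f"training on {len(mono_rich)} rich / "
              f"{len(mono_scarce)} scarce sentences")


def self_train(model, mono_rich, mono_scarce, rounds=3):
    """Grow the scarce side with synthetic data and retrain each round."""
    for _ in range(rounds):
        # 1. Translate resource-rich sentences into the low-resource
        #    language to create synthetic data for the scarce side.
        synthetic = [model.translate(s, src="en", tgt="et")
                     for s in mono_rich]
        # 2. Retrain the UNMT system on the enlarged low-resource side.
        mono_scarce = mono_scarce + synthetic
        model.train(mono_rich, mono_scarce)
    return model


if __name__ == "__main__":
    model = UNMTStub()
    self_train(model,
               mono_rich=["a sentence from a large English corpus"],
               mono_scarce=["lause väikesest eesti korpusest"],
               rounds=2)

The design choice illustrated here is that synthetic data compensates for the missing monolingual text on the low-resource side, which is what allows the back-translation signal in UNMT to remain useful when the two corpora are badly unbalanced.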