Paper Title
Shallow-to-Deep Training for Neural Machine Translation
Paper Authors
Paper Abstract
Deep encoders have been proven effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time-consuming. Moreover, why deep models help NMT remains an open question. In this paper, we investigate the behavior of a well-tuned deep Transformer system. We find that stacking layers helps improve the representation ability of NMT models and that adjacent layers perform similarly. This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models. In this way, we successfully train a Transformer system with a 54-layer encoder. Experimental results on the WMT'16 English-German and WMT'14 English-French translation tasks show that it is $1.4\times$ faster than training from scratch, and achieves BLEU scores of $30.33$ and $43.29$ on the two tasks, respectively. The code is publicly available at https://github.com/libeineu/SDT-Training/.
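To make the stacking idea concrete, the sketch below shows one plausible way to initialize a deeper encoder from an already-trained shallow one by copying its layers, after which training would resume on the deeper model. This is a minimal illustration of the general idea only, not the paper's actual implementation (see the linked repository); the function name `grow_encoder`, its arguments, and the layer dimensions are assumptions made for the example.

```python
import copy
import torch.nn as nn

def grow_encoder(shallow_layers: nn.ModuleList, target_depth: int) -> nn.ModuleList:
    """Illustrative sketch: build a deeper encoder stack by reusing trained shallow layers.

    The new stack is initialized with deep copies of the shallow layers' weights,
    cycling through them until the target depth is reached. Hypothetical helper,
    not the API of the SDT-Training codebase.
    """
    deep_layers = nn.ModuleList()
    while len(deep_layers) < target_depth:
        for layer in shallow_layers:
            if len(deep_layers) == target_depth:
                break
            # Copy the trained layer's parameters as initialization for the deeper model.
            deep_layers.append(copy.deepcopy(layer))
    return deep_layers

# Usage sketch: grow a trained 6-layer encoder to 12 layers, then continue training.
shallow = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(6)
)
deep = grow_encoder(shallow, target_depth=12)
```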