Paper title
Very Deep Transformers for Neural Machine Translation
Paper authors
Paper abstract
We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU, and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
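The headline configuration from the abstract (60 encoder layers, 12 decoder layers) can be illustrated with a minimal PyTorch sketch. Note the hedges: the width (d_model=512), head count, and the depth-dependent Xavier gain below are illustrative assumptions, not the paper's settings, and the gain is only a hypothetical stand-in for the stabilizing initialization the abstract refers to (see the linked repository for the authors' actual implementation).

    # Minimal sketch (assumes PyTorch). Builds the deep 60/12-layer
    # Transformer named in the abstract; hyperparameters are assumptions.
    import torch
    import torch.nn as nn

    model = nn.Transformer(
        d_model=512,            # assumed Transformer-base width
        nhead=8,                # assumed number of attention heads
        num_encoder_layers=60,  # deep encoder, per the abstract
        num_decoder_layers=12,  # deep decoder, per the abstract
        dim_feedforward=2048,   # assumed feed-forward width
    )

    # A naively initialized stack this deep tends to diverge during
    # training. As a rough illustration only (NOT the authors' rule),
    # shrink weight-matrix initializations with a depth-dependent gain
    # so layer outputs do not blow up across 60 residual blocks.
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p, gain=60 ** -0.25)

    # Sanity-check forward pass with random (seq_len, batch, d_model)
    # tensors in nn.Transformer's default layout.
    src = torch.rand(10, 2, 512)
    tgt = torch.rand(12, 2, 512)
    out = model(src, tgt)  # shape: (12, 2, 512)

The motivation for rescaling is that, without some control on residual-branch magnitudes, output and gradient variance grow with depth, which is exactly the training instability the abstract says the paper's initialization technique addresses.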