Paper Title

Multiscale Collaborative Deep Models for Neural Machine Translation

Authors

Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo

Abstract

Recent evidence reveals that Neural Machine Translation (NMT) models with deeper neural networks can be more effective but are difficult to train. In this paper, we present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously. We explicitly boost gradient back-propagation from the top to the bottom levels by introducing a block-scale collaboration mechanism into deep NMT models. Then, instead of forcing the whole encoder stack to directly learn a desired representation, we let each encoder block learn a fine-grained representation and enhance it by encoding spatial dependencies through a context-scale collaboration. We provide empirical evidence showing that MSC nets are easy to optimize and can obtain improvements in translation quality from considerably increased depth. On IWSLT translation tasks with three translation directions, our extremely deep models (with 72-layer encoders) surpass strong baselines by +2.2 to +3.1 BLEU points. In addition, our deep MSC achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly outperforming state-of-the-art deep NMT models.
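
The abstract describes the two collaboration mechanisms only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of that idea under stated assumptions: the names EncoderBlock, MSCEncoder, and the fuse projection, as well as the choice of 12 blocks of 6 layers (giving a 72-layer encoder), are illustrative and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of the MSC idea (names and details are assumptions).
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """A few Transformer encoder layers treated as one block."""

    def __init__(self, d_model: int, n_heads: int, layers_per_block: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.layers = nn.TransformerEncoder(layer, num_layers=layers_per_block)
        # Context-scale collaboration (assumed form): fuse the block output
        # with the representation handed up from the previous block.
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        h = self.layers(x)
        return self.fuse(torch.cat([h, context], dim=-1))


class MSCEncoder(nn.Module):
    """Deep encoder built from blocks, each with a shortcut to the source
    embeddings so gradients reach the bottom without crossing every layer."""

    def __init__(self, d_model=512, n_heads=8, n_blocks=12, layers_per_block=6):
        super().__init__()
        self.blocks = nn.ModuleList(
            [EncoderBlock(d_model, n_heads, layers_per_block) for _ in range(n_blocks)]
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        h, context = emb, emb
        for block in self.blocks:
            # Block-scale collaboration (assumed form): re-inject the source
            # embeddings as a residual shortcut before every block, giving the
            # top-level loss a short gradient path to the bottom of the stack.
            h = block(h + emb, context)
            context = h
        return h


if __name__ == "__main__":
    enc = MSCEncoder(d_model=64, n_heads=4, n_blocks=3, layers_per_block=2)
    src = torch.randn(2, 10, 64)  # (batch, source_length, d_model) dummy embeddings
    print(enc(src).shape)         # torch.Size([2, 10, 64])
```

The shortcut to the embeddings and the per-block fusion are stand-ins for the paper's block-scale and context-scale collaboration; the full model would add positional encodings, layer normalization choices, and a matching decoder.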
