Paper Title
Incorporating BERT into Neural Machine Translation
Paper Authors
Paper Abstract
The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) has not been sufficiently explored. While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as a contextual embedding works better than using it for fine-tuning. This motivates us to think about how to better leverage BERT for NMT along this direction. We propose a new algorithm named the BERT-fused model, in which we first use BERT to extract representations for an input sequence, and then fuse the representations with each layer of the encoder and decoder of the NMT model through attention mechanisms. We conduct experiments on supervised (including sentence-level and document-level translation), semi-supervised, and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets. Our code is available at \url{https://github.com/bert-nmt/bert-nmt}.
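To make the fusion idea concrete, below is a minimal PyTorch sketch of one encoder layer that attends both to its own states (self-attention) and to representations extracted by BERT, then combines the two streams. All module names, dimensions, and the averaging of the two attention outputs are illustrative assumptions for exposition; this is not the official bert-nmt implementation.

```python
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """Sketch of a BERT-fused encoder layer: NMT self-attention plus
    cross-attention over frozen BERT representations (illustrative only)."""

    def __init__(self, d_model=512, d_bert=768, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        # Cross-attention: NMT hidden states as queries, BERT output as keys/values.
        self.bert_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                               kdim=d_bert, vdim=d_bert)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, bert_out):
        # x:        (src_len, batch, d_model)  NMT encoder states
        # bert_out: (bert_len, batch, d_bert)  representations extracted by BERT
        self_out, _ = self.self_attn(x, x, x)
        bert_fused, _ = self.bert_attn(x, bert_out, bert_out)
        # Average the two attention outputs before the residual connection
        # (an assumed fusion rule standing in for the paper's scheme).
        x = self.norm1(x + self.dropout(0.5 * (self_out + bert_fused)))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Hypothetical usage: BERT and the NMT encoder may tokenize the source
# differently, so their sequence lengths can differ; attention handles this.
layer = BertFusedEncoderLayer()
x = torch.randn(20, 4, 512)          # NMT encoder states
bert_out = torch.randn(24, 4, 768)   # BERT output for the same sentences
y = layer(x, bert_out)               # -> (20, 4, 512)
```

Attention-based fusion (rather than directly adding BERT vectors to the embeddings) is what lets the two models keep separate vocabularies and sequence lengths; a decoder layer would add a further cross-attention over the encoder output in the same fashion.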